Package com.linkedin.davinci.client
Class DaVinciRecordTransformer<K,V,O>
java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<K,V,O>
- Type Parameters:
K
- the type of the input key
V
- the type of the input value
O
- the type of the output value
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
BlockingDaVinciRecordTransformer, DuckDBDaVinciRecordTransformer
@Experimental
public abstract class DaVinciRecordTransformer<K,V,O>
extends Object
implements Closeable
This abstract class can be extended in order to transform records stored in the Da Vinci Client,
or write to a custom storage of your choice.
The input is what is consumed from the raw Venice data set, whereas the output is what is stored
into Da Vinci's local storage (e.g. RocksDB).
Implementations should be thread-safe and support schema evolution.
Note: Inputs are wrapped inside Lazy to avoid deserialization costs if not needed.
-
Constructor Summary
Constructor / Description
DaVinciRecordTransformer(int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, boolean storeRecordsInDaVinci)
-
Method Summary
Modifier and Type / Method / Description
final int getClassHash()
final org.apache.avro.Schema getInputValueSchema()
- Returns the schema for the input value used in DaVinciClient's operations.
final org.apache.avro.Schema getKeySchema()
- Returns the schema for the key used in DaVinciClient's operations.
final org.apache.avro.Schema getOutputValueSchema()
- Returns the schema for the output value used in DaVinciClient's operations.
final DaVinciRecordTransformerUtility<K,O> getRecordTransformerUtility
final boolean getStoreRecordsInDaVinci()
final int getStoreVersion()
void onEndVersionIngestion(int currentVersion)
- Lifecycle event triggered when record consumption is stopped for storeVersion.
final void onRecovery(AbstractStorageEngine storageEngine, Integer partition, Lazy<VeniceCompressor> compressor)
- Bootstraps the client after it comes online.
void onStartVersionIngestion(boolean isCurrentVersion)
- Lifecycle event triggered before consuming records for storeVersion.
final ByteBuffer prependSchemaIdToHeader(ByteBuffer valueBytes, int schemaId)
- Prepends the given schema ID to the provided ByteBuffer.
final ByteBuffer prependSchemaIdToHeader(O value, int schemaId, VeniceCompressor compressor)
- Serializes and compresses the value and prepends the schema ID to the resulting ByteBuffer.
void processDelete(Lazy<K> key)
- Override this method to customize the behavior for record deletions.
abstract void processPut(Lazy<K> key, Lazy<O> value)
- Implement this method to manage custom state outside the Da Vinci Client.
abstract DaVinciRecordTransformerResult<O> transform(Lazy key, Lazy value)
- Implement this method to transform records before they are stored.
final DaVinciRecordTransformerResult<O> transformAndProcessPut(Lazy<K> key, Lazy<V> value)
- Transforms and processes the given record.
boolean useUniformInputValueSchema()
-
Constructor Details
-
DaVinciRecordTransformer
public DaVinciRecordTransformer(int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, boolean storeRecordsInDaVinci)
- Parameters:
storeVersion
- the version of the store
keySchema
- the key schema, which is immutable inside DaVinciClient. Users can modify the key if they are storing records in an external storage engine, but this must be managed by the user
inputValueSchema
- the value schema before transformation
outputValueSchema
- the value schema after transformation
storeRecordsInDaVinci
- set this to false if you intend to store records in a custom storage, and not in the Da Vinci Client
-
-
Method Details
-
transform
Implement this method to transform records before they are stored. This can be useful for tasks such as filtering out unused fields to save storage space.
- Parameters:
key
- the key of the record to be transformed
value
- the value of the record to be transformed
- Returns:
DaVinciRecordTransformerResult
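The field-filtering use case mentioned above can be sketched in isolation. This is a minimal illustration only, not the library's API: `LazySketch` and `FieldFilterSketch` are hypothetical stand-ins, records are modelled as plain maps rather than Avro records, and the return type is a bare map instead of a `DaVinciRecordTransformerResult`. It also shows why lazy inputs help: the key is never resolved because the transform does not need it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for the library's Lazy wrapper: resolves its value once, on first access.
final class LazySketch<T> {
  private Supplier<T> supplier;
  private T value;
  private boolean resolved;

  LazySketch(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  T get() {
    if (!resolved) {
      value = supplier.get();
      resolved = true;
      supplier = null; // release the supplier once the value is cached
    }
    return value;
  }
}

// Hypothetical transformer shape: only the transform step is modelled here.
final class FieldFilterSketch {
  private final Set<String> keptFields;

  FieldFilterSketch(Set<String> keptFields) {
    this.keptFields = keptFields;
  }

  // Returns a copy of the record that contains only the fields worth storing.
  // The key is intentionally left untouched, so it is never deserialized.
  Map<String, Object> transform(LazySketch<String> key, LazySketch<Map<String, Object>> value) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : value.get().entrySet()) {
      if (keptFields.contains(field.getKey())) {
        out.put(field.getKey(), field.getValue());
      }
    }
    return out;
  }
}
```

In a real subclass the same filtering would be done against the input and output Avro schemas passed to the constructor.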
-
processPut
Implement this method to manage custom state outside the Da Vinci Client.
- Parameters:
key
- the key of the record to be put
value
- the value of the record to be put, derived from the output of transform(Lazy key, Lazy value)
-
processDelete
Override this method to customize the behavior for record deletions. For example, you can use this method to delete records from a custom storage outside the Da Vinci Client. By default, it performs no operation.
- Parameters:
key
- the key of the record to be deleted
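As a rough sketch of how processPut and processDelete might keep a custom store in sync, the class below mirrors puts and deletes into an in-memory map. Everything in it is an assumption made for illustration: `ExternalStoreSketch` is a hypothetical class, `Supplier` stands in for the library's Lazy wrapper, and a `ConcurrentHashMap` stands in for the external storage. The concurrent map is chosen because implementations are expected to be thread-safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a transformer that mirrors records into a custom store.
final class ExternalStoreSketch {
  // Stand-in for external storage; safe for concurrent puts and deletes.
  private final Map<String, String> store = new ConcurrentHashMap<>();

  // Receives the already-transformed value and upserts it into the custom store.
  void processPut(Supplier<String> key, Supplier<String> value) {
    store.put(key.get(), value.get());
  }

  // Removes the key so the custom store does not keep stale records.
  void processDelete(Supplier<String> key) {
    store.remove(key.get());
  }

  // Helper for this sketch only: reads a value back from the custom store.
  String lookup(String key) {
    return store.get(key);
  }

  // Helper for this sketch only: number of records currently held.
  int size() {
    return store.size();
  }
}
```

Without the processDelete override, deleted records would remain in the custom store, since the default implementation performs no operation.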
-
onStartVersionIngestion
public void onStartVersionIngestion(boolean isCurrentVersion)
Lifecycle event triggered before consuming records for storeVersion. Use this method to perform setup operations such as opening database connections or creating tables. By default, it performs no operation.
-
onEndVersionIngestion
public void onEndVersionIngestion(int currentVersion)
Lifecycle event triggered when record consumption is stopped for storeVersion. Use this method to perform cleanup operations such as closing database connections or dropping tables. By default, it performs no operation.
-
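The setup and cleanup pairing of onStartVersionIngestion and onEndVersionIngestion can be illustrated with a small sketch. The names `FakeDatabase`, `LifecycleSketch`, and the `records_v` table prefix are hypothetical, and the "database" is only a set of table names; whether a real implementation should drop its table unconditionally at the end of ingestion depends on the application, so treat that choice as an assumption.

```java
import java.io.Closeable;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an external database: a thread-safe set of table names.
final class FakeDatabase {
  final Set<String> tables = ConcurrentHashMap.newKeySet();
}

// Hypothetical transformer shape that models only the lifecycle hooks.
final class LifecycleSketch implements Closeable {
  private final int storeVersion;
  private final FakeDatabase db;

  LifecycleSketch(int storeVersion, FakeDatabase db) {
    this.storeVersion = storeVersion;
    this.db = db;
  }

  // One table per store version, so versions do not overwrite each other.
  private String tableName() {
    return "records_v" + storeVersion;
  }

  // Setup before any record for storeVersion is consumed; safe to call more than once.
  void onStartVersionIngestion(boolean isCurrentVersion) {
    db.tables.add(tableName());
  }

  // Cleanup once consumption for storeVersion has stopped (assumed here: always drop the table).
  void onEndVersionIngestion(int currentVersion) {
    db.tables.remove(tableName());
  }

  @Override
  public void close() {
    // Nothing to release in this sketch; a real implementation might close a connection here.
  }
}
```

Keeping both hooks idempotent means a repeated start or end call leaves the custom storage in the same state.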
useUniformInputValueSchema
public boolean useUniformInputValueSchema()
-
transformAndProcessPut
Transforms and processes the given record.
- Parameters:
key
- the key of the record to be put
value
- the value of the record to be put
- Returns:
DaVinciRecordTransformerResult
-
prependSchemaIdToHeader
Serializes and compresses the value and prepends the schema ID to the resulting ByteBuffer.
- Parameters:
value
- to be serialized and compressed
schemaId
- to prepend to the ByteBuffer
- Returns:
- a ByteBuffer containing the schema ID followed by the serialized and compressed value
-
prependSchemaIdToHeader
Prepends the given schema ID to the provided ByteBuffer.
- Parameters:
valueBytes
- the original serialized and compressed value
schemaId
- to prepend to the ByteBuffer
- Returns:
- a ByteBuffer containing the schema ID followed by the serialized and compressed value
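To make the resulting layout concrete, here is a minimal sketch of prepending a schema ID to a value buffer. It assumes the header is a single 4-byte big-endian int, which this page does not state, and `SchemaHeaderSketch.prependSchemaId` is a hypothetical helper rather than the library method itself.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that illustrates the "schema ID followed by value bytes" layout.
final class SchemaHeaderSketch {
  // Assumption: the header is one 4-byte int (ByteBuffer is big-endian by default).
  static ByteBuffer prependSchemaId(ByteBuffer valueBytes, int schemaId) {
    ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + valueBytes.remaining());
    out.putInt(schemaId);
    // duplicate() shares the content but has its own position, so the caller's buffer is not consumed.
    out.put(valueBytes.duplicate());
    out.flip(); // switch from writing to reading: position 0, limit at the end of the payload
    return out;
  }
}
```

A reader of such a buffer would call `getInt()` first to recover the schema ID and then treat the remaining bytes as the serialized and compressed value.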
-
getStoreVersion
public final int getStoreVersion()
- Returns:
storeVersion
-
getClassHash
public final int getClassHash()
- Returns:
- the hash of the class bytecode
-
onRecovery
public final void onRecovery(AbstractStorageEngine storageEngine, Integer partition, Lazy<VeniceCompressor> compressor)
Bootstraps the client after it comes online.
-
getStoreRecordsInDaVinci
public final boolean getStoreRecordsInDaVinci()
- Returns:
storeRecordsInDaVinci
-
getKeySchema
public final org.apache.avro.Schema getKeySchema()
Returns the schema for the key used in DaVinciClient's operations.
- Returns:
- a Schema corresponding to the type of DaVinciRecordTransformer.
-
getInputValueSchema
public final org.apache.avro.Schema getInputValueSchema()
Returns the schema for the input value used in DaVinciClient's operations.
- Returns:
- a Schema corresponding to the type of DaVinciRecordTransformer.
-
getOutputValueSchema
public final org.apache.avro.Schema getOutputValueSchema()
Returns the schema for the output value used in DaVinciClient's operations.
- Returns:
- a Schema corresponding to the type of DaVinciRecordTransformer.
-
getRecordTransformerUtility
- Returns:
recordTransformerUtility
-