Package com.linkedin.davinci.client
Class DaVinciRecordTransformer<K,V,O>
java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<K,V,O>
- Type Parameters:
K - input key type from Venice
V - input value type from Venice
O - output value type (post-transformation) that's stored in Da Vinci and forwarded
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
BootstrappingVeniceChangelogConsumerDaVinciRecordTransformerImpl.DaVinciRecordTransformerBootstrappingChangelogConsumer, DuckDBDaVinciRecordTransformer, InternalDaVinciRecordTransformer
@Experimental
public abstract class DaVinciRecordTransformer<K,V,O>
extends Object
implements Closeable
Plugin interface for Da Vinci that lets applications register callbacks (puts, deletes, lifecycle) and optionally
transform values during ingestion. Use it to mirror updates into external systems while still benefiting from
Da Vinci's local cache, or to change what gets stored locally.
One transformer instance is created per store version, and lifecycle hooks are invoked per version. During
startup, Da Vinci replays records persisted on disk by invoking processPut(Lazy, Lazy, int) so
external systems can be rehydrated.
Typical setup for most users: keep persisting in Da Vinci (default) and implement callbacks that forward updates to
your own storage.
Notes:
- Implementations must be thread-safe.
- Inputs are wrapped in Lazy to avoid deserialization unless used.
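The note about Lazy inputs can be illustrated with a small self-contained sketch. LazySketch below is a hypothetical stand-in written for this example, not the Venice Lazy class; it only models the idea that the (pretend) deserializer runs on first access and never runs if the callback ignores the value.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazy wrapper; NOT the Venice Lazy class.
final class LazySketch<T> {
  private final Supplier<T> loader;
  private T value;
  private boolean loaded;

  private LazySketch(Supplier<T> loader) {
    this.loader = loader;
  }

  static <T> LazySketch<T> of(Supplier<T> loader) {
    return new LazySketch<>(loader);
  }

  // Runs the loader on first access only, then returns the cached result.
  synchronized T get() {
    if (!loaded) {
      value = loader.get();
      loaded = true;
    }
    return value;
  }

  // Demo helper: reports how many times the pretend deserializer ran
  // after the wrapped value was read the given number of times.
  static int deserializationsAfter(int reads) {
    AtomicInteger calls = new AtomicInteger();
    LazySketch<String> lazyValue = LazySketch.of(() -> {
      calls.incrementAndGet(); // stands in for an expensive decode
      return "decoded";
    });
    for (int i = 0; i < reads; i++) {
      lazyValue.get();
    }
    return calls.get();
  }
}
```

A callback that only needs the key, for example a delete-style handler, would therefore pay no decoding cost for the value in this model.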
Constructor Summary
Constructors
DaVinciRecordTransformer(String storeName, int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, DaVinciRecordTransformerConfig recordTransformerConfig)
-
Method Summary
Modifier and Type, Method - Description
final boolean getAlwaysBootstrapFromVersionTopic()
final int getClassHash() - Returns a hash of this class's bytecode.
final org.apache.avro.Schema getInputValueSchema() - Returns the schema for the input value used in DaVinciClient's operations.
final org.apache.avro.Schema getKeySchema() - Returns the schema for the key used in DaVinciClient's operations.
final org.apache.avro.Schema getOutputValueSchema() - Returns the schema for the output value used in DaVinciClient's operations.
final DaVinciRecordTransformerUtility<K,O> getRecordTransformerUtility()
final String getStoreName()
final boolean getStoreRecordsInDaVinci()
final int getStoreVersion()
void onEndVersionIngestion(int currentVersion) - Callback invoked when record consumption stops for storeVersion.
final void onRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext) - Used during startup/recovery to iterate over records persisted on disk and invoke processPut(Lazy, Lazy, int) to rehydrate external systems.
void onStartVersionIngestion(boolean isCurrentVersion) - Callback invoked before consuming records for storeVersion.
final ByteBuffer prependSchemaIdToHeader(ByteBuffer valueBytes, int schemaId) - Helper to prepend the given schema ID to the provided ByteBuffer.
final ByteBuffer prependSchemaIdToHeader(O value, int schemaId, VeniceCompressor compressor) - Helper to serialize and compress the value and prepend the schema ID to the resulting ByteBuffer.
void processDelete(Lazy<K> key, int partitionId) - Optional callback for delete events (for example, remove from an external system).
abstract void processPut(Lazy<K> key, Lazy<O> value, int partitionId) - Callback for put/update events (for example, write to an external system).
abstract DaVinciRecordTransformerResult<O> transform(Lazy<K> key, Lazy<V> value, int partitionId) - Callback to transform records before they are stored.
final DaVinciRecordTransformerResult<O> transformAndProcessPut(Lazy<K> key, Lazy<V> value, int partitionId) - Runs transform(Lazy, Lazy, int) and then invokes processPut(Lazy, Lazy, int) when appropriate.
boolean useUniformInputValueSchema() - Whether to deserialize input values using a single, uniform schema.
-
Constructor Details
-
DaVinciRecordTransformer
public DaVinciRecordTransformer(String storeName, int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, DaVinciRecordTransformerConfig recordTransformerConfig)
- Parameters:
storeName - the name of the Venice store without version info
storeVersion - the version of the store
keySchema - the key schema, which is immutable inside DaVinciClient. Users can modify the key if they are storing records in an external storage engine, but this must be managed by the user
inputValueSchema - the value schema before transformation
outputValueSchema - the value schema after transformation
recordTransformerConfig - the config for the record transformer
-
-
Method Details
-
transform
public abstract DaVinciRecordTransformerResult<O> transform(Lazy<K> key, Lazy<V> value, int partitionId)
Callback to transform records before they are stored. This can be useful for tasks such as filtering out unused fields to save storage space.
Result types:
- SKIP: drop the record
- UNCHANGED: keep the original value; useful when only callbacks/notifications are needed
- TRANSFORMED: the record has been transformed and should be persisted to disk
- Parameters:
key - the key of the record to be transformed
value - the value of the record to be transformed
partitionId - the partition the record came from
- Returns:
DaVinciRecordTransformerResult
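The three result types can be sketched as a small decision function. Everything in this block (TransformSketch, ResultType, Result, and the trimming policy) is a hypothetical stand-in chosen for illustration; it uses plain strings instead of Lazy-wrapped Avro values and does not reproduce the real DaVinciRecordTransformerResult class.

```java
// Hypothetical, simplified model of a transform callback; not the Venice API.
final class TransformSketch {
  // Mirrors the three documented result types.
  enum ResultType { SKIP, UNCHANGED, TRANSFORMED }

  // Stand-in result holder: the type plus the new value, if any.
  static final class Result {
    final ResultType type;
    final String value; // non-null only for TRANSFORMED

    Result(ResultType type, String value) {
      this.type = type;
      this.value = value;
    }
  }

  // Example policy: drop blank values, trim padded ones, keep the rest as-is.
  static Result transform(String key, String value) {
    String trimmed = value.trim();
    if (trimmed.isEmpty()) {
      return new Result(ResultType.SKIP, null);      // record is dropped
    }
    if (trimmed.equals(value)) {
      return new Result(ResultType.UNCHANGED, null); // original value is kept
    }
    return new Result(ResultType.TRANSFORMED, trimmed); // new value is persisted
  }
}
```

Returning UNCHANGED rather than TRANSFORMED when nothing changed avoids handing back a copy of the value, which fits the "callbacks only" use case mentioned above.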
-
processPut
public abstract void processPut(Lazy<K> key, Lazy<O> value, int partitionId)
Callback for put/update events (for example, write to an external system). Invoked after transform(Lazy, Lazy, int) when the result is UNCHANGED or TRANSFORMED. Also invoked during startup/recovery when Da Vinci replays records persisted on disk to rehydrate external systems.
- Parameters:
key - the key of the record to be put
value - the value of the record to be put, either the original value or the transformed value
partitionId - the partition the record came from
-
processDelete
public void processDelete(Lazy<K> key, int partitionId)
Optional callback for delete events (for example, remove from an external system). By default, this is a no-op.
- Parameters:
key - the key of the record to be deleted
partitionId - the partition the record is being deleted from
-
onStartVersionIngestion
public void onStartVersionIngestion(boolean isCurrentVersion)
Callback invoked before consuming records for storeVersion. Use this to open connections, create tables, or initialize resources. By default, it's a no-op.
- Parameters:
isCurrentVersion - whether the version being started is the current serving version; if not, it is a future version
-
onEndVersionIngestion
public void onEndVersionIngestion(int currentVersion)
Callback invoked when record consumption stops for storeVersion. Use this to close connections, drop tables, or release resources. This can be triggered when this storeVersion is no longer the current serving version (for example, after a version swap), or when the application is shutting down. For batch stores, this callback may be invoked even when the application is not shutting down, with storeVersion equaling the currentVersion parameter. This occurs because batch stores, unlike hybrid stores, do not continuously receive updates after batch ingestion completes, so the consumer resources are closed once all data has been consumed. By default, it's a no-op.
- Parameters:
currentVersion - the current serving version at the time this callback is invoked
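One way an implementation might use the two lifecycle hooks is sketched below. LifecycleSketch is a hypothetical stand-in that records actions as strings instead of touching a real database, and the policy of dropping a table only when currentVersion differs from storeVersion is an assumption made for this example, not behavior mandated by the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lifecycle hooks; not the Venice API.
final class LifecycleSketch {
  private final int storeVersion;
  // Records what the hooks did, in place of real connection/table calls.
  final List<String> actions = new ArrayList<>();

  LifecycleSketch(int storeVersion) {
    this.storeVersion = storeVersion;
  }

  void onStartVersionIngestion(boolean isCurrentVersion) {
    actions.add("open connection for v" + storeVersion);
    actions.add("create table for v" + storeVersion + " if absent");
  }

  void onEndVersionIngestion(int currentVersion) {
    actions.add("close connection for v" + storeVersion);
    if (currentVersion != storeVersion) {
      // Another version is serving now, so this version's data is obsolete.
      actions.add("drop table for v" + storeVersion);
    }
    // When the versions match (shutdown, or a batch store that finished
    // ingesting), the table is kept because it is still being served.
  }
}
```

Under this assumed policy, a batch store that finishes ingesting version 3 while version 3 is still serving releases its connection but keeps its table.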
-
useUniformInputValueSchema
public boolean useUniformInputValueSchema()
Whether to deserialize input values using a single, uniform schema. If true (recommended only when you need a consistent field set), it uses the latest registered value schema at startup time to deserialize all records. This is useful when projecting into systems that expect a fixed schema (for example, relational tables). If false (default), each record is deserialized with its own writer schema. Note: This flag only applies to GenericRecord values. When using SpecificRecord for values via DaVinciRecordTransformerConfig.Builder.setOutputValueClass(Class), values will be deserialized based on the schema passed into DaVinciRecordTransformerConfig.Builder.setOutputValueSchema(org.apache.avro.Schema).
-
getStoreName
public final String getStoreName()
- Returns:
storeName
-
getStoreVersion
public final int getStoreVersion()
- Returns:
storeVersion
-
getKeySchema
public final org.apache.avro.Schema getKeySchema()
Returns the schema for the key used in DaVinciClient's operations.
- Returns:
a Schema corresponding to the type of DaVinciRecordTransformer.
-
getInputValueSchema
public final org.apache.avro.Schema getInputValueSchema()
Returns the schema for the input value used in DaVinciClient's operations.
- Returns:
a Schema corresponding to the type of DaVinciRecordTransformer.
-
getOutputValueSchema
public final org.apache.avro.Schema getOutputValueSchema()
Returns the schema for the output value used in DaVinciClient's operations.
- Returns:
a Schema corresponding to the type of DaVinciRecordTransformer.
-
getStoreRecordsInDaVinci
public final boolean getStoreRecordsInDaVinci()
- Returns:
storeRecordsInDaVinci
-
getAlwaysBootstrapFromVersionTopic
public final boolean getAlwaysBootstrapFromVersionTopic()
- Returns:
alwaysBootstrapFromVersionTopic
-
getRecordTransformerUtility
public final DaVinciRecordTransformerUtility<K,O> getRecordTransformerUtility()
- Returns:
recordTransformerUtility
-
prependSchemaIdToHeader
public final ByteBuffer prependSchemaIdToHeader(O value, int schemaId, VeniceCompressor compressor)
Helper to serialize and compress the value and prepend the schema ID to the resulting ByteBuffer.
- Parameters:
value - to be serialized and compressed
schemaId - to prepend to the ByteBuffer
- Returns:
a ByteBuffer containing the schema ID followed by the serialized and compressed value
-
prependSchemaIdToHeader
public final ByteBuffer prependSchemaIdToHeader(ByteBuffer valueBytes, int schemaId)
Helper to prepend the given schema ID to the provided ByteBuffer.
- Parameters:
valueBytes - the original serialized and compressed value
schemaId - to prepend to the ByteBuffer
- Returns:
a ByteBuffer containing the schema ID followed by the serialized and compressed value
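The "schema ID followed by value bytes" layout can be sketched with java.nio alone. The block below assumes a 4-byte, big-endian integer header; the source does not state the header width or byte order, so treat both as assumptions. SchemaIdHeaderSketch and prependSchemaId are hypothetical names and are not the library's helper.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a schema-ID header; not the Venice implementation.
final class SchemaIdHeaderSketch {
  // Returns a new buffer holding the schema ID (assumed 4 bytes, big-endian)
  // followed by the remaining bytes of valueBytes.
  static ByteBuffer prependSchemaId(ByteBuffer valueBytes, int schemaId) {
    ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + valueBytes.remaining());
    out.putInt(schemaId);            // ByteBuffer writes big-endian by default
    out.put(valueBytes.duplicate()); // duplicate() leaves the caller's position untouched
    out.flip();                      // make the buffer ready for reading
    return out;
  }
}
```

This sketch copies the payload into a fresh buffer for simplicity; an implementation concerned with allocation could instead reserve header room ahead of the value bytes.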
-
transformAndProcessPut
public final DaVinciRecordTransformerResult<O> transformAndProcessPut(Lazy<K> key, Lazy<V> value, int partitionId)
Runs transform(Lazy, Lazy, int) and then invokes processPut(Lazy, Lazy, int) when appropriate. Returns null when the record is skipped, unchanged-and-not-stored, or when getStoreRecordsInDaVinci() is false.
- Parameters:
key - the key of the record to be put
value - the value of the record to be put
partitionId - what partition the record came from
- Returns:
the DaVinciRecordTransformerResult to be stored in Da Vinci, or null if nothing should be stored
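The control flow described above can be modeled in a few lines. DispatchSketch is a hypothetical, simplified stand-in: it uses plain strings, an upper-casing transform chosen only for illustration, and a boolean field in place of getStoreRecordsInDaVinci(). It is a reading of the documented behavior, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the transform-then-processPut flow; not the Venice API.
final class DispatchSketch {
  enum ResultType { SKIP, UNCHANGED, TRANSFORMED }

  private final boolean storeRecordsInDaVinci; // models getStoreRecordsInDaVinci()
  final List<String> forwarded = new ArrayList<>(); // what processPut received

  DispatchSketch(boolean storeRecordsInDaVinci) {
    this.storeRecordsInDaVinci = storeRecordsInDaVinci;
  }

  // Example transform: skip empty values, upper-case anything not already upper-case.
  ResultType transform(String value) {
    if (value.isEmpty()) {
      return ResultType.SKIP;
    }
    return value.equals(value.toUpperCase()) ? ResultType.UNCHANGED : ResultType.TRANSFORMED;
  }

  // Stands in for forwarding the record to an external system.
  void processPut(String key, String value) {
    forwarded.add(key + "=" + value);
  }

  // Returns the value to persist locally, or null when nothing should be stored.
  String transformAndProcessPut(String key, String value) {
    ResultType type = transform(value);
    if (type == ResultType.SKIP) {
      return null; // skipped records never reach processPut
    }
    String output = (type == ResultType.TRANSFORMED) ? value.toUpperCase() : value;
    processPut(key, output);
    return storeRecordsInDaVinci ? output : null;
  }
}
```

Note that in this model a null return does not imply the callback was skipped: with local storage disabled, processPut still fires for every non-skipped record.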
-
onRecovery
public final void onRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext)
Used during startup/recovery to iterate over records persisted on disk and invoke processPut(Lazy, Lazy, int) to rehydrate external systems.
-
getClassHash
public final int getClassHash()
Returns a hash of this class's bytecode. Used to detect implementation changes during bootstrapping and decide whether local data should be considered stale and wiped.
-