Class DaVinciRecordTransformer<K,V,O>

java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<K,V,O>
Type Parameters:
K - input key type from Venice
V - input value type from Venice
O - output value type (post-transformation) that's stored in Da Vinci and forwarded
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
BootstrappingVeniceChangelogConsumerDaVinciRecordTransformerImpl.DaVinciRecordTransformerBootstrappingChangelogConsumer, DuckDBDaVinciRecordTransformer, InternalDaVinciRecordTransformer

@Experimental public abstract class DaVinciRecordTransformer<K,V,O> extends Object implements Closeable
Plugin interface for Da Vinci that lets applications register callbacks (puts, deletes, lifecycle) and optionally transform values during ingestion. Use it to mirror updates into external systems while still benefiting from Da Vinci's local cache, or to change what gets stored locally. One transformer instance is created per store version, and lifecycle hooks are invoked per version. During startup, Da Vinci replays records persisted on disk by invoking processPut(Lazy, Lazy, int) so external systems can be rehydrated. Typical setup for most users: keep persisting in Da Vinci (default) and implement callbacks that forward updates to your own storage. Notes: - Implementations must be thread-safe. - Inputs are wrapped in Lazy to avoid deserialization unless used.
  • Constructor Details

    • DaVinciRecordTransformer

      public DaVinciRecordTransformer(String storeName, int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, DaVinciRecordTransformerConfig recordTransformerConfig)
      Parameters:
      storeName - the name of the Venice store without version info
      storeVersion - the version of the store
      keySchema - the key schema, which is immutable inside DaVinciClient. Users can modify the key if they are storing records in an external storage engine, but this must be managed by the user
      inputValueSchema - the value schema before transformation
      outputValueSchema - the value schema after transformation
      recordTransformerConfig - the config for the record transformer
  • Method Details

    • transform

      public abstract DaVinciRecordTransformerResult<O> transform(Lazy<K> key, Lazy<V> value, int partitionId)
      Callback to transform records before they are stored. This can be useful for tasks such as filtering out unused fields to save storage space. Result types: - SKIP: drop the record - UNCHANGED: keep the original value; useful when only callbacks/notifications are needed - TRANSFORMED: record has been transformed and should be persisted to disk
      Parameters:
      key - the key of the record to be transformed
      value - the value of the record to be transformed
      partitionId - what partition the record came from
      Returns:
      DaVinciRecordTransformerResult
    • processPut

      public abstract void processPut(Lazy<K> key, Lazy<O> value, int partitionId)
      Callback for put/update events (for example, write to an external system). Invoked after transform(Lazy, Lazy, int) when the result is UNCHANGED or TRANSFORMED. Also invoked during startup/recovery when Da Vinci replays records persisted on disk to rehydrate external systems.
      Parameters:
      key - the key of the record to be put
      value - the value of the record to be put, either the original value or the transformed value
      partitionId - what partition the record came from
    • processDelete

      public void processDelete(Lazy<K> key, int partitionId)
      Optional callback for delete events (for example, remove from an external system). By default, this is a no-op.
      Parameters:
      key - the key of the record to be deleted
      partitionId - what partition the record is being deleted from
    • onStartVersionIngestion

      public void onStartVersionIngestion(boolean isCurrentVersion)
      Callback invoked before consuming records for storeVersion. Use this to open connections, create tables, or initialize resources. By default, it's a no-op.
      Parameters:
      isCurrentVersion - whether the version being started is the current serving version; if not it's a future version
    • onEndVersionIngestion

      public void onEndVersionIngestion(int currentVersion)
      Callback invoked when record consumption stops for storeVersion. Use this to close connections, drop tables, or release resources. This can be triggered when this storeVersion is no longer the current serving version (for example, after a version swap), or when the application is shutting down. For batch stores, this callback may be invoked even when the application is not shutting down, with storeVersion equaling the currentVersion parameter. This occurs because batch stores do not continuously receive updates after batch ingestion completes unlike hybrid stores, so the consumer resources are closed to free up resources once all data has been consumed. By default, it's a no-op.
      Parameters:
      currentVersion - the current serving version at the time this callback is invoked
    • useUniformInputValueSchema

      public boolean useUniformInputValueSchema()
      Whether to deserialize input values using a single, uniform schema. If true (recommended only when you need a consistent field set), it uses the latest registered value schema at startup time to deserialize all records. This is useful when projecting into systems that expect a fixed schema (for example, relational tables). If false (default), each record is deserialized with its own writer schema. Note: This flag only applies to GenericRecord values. When using SpecificRecord for values via DaVinciRecordTransformerConfig.Builder.setOutputValueClass(Class), values will be deserialized based on the schema passed into DaVinciRecordTransformerConfig.Builder.setOutputValueSchema(org.apache.avro.Schema).
    • getStoreName

      public final String getStoreName()
      Returns:
      storeName
    • getStoreVersion

      public final int getStoreVersion()
      Returns:
      storeVersion
    • getKeySchema

      public final org.apache.avro.Schema getKeySchema()
      Returns the schema for the key used in DaVinciClient's operations.
      Returns:
      a Schema corresponding to the type of DaVinciRecordTransformer.
    • getInputValueSchema

      public final org.apache.avro.Schema getInputValueSchema()
      Returns the schema for the input value used in DaVinciClient's operations.
      Returns:
      a Schema corresponding to the type of DaVinciRecordTransformer.
    • getOutputValueSchema

      public final org.apache.avro.Schema getOutputValueSchema()
      Returns the schema for the output value used in DaVinciClient's operations.
      Returns:
      a Schema corresponding to the type of DaVinciRecordTransformer.
    • getStoreRecordsInDaVinci

      public final boolean getStoreRecordsInDaVinci()
      Returns:
      storeRecordsInDaVinci
    • getAlwaysBootstrapFromVersionTopic

      public final boolean getAlwaysBootstrapFromVersionTopic()
      Returns:
      alwaysBootstrapFromVersionTopic
    • getRecordTransformerUtility

      public final DaVinciRecordTransformerUtility<K,O> getRecordTransformerUtility()
      Returns:
      recordTransformerUtility
    • prependSchemaIdToHeader

      public final ByteBuffer prependSchemaIdToHeader(O value, int schemaId, VeniceCompressor compressor)
      Helper to serialize and compress the value and prepend the schema ID to the resulting ByteBuffer.
      Parameters:
      value - to be serialized and compressed
      schemaId - to prepend to the ByteBuffer
      Returns:
      a ByteBuffer containing the schema ID followed by the serialized and compressed value
    • prependSchemaIdToHeader

      public final ByteBuffer prependSchemaIdToHeader(ByteBuffer valueBytes, int schemaId)
      Helper to prepend the given schema ID to the provided ByteBuffer.
      Parameters:
      valueBytes - the original serialized and compressed value
      schemaId - to prepend to the ByteBuffer
      Returns:
      a ByteBuffer containing the schema ID followed by the serialized and compressed value
    • transformAndProcessPut

      public final DaVinciRecordTransformerResult<O> transformAndProcessPut(Lazy<K> key, Lazy<V> value, int partitionId)
      Runs transform(Lazy, Lazy, int) and then invokes processPut(Lazy, Lazy, int) when appropriate. Returns null when the record is skipped, unchanged-and-not-stored, or when getStoreRecordsInDaVinci() is false.
      Parameters:
      key - the key of the record to be put
      value - the value of the record to be put
      partitionId - what partition the record came from
      Returns:
      the DaVinciRecordTransformerResult to be stored in Da Vinci, or null if nothing should be stored
    • onRecovery

      public final void onRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext)
      Used during startup/recovery to iterate over records persisted on disk and invoke processPut(Lazy, Lazy, int) to rehydrate external systems.
    • getClassHash

      public final int getClassHash()
      Returns a hash of this class's bytecode. Used to detect implementation changes during bootstrapping and decide whether local data should be considered stale and wiped.