Package com.linkedin.davinci.client
Class InternalDaVinciRecordTransformer<K,V,O>
java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<K,V,O>
com.linkedin.davinci.client.InternalDaVinciRecordTransformer<K,V,O>
- Type Parameters:
K- type of the input keyV- type of the input valueO- type of the output value
- All Implemented Interfaces:
Closeable,AutoCloseable
@Experimental
public class InternalDaVinciRecordTransformer<K,V,O>
extends DaVinciRecordTransformer<K,V,O>
This is an implementation of
DaVinciRecordTransformer that is used internally inside the
StoreIngestionTask.-
Constructor Summary
ConstructorsConstructorDescriptionInternalDaVinciRecordTransformer(DaVinciRecordTransformer recordTransformer, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, InternalDaVinciRecordTransformerConfig internalRecordTransformerConfig) -
Method Summary
Modifier and TypeMethodDescriptionvoidclose()voidThis method gets invoked when local RocksDB scan has completed for a partition.longvoidinternalOnRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext, Map<Integer, org.apache.avro.Schema> schemaIdToSchemaMap, ReadOnlySchemaRepository schemaRepository) Using a wrapper around onRecovery because when calculating the class hash it grabs the name of the current class that is invoking it.voidonEndVersionIngestion(int currentVersion) Callback invoked when record consumption stops forDaVinciRecordTransformer.storeVersion.voidonHeartbeat(int partitionId, long heartbeatTimestamp) Lifecycle event triggered when a heartbeat is detected for partitionId.voidonStartVersionIngestion(boolean isCurrentVersion) Callback invoked before consuming records forDaVinciRecordTransformer.storeVersion.voidonVersionSwap(int currentVersion, int futureVersion, int partitionId) Lifecycle event triggered when a version swap is detected for partitionId It is used for DVRT CDC.voidprocessDelete(Lazy<K> key, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata) Optional callback for delete events (for example, remove from an external system).voidprocessPut(Lazy<K> key, Lazy<O> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata) Callback for put/update events (for example, write to an external system).transform(Lazy<K> key, Lazy<V> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata) Callback to transform records before they are stored.booleanWhether to deserialize input values using a single, uniform schema.Methods inherited from class com.linkedin.davinci.client.DaVinciRecordTransformer
getAlwaysBootstrapFromVersionTopic, getClassHash, getInputValueSchema, getKeySchema, getOutputValueSchema, getRecordTransformerUtility, getStoreName, getStoreRecordsInDaVinci, getStoreVersion, onRecovery, prependSchemaIdToHeader, prependSchemaIdToHeader, transformAndProcessPut
-
Constructor Details
-
InternalDaVinciRecordTransformer
public InternalDaVinciRecordTransformer(DaVinciRecordTransformer recordTransformer, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, InternalDaVinciRecordTransformerConfig internalRecordTransformerConfig)
-
-
Method Details
-
transform
public DaVinciRecordTransformerResult<O> transform(Lazy<K> key, Lazy<V> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata) Description copied from class:DaVinciRecordTransformerCallback to transform records before they are stored. This can be useful for tasks such as filtering out unused fields to save storage space. Result types: - SKIP: drop the record - UNCHANGED: keep the original value; useful when only callbacks/notifications are needed - TRANSFORMED: record has been transformed and should be persisted to disk- Specified by:
transformin classDaVinciRecordTransformer<K,V, O> - Parameters:
key- the key of the record to be transformedvalue- the value of the record to be transformedpartitionId- what partition the record came fromrecordMetadata- returnsDaVinciRecordTransformerRecordMetadataif enabled inDaVinciRecordTransformerConfig, null otherwise- Returns:
DaVinciRecordTransformerResult
-
processPut
public void processPut(Lazy<K> key, Lazy<O> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata) Description copied from class:DaVinciRecordTransformerCallback for put/update events (for example, write to an external system). Invoked afterDaVinciRecordTransformer.transform(Lazy, Lazy, int, DaVinciRecordTransformerRecordMetadata)when the result is UNCHANGED or TRANSFORMED. Also invoked during startup/recovery when Da Vinci replays records persisted on disk to rehydrate external systems.- Specified by:
processPutin classDaVinciRecordTransformer<K,V, O> - Parameters:
key- the key of the record to be putvalue- the value of the record to be put, either the original value or the transformed valuepartitionId- what partition the record came fromrecordMetadata- returnsDaVinciRecordTransformerRecordMetadataif enabled inDaVinciRecordTransformerConfig, null otherwise.
-
processDelete
public void processDelete(Lazy<K> key, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata) Description copied from class:DaVinciRecordTransformerOptional callback for delete events (for example, remove from an external system). By default, this is a no-op.- Overrides:
processDeletein classDaVinciRecordTransformer<K,V, O> - Parameters:
key- the key of the record to be deletedpartitionId- what partition the record is being deleted fromrecordMetadata- returnsDaVinciRecordTransformerRecordMetadataif enabled inDaVinciRecordTransformerConfig, null otherwise.
-
onStartVersionIngestion
public void onStartVersionIngestion(boolean isCurrentVersion) Description copied from class:DaVinciRecordTransformerCallback invoked before consuming records forDaVinciRecordTransformer.storeVersion. Use this to open connections, create tables, or initialize resources. By default, it's a no-op.- Overrides:
onStartVersionIngestionin classDaVinciRecordTransformer<K,V, O> - Parameters:
isCurrentVersion- whether the version being started is the current serving version; if not it's a future version
-
onEndVersionIngestion
public void onEndVersionIngestion(int currentVersion) Description copied from class:DaVinciRecordTransformerCallback invoked when record consumption stops forDaVinciRecordTransformer.storeVersion. Use this to close connections, drop tables, or release resources. This can be triggered when thisDaVinciRecordTransformer.storeVersionis no longer the current serving version (for example, after a version swap), or when the application is shutting down. For batch stores, this callback may be invoked even when the application is not shutting down, withDaVinciRecordTransformer.storeVersionequaling thecurrentVersionparameter. This occurs because batch stores do not continuously receive updates after batch ingestion completes unlike hybrid stores, so the consumer resources are closed to free up resources once all data has been consumed. By default, it's a no-op.- Overrides:
onEndVersionIngestionin classDaVinciRecordTransformer<K,V, O> - Parameters:
currentVersion- the current serving version at the time this callback is invoked
-
useUniformInputValueSchema
public boolean useUniformInputValueSchema()Description copied from class:DaVinciRecordTransformerWhether to deserialize input values using a single, uniform schema. If true (recommended only when you need a consistent field set), it uses the latest registered value schema at startup time to deserialize all records. This is useful when projecting into systems that expect a fixed schema (for example, relational tables). If false (default), each record is deserialized with its own writer schema. Note: This flag only applies toGenericRecordvalues. When usingSpecificRecordfor values viaDaVinciRecordTransformerConfig.Builder.setOutputValueClass(Class), values will be deserialized based on the schema passed intoDaVinciRecordTransformerConfig.Builder.setOutputValueSchema(org.apache.avro.Schema).- Overrides:
useUniformInputValueSchemain classDaVinciRecordTransformer<K,V, O>
-
onVersionSwap
public void onVersionSwap(int currentVersion, int futureVersion, int partitionId) Lifecycle event triggered when a version swap is detected for partitionId It is used for DVRT CDC. -
onHeartbeat
public void onHeartbeat(int partitionId, long heartbeatTimestamp) Lifecycle event triggered when a heartbeat is detected for partitionId. It is used for DVRT CDC to record latest heartbeat timestamps per partition. -
internalOnRecovery
public void internalOnRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext, Map<Integer, org.apache.avro.Schema> schemaIdToSchemaMap, ReadOnlySchemaRepository schemaRepository) Using a wrapper around onRecovery because when calculating the class hash it grabs the name of the current class that is invoking it. If we directly invoke onRecovery from this class, the class hash will be calculated based on the contents ofInternalDaVinciRecordTransformer, not the user's implementation of DVRT. We also can't override onRecovery like the other methods because this method is final and the implementation should never be overridden. -
getCountDownStartConsumptionLatchCount
public long getCountDownStartConsumptionLatchCount() -
countDownStartConsumptionLatch
public void countDownStartConsumptionLatch()This method gets invoked when local RocksDB scan has completed for a partition. -
close
- Throws:
IOException
-