Package com.linkedin.davinci.client
Class InternalDaVinciRecordTransformer<K,V,O>
java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<K,V,O>
com.linkedin.davinci.client.InternalDaVinciRecordTransformer<K,V,O>
- Type Parameters:
K
- type of the input key
V
- type of the input value
O
- type of the output value
- All Implemented Interfaces:
Closeable, AutoCloseable
@Experimental
public class InternalDaVinciRecordTransformer<K,V,O>
extends DaVinciRecordTransformer<K,V,O>
This is an implementation of DaVinciRecordTransformer that is used internally inside the StoreIngestionTask.
-
Constructor Summary
Constructors
Constructor, Description
InternalDaVinciRecordTransformer(DaVinciRecordTransformer recordTransformer, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, InternalDaVinciRecordTransformerConfig internalRecordTransformerConfig)
-
Method Summary
Modifier and Type, Method, Description
void close()
void countDownStartConsumptionLatch()
- This method gets invoked when the local RocksDB scan has completed for a partition.
long getCountDownStartConsumptionLatchCount()
void internalOnRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext, Map<Integer, org.apache.avro.Schema> schemaIdToSchemaMap, ReadOnlySchemaRepository schemaRepository)
- Wrapper around onRecovery, needed because the class hash calculation uses the name of the class that invokes it.
void onEndVersionIngestion(int currentVersion)
- Callback invoked when record consumption stops for DaVinciRecordTransformer.storeVersion.
void onHeartbeat(int partitionId, long heartbeatTimestamp)
- Lifecycle event triggered when a heartbeat is detected for partitionId.
void onStartVersionIngestion(boolean isCurrentVersion)
- Callback invoked before consuming records for DaVinciRecordTransformer.storeVersion.
void onVersionSwap(int currentVersion, int futureVersion, int partitionId)
- Lifecycle event triggered when a version swap is detected for partitionId. It is used for DVRT CDC.
void processDelete(Lazy<K> key, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata)
- Optional callback for delete events (for example, remove from an external system).
void processPut(Lazy<K> key, Lazy<O> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata)
- Callback for put/update events (for example, write to an external system).
DaVinciRecordTransformerResult<O> transform(Lazy<K> key, Lazy<V> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata)
- Callback to transform records before they are stored.
boolean useUniformInputValueSchema()
- Whether to deserialize input values using a single, uniform schema.
Methods inherited from class com.linkedin.davinci.client.DaVinciRecordTransformer
getAlwaysBootstrapFromVersionTopic, getClassHash, getInputValueSchema, getKeySchema, getOutputValueSchema, getRecordTransformerUtility, getStoreName, getStoreRecordsInDaVinci, getStoreVersion, onRecovery, prependSchemaIdToHeader, prependSchemaIdToHeader, transformAndProcessPut
-
Constructor Details
-
InternalDaVinciRecordTransformer
public InternalDaVinciRecordTransformer(DaVinciRecordTransformer recordTransformer, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, InternalDaVinciRecordTransformerConfig internalRecordTransformerConfig)
-
-
Method Details
-
transform
public DaVinciRecordTransformerResult<O> transform(Lazy<K> key, Lazy<V> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata)
Description copied from class: DaVinciRecordTransformer
Callback to transform records before they are stored. This can be useful for tasks such as filtering out unused fields to save storage space.
Result types:
- SKIP: drop the record
- UNCHANGED: keep the original value; useful when only callbacks/notifications are needed
- TRANSFORMED: the record has been transformed and should be persisted to disk
- Specified by:
transform in class DaVinciRecordTransformer<K,V,O>
- Parameters:
key
- the key of the record to be transformed
value
- the value of the record to be transformed
partitionId
- the partition the record came from
recordMetadata
- the DaVinciRecordTransformerRecordMetadata if enabled in DaVinciRecordTransformerConfig; null otherwise
- Returns:
DaVinciRecordTransformerResult
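The three result types can be illustrated with a minimal, self-contained sketch. The types below (`TransformResultSketch`, `ResultType`, `Result`) are hypothetical stand-ins and not the Venice classes, and the trimming rule is an assumption chosen only to exercise each outcome.

```java
// Hypothetical stand-ins for illustration; these are not the Venice classes.
final class TransformResultSketch {
  enum ResultType { SKIP, UNCHANGED, TRANSFORMED }

  static final class Result<O> {
    final ResultType type;
    final O value; // non-null only for TRANSFORMED in this sketch

    private Result(ResultType type, O value) {
      this.type = type;
      this.value = value;
    }
  }

  // Assumed rule: drop blank values, trim padded ones, keep everything else untouched.
  static Result<String> transform(String key, String value) {
    String trimmed = value == null ? "" : value.trim();
    if (trimmed.isEmpty()) {
      return new Result<>(ResultType.SKIP, null); // record is dropped
    }
    if (trimmed.equals(value)) {
      return new Result<>(ResultType.UNCHANGED, null); // original value is kept
    }
    return new Result<>(ResultType.TRANSFORMED, trimmed); // new value is persisted
  }
}
```

In a real transformer the decision would presumably be expressed through DaVinciRecordTransformerResult; the sketch only shows how one callback can yield all three outcomes.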
-
processPut
public void processPut(Lazy<K> key, Lazy<O> value, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata)
Description copied from class: DaVinciRecordTransformer
Callback for put/update events (for example, write to an external system). Invoked after DaVinciRecordTransformer.transform(Lazy, Lazy, int, DaVinciRecordTransformerRecordMetadata) when the result is UNCHANGED or TRANSFORMED. Also invoked during startup/recovery when Da Vinci replays records persisted on disk to rehydrate external systems.
- Specified by:
processPut in class DaVinciRecordTransformer<K,V,O>
- Parameters:
key
- the key of the record to be put
value
- the value of the record to be put, either the original value or the transformed value
partitionId
- the partition the record came from
recordMetadata
- the DaVinciRecordTransformerRecordMetadata if enabled in DaVinciRecordTransformerConfig; null otherwise
-
processDelete
public void processDelete(Lazy<K> key, int partitionId, DaVinciRecordTransformerRecordMetadata recordMetadata)
Description copied from class: DaVinciRecordTransformer
Optional callback for delete events (for example, remove from an external system). By default, this is a no-op.
- Overrides:
processDelete in class DaVinciRecordTransformer<K,V,O>
- Parameters:
key
- the key of the record to be deleted
partitionId
- the partition the record is being deleted from
recordMetadata
- the DaVinciRecordTransformerRecordMetadata if enabled in DaVinciRecordTransformerConfig; null otherwise
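As a rough illustration of how processPut and processDelete relate to the transform result, the sketch below mirrors records into an in-memory map that stands in for an external system. All names (`PutDeleteMirrorSketch`, `externalStore`, `onTransformResult`) are hypothetical, and the dispatch logic is an assumption based only on the descriptions above, not the actual ingestion code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the map stands in for an external system.
final class PutDeleteMirrorSketch {
  enum ResultType { SKIP, UNCHANGED, TRANSFORMED }

  final Map<String, String> externalStore = new HashMap<>();

  // Mirrors a put/update into the external system.
  void processPut(String key, String value) {
    externalStore.put(key, value);
  }

  // Mirrors a delete into the external system.
  void processDelete(String key) {
    externalStore.remove(key);
  }

  // Assumed dispatch: SKIP triggers no callback; UNCHANGED passes the
  // original value; TRANSFORMED passes the transformed value.
  void onTransformResult(ResultType type, String key, String original, String transformed) {
    switch (type) {
      case SKIP:
        return;
      case UNCHANGED:
        processPut(key, original);
        return;
      case TRANSFORMED:
        processPut(key, transformed);
        return;
    }
  }
}
```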
-
onStartVersionIngestion
public void onStartVersionIngestion(boolean isCurrentVersion)
Description copied from class: DaVinciRecordTransformer
Callback invoked before consuming records for DaVinciRecordTransformer.storeVersion. Use this to open connections, create tables, or initialize resources. By default, it's a no-op.
- Overrides:
onStartVersionIngestion in class DaVinciRecordTransformer<K,V,O>
- Parameters:
isCurrentVersion
- whether the version being started is the current serving version; if not, it is a future version
-
onEndVersionIngestion
public void onEndVersionIngestion(int currentVersion)
Description copied from class: DaVinciRecordTransformer
Callback invoked when record consumption stops for DaVinciRecordTransformer.storeVersion. Use this to close connections, drop tables, or release resources.
This can be triggered when this DaVinciRecordTransformer.storeVersion is no longer the current serving version (for example, after a version swap), or when the application is shutting down.
For batch stores, this callback may be invoked even when the application is not shutting down, with DaVinciRecordTransformer.storeVersion equaling the currentVersion parameter. This occurs because, unlike hybrid stores, batch stores do not continuously receive updates after batch ingestion completes, so consumer resources are closed once all data has been consumed.
By default, it's a no-op.
- Overrides:
onEndVersionIngestion in class DaVinciRecordTransformer<K,V,O>
- Parameters:
currentVersion
- the current serving version at the time this callback is invoked
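The start/end callbacks can be pictured with a small sketch. `VersionLifecycleSketch` and its fields are hypothetical. Dropping the table only when the ended version is no longer the serving one is an assumed policy, motivated by the batch-store case where storeVersion equals currentVersion; it is not behavior prescribed by the source.

```java
// Hypothetical sketch of resource handling around version ingestion.
final class VersionLifecycleSketch {
  private final int storeVersion;
  private boolean resourcesOpen;
  private boolean tableDropped;

  VersionLifecycleSketch(int storeVersion) {
    this.storeVersion = storeVersion;
  }

  // Open connections or create tables before records are consumed.
  void onStartVersionIngestion(boolean isCurrentVersion) {
    resourcesOpen = true;
  }

  // Always release connections; drop the table only if this version
  // is no longer the one being served (assumed policy).
  void onEndVersionIngestion(int currentVersion) {
    resourcesOpen = false;
    if (currentVersion != storeVersion) {
      tableDropped = true;
    }
  }

  boolean isResourcesOpen() {
    return resourcesOpen;
  }

  boolean isTableDropped() {
    return tableDropped;
  }
}
```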
-
useUniformInputValueSchema
public boolean useUniformInputValueSchema()
Description copied from class: DaVinciRecordTransformer
Whether to deserialize input values using a single, uniform schema.
If true (recommended only when you need a consistent field set), it uses the latest registered value schema at startup time to deserialize all records. This is useful when projecting into systems that expect a fixed schema (for example, relational tables).
If false (default), each record is deserialized with its own writer schema.
Note: This flag only applies to GenericRecord values. When using SpecificRecord for values via DaVinciRecordTransformerConfig.Builder.setOutputValueClass(Class), values will be deserialized based on the schema passed into DaVinciRecordTransformerConfig.Builder.setOutputValueSchema(org.apache.avro.Schema).
- Overrides:
useUniformInputValueSchema in class DaVinciRecordTransformer<K,V,O>
-
onVersionSwap
public void onVersionSwap(int currentVersion, int futureVersion, int partitionId)
Lifecycle event triggered when a version swap is detected for partitionId. It is used for DVRT CDC.
-
onHeartbeat
public void onHeartbeat(int partitionId, long heartbeatTimestamp)
Lifecycle event triggered when a heartbeat is detected for partitionId. It is used for DVRT CDC to record latest heartbeat timestamps per partition.
-
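The per-partition heartbeat bookkeeping described for onHeartbeat could look roughly like the following. `HeartbeatTrackerSketch` is a hypothetical class, and keeping the maximum timestamp seen (rather than the most recently received one) is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: tracks the latest heartbeat timestamp per partition.
final class HeartbeatTrackerSketch {
  private final ConcurrentMap<Integer, Long> latestHeartbeatByPartition = new ConcurrentHashMap<>();

  // Keeps the largest timestamp seen so far for the partition (assumed policy).
  void onHeartbeat(int partitionId, long heartbeatTimestamp) {
    latestHeartbeatByPartition.merge(partitionId, heartbeatTimestamp, Long::max);
  }

  // Returns null when no heartbeat has been recorded for the partition.
  Long getLatestHeartbeat(int partitionId) {
    return latestHeartbeatByPartition.get(partitionId);
  }
}
```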
internalOnRecovery
public void internalOnRecovery(StorageEngine storageEngine, int partitionId, InternalAvroSpecificSerializer<PartitionState> partitionStateSerializer, Lazy<VeniceCompressor> compressor, PubSubContext pubSubContext, Map<Integer, org.apache.avro.Schema> schemaIdToSchemaMap, ReadOnlySchemaRepository schemaRepository)
Wrapper around onRecovery. It exists because the class hash calculation uses the name of the class that invokes it. If onRecovery were invoked directly from this class, the class hash would be calculated from the contents of InternalDaVinciRecordTransformer, not the user's implementation of DVRT. onRecovery also cannot be overridden like the other methods, because it is final and its implementation should never be overridden.
-
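The reasoning behind the internalOnRecovery wrapper can be shown with a simplified sketch. The classes below are hypothetical, and using `getClass().getName()` as the hash input is an assumption standing in for the real class hash calculation. A final method inherited by the wrapper resolves to the wrapper's own class, while forwarding to the delegate resolves to the user's class.

```java
// Hypothetical sketch of why a forwarding wrapper method is needed.
final class ClassHashWrapperSketch {
  abstract static class BaseTransformer {
    // Final, mirroring the statement that onRecovery cannot be overridden.
    public final String classNameForHash() {
      return getClass().getName();
    }
  }

  // Stands in for the user's DVRT implementation.
  static final class UserTransformer extends BaseTransformer {
  }

  // Stands in for the internal wrapper around the user's implementation.
  static final class InternalWrapper extends BaseTransformer {
    private final BaseTransformer delegate;

    InternalWrapper(BaseTransformer delegate) {
      this.delegate = delegate;
    }

    // Forwards to the delegate so the result reflects the user's class.
    String internalClassNameForHash() {
      return delegate.classNameForHash();
    }
  }
}
```

Calling `classNameForHash()` on the wrapper yields the wrapper's class name, whereas `internalClassNameForHash()` yields the user's class name, which is the distinction the description above draws.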
getCountDownStartConsumptionLatchCount
public long getCountDownStartConsumptionLatchCount()
-
countDownStartConsumptionLatch
public void countDownStartConsumptionLatch()
This method gets invoked when the local RocksDB scan has completed for a partition.
-
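The latch methods suggest a pattern where consumption waits until every partition has finished its local scan. The sketch below assumes a java.util.concurrent.CountDownLatch initialized with the partition count; `StartConsumptionLatchSketch` and that initialization are assumptions, not details confirmed by the source.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: one count per partition whose local scan must finish.
final class StartConsumptionLatchSketch {
  private final CountDownLatch startConsumptionLatch;

  StartConsumptionLatchSketch(int partitionCount) {
    this.startConsumptionLatch = new CountDownLatch(partitionCount);
  }

  // Called once per partition when its local scan completes.
  void countDownStartConsumptionLatch() {
    startConsumptionLatch.countDown();
  }

  // Number of partitions still scanning.
  long getCountDownStartConsumptionLatchCount() {
    return startConsumptionLatch.getCount();
  }

  // True once every partition has completed its scan.
  boolean allScansCompleted() {
    return startConsumptionLatch.getCount() == 0;
  }
}
```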
close
public void close() throws IOException
- Throws:
IOException
-