Package com.linkedin.venice.duckdb
Class DuckDBDaVinciRecordTransformer
java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
com.linkedin.venice.duckdb.DuckDBDaVinciRecordTransformer
All Implemented Interfaces:
Closeable, AutoCloseable
public class DuckDBDaVinciRecordTransformer
extends DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Enables SQL querying of Venice data by integrating DuckDB into DaVinci clients.
This transformer runs inside DaVinci clients and automatically mirrors Venice store data
into a local DuckDB database. This allows you to query your Venice data using standard SQL
instead of the Venice key-value API.
When you configure this transformer in your DaVinci client, it will:
- Create SQL tables that match your Venice store structure
- Keep the tables updated with Venice data changes
- Handle new Venice store versions by managing SQL table versions
- Provide a SQL view that always points to the current data
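
The version handling in the list above can be pictured as a view swap: each store version gets its own table, and a stable view is repointed once a version becomes current. The following is a minimal sketch of that idea, not the transformer's actual SQL. The `<store>_v<version>` table naming, the view being named after the store, and all helper names are assumptions made for illustration.

```java
// Illustrative sketch only: names and SQL shapes are assumptions,
// not taken from the transformer's source code.
final class VersionedViewSketch {
  private VersionedViewSketch() {}

  /** Hypothetical naming scheme: one table per store version. */
  static String versionedTable(String storeName, int version) {
    return storeName + "_v" + version;
  }

  /** Repoints the stable view (assumed to be named after the store) at one version's table. */
  static String pointViewAt(String storeName, int version) {
    return "CREATE OR REPLACE VIEW \"" + storeName + "\" AS SELECT * FROM \""
        + versionedTable(storeName, version) + "\"";
  }

  /** Drops the table that belonged to a retired version. */
  static String dropRetired(String storeName, int retiredVersion) {
    return "DROP TABLE IF EXISTS \"" + versionedTable(storeName, retiredVersion) + "\"";
  }
}
```

Under these assumptions, a swap from version 2 to version 3 would issue `pointViewAt("orders", 3)` followed by `dropRetired("orders", 2)`, so queries against the view never need to know the version number.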
Constructor Summary
Constructors
DuckDBDaVinciRecordTransformer(String storeName, int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, DaVinciRecordTransformerConfig recordTransformerConfig, String baseDir, Set<String> columnsToProject)
Method Summary
- buildStoreNameWithVersion(int version): Creates a versioned table name by combining store name with version number.
- void close(): Cleans up database connections and resources.
- getDuckDBUrl(): Gets the connection URL for the DuckDB database.
- void onEndVersionIngestion(int currentVersion): Called when DaVinci finishes ingesting all data for the store.
- void onStartVersionIngestion(boolean isCurrentVersion): Called when DaVinci starts ingesting a Venice store version.
- void processDelete(Lazy<org.apache.avro.generic.GenericRecord> key, int partitionId): Deletes a record from DuckDB when Venice receives a delete event.
- void processPut(Lazy<org.apache.avro.generic.GenericRecord> key, Lazy<org.apache.avro.generic.GenericRecord> value, int partitionId): Stores a new/updated record in DuckDB when Venice receives a put event.
- DaVinciRecordTransformerResult<org.apache.avro.generic.GenericRecord> transform(Lazy<org.apache.avro.generic.GenericRecord> key, Lazy<org.apache.avro.generic.GenericRecord> value, int partitionId): Note: This always returns UNCHANGED because we are not modifying the record that is persisted in DaVinci.
- boolean useUniformInputValueSchema(): Indicates this transformer works with consistent record schemas.

Methods inherited from class com.linkedin.davinci.client.DaVinciRecordTransformer
getAlwaysBootstrapFromVersionTopic, getClassHash, getInputValueSchema, getKeySchema, getOutputValueSchema, getRecordTransformerUtility, getStoreName, getStoreRecordsInDaVinci, getStoreVersion, onRecovery, prependSchemaIdToHeader, prependSchemaIdToHeader, transformAndProcessPut

Constructor Details
DuckDBDaVinciRecordTransformer
public DuckDBDaVinciRecordTransformer(String storeName, int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, DaVinciRecordTransformerConfig recordTransformerConfig, String baseDir, Set<String> columnsToProject)
Parameters:
baseDir - directory where DuckDB files will be stored
columnsToProject - specific columns to include (leave null/empty for all columns)
Throws:
VeniceException - if database setup fails

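The documented meaning of columnsToProject (null or empty selects every column) can be illustrated with a small sketch. This is not the transformer's code: the class and method names are hypothetical, and whether key columns are always retained is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the documented projection rule; names are hypothetical.
final class ColumnProjectionSketch {
  private ColumnProjectionSketch() {}

  /**
   * Returns the columns to mirror, preserving schema order.
   * A null or empty projection set means "keep all columns".
   */
  static List<String> projectedColumns(List<String> allColumns, Set<String> columnsToProject) {
    if (columnsToProject == null || columnsToProject.isEmpty()) {
      return new ArrayList<>(allColumns);
    }
    List<String> kept = new ArrayList<>();
    for (String column : allColumns) {
      if (columnsToProject.contains(column)) {
        kept.add(column);
      }
    }
    return kept;
  }
}
```

For a schema with columns `id`, `name`, `age`, passing `Set.of("age", "id")` would keep `id` and `age` in schema order, while passing `null` would keep all three.
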
Method Details
transform
public DaVinciRecordTransformerResult<org.apache.avro.generic.GenericRecord> transform(Lazy<org.apache.avro.generic.GenericRecord> key, Lazy<org.apache.avro.generic.GenericRecord> value, int partitionId)
Note: This always returns UNCHANGED because the record persisted in DaVinci is not modified.
Specified by:
transform in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Parameters:
key - the key of the record to be transformed
value - the value of the record to be transformed
partitionId - the partition the record came from
Returns:
DaVinciRecordTransformerResult

processPut
public void processPut(Lazy<org.apache.avro.generic.GenericRecord> key, Lazy<org.apache.avro.generic.GenericRecord> value, int partitionId)
Stores a new/updated record in DuckDB when Venice receives a put event.
Specified by:
processPut in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Parameters:
key - the key of the record to be put
value - the value of the record to be put, either the original value or the transformed value
partitionId - the partition the record came from

processDelete
Deletes a record from DuckDB when Venice receives a delete event.
Overrides:
processDelete in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Parameters:
key - the key of the record to be deleted
partitionId - the partition the record is being deleted from

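Put and delete events each map naturally onto one row-level SQL statement. The sketch below shows what such statements might look like in DuckDB; it is an assumption about the general shape, not the statements this class actually issues. The helper names, table name, and column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: statement shapes are assumptions, not the transformer's source.
final class RowMirrorSqlSketch {
  private RowMirrorSqlSketch() {}

  /**
   * Upsert with one '?' placeholder per column.
   * DuckDB's INSERT OR REPLACE needs a PRIMARY KEY or UNIQUE constraint on the table.
   */
  static String upsertSql(String table, List<String> columns) {
    String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
    return "INSERT OR REPLACE INTO \"" + table + "\" (" + String.join(", ", columns)
        + ") VALUES (" + placeholders + ")";
  }

  /** Delete keyed on every key column, one '?' placeholder each. */
  static String deleteSql(String table, List<String> keyColumns) {
    List<String> predicates = new ArrayList<>();
    for (String keyColumn : keyColumns) {
      predicates.add(keyColumn + " = ?");
    }
    return "DELETE FROM \"" + table + "\" WHERE " + String.join(" AND ", predicates);
  }
}
```

Both strings are intended for `java.sql.PreparedStatement`, with the key and value fields bound in column order.
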
onStartVersionIngestion
public void onStartVersionIngestion(boolean isCurrentVersion)
Called when DaVinci starts ingesting a Venice store version. Creates the SQL table for this version if it doesn't exist, or verifies the existing table structure is compatible. If this is the current version, it also creates a SQL view pointing to this table.
Overrides:
onStartVersionIngestion in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Parameters:
isCurrentVersion - true if this is the active store version
Throws:
VeniceException - if table creation fails or structure is incompatible
RuntimeException - if SQL operations fail

onEndVersionIngestion
public void onEndVersionIngestion(int currentVersion)
Called when DaVinci finishes ingesting all data for the store. Updates the main SQL view to point to the current version's table and removes the table of the retired previous version.
Overrides:
onEndVersionIngestion in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Parameters:
currentVersion - the version that is now active
Throws:
RuntimeException - if SQL operations fail

useUniformInputValueSchema
public boolean useUniformInputValueSchema()
Indicates this transformer works with consistent record schemas.
Overrides:
useUniformInputValueSchema in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Returns:
true (requires all records to have the same structure)

getDuckDBUrl
Gets the connection URL for the DuckDB database.
Returns:
DuckDB JDBC connection URL

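Because the returned value is a JDBC URL, it can be handed to the standard `java.sql` API. The following is a minimal sketch under stated assumptions: the DuckDB JDBC driver is on the classpath, the view is named after the store, and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch only: assumes the DuckDB JDBC driver is available at runtime.
final class DuckDbQuerySketch {
  private DuckDbQuerySketch() {}

  /** Builds a row-count query, quoting the view name as a SQL identifier. */
  static String countQuery(String viewName) {
    return "SELECT COUNT(*) FROM \"" + viewName.replace("\"", "\"\"") + "\"";
  }

  /** Opens a connection on the given URL and counts rows in the (assumed) view. */
  static long countRows(String duckDbUrl, String viewName) throws SQLException {
    try (Connection connection = DriverManager.getConnection(duckDbUrl);
        Statement statement = connection.createStatement();
        ResultSet rows = statement.executeQuery(countQuery(viewName))) {
      rows.next(); // COUNT(*) always yields exactly one row
      return rows.getLong(1);
    }
  }
}
```

A caller holding a transformer instance might pass `transformer.getDuckDBUrl()` as the first argument and the store name as the second, if the view does follow that naming.
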
buildStoreNameWithVersion
Creates a versioned table name by combining store name with version number.
Parameters:
version - the store version number
Returns:
- table name in format "storeName_v
"

close
public void close()
Cleans up database connections and resources. This is called automatically when the transformer is closed.