Class DuckDBDaVinciRecordTransformer

java.lang.Object
com.linkedin.davinci.client.DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
com.linkedin.venice.duckdb.DuckDBDaVinciRecordTransformer
All Implemented Interfaces:
Closeable, AutoCloseable

public class DuckDBDaVinciRecordTransformer extends DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
Enables SQL querying of Venice data by integrating DuckDB into DaVinci clients. This transformer runs inside DaVinci clients and automatically mirrors Venice store data into a local DuckDB database. This allows you to query your Venice data using standard SQL instead of the Venice key-value API. When you configure this transformer in your DaVinci client, it will:
- Create SQL tables that match your Venice store structure
- Keep the tables updated with Venice data changes
- Handle new Venice store versions by managing SQL table versions
- Provide a SQL view that always points to the current data
  • Constructor Details

    • DuckDBDaVinciRecordTransformer

      public DuckDBDaVinciRecordTransformer(String storeName, int storeVersion, org.apache.avro.Schema keySchema, org.apache.avro.Schema inputValueSchema, org.apache.avro.Schema outputValueSchema, DaVinciRecordTransformerConfig recordTransformerConfig, String baseDir, Set<String> columnsToProject)
      Parameters:
      baseDir - directory where DuckDB files will be stored
      columnsToProject - specific columns to include (leave null/empty for all columns)
      Throws:
      VeniceException - if database setup fails
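The columnsToProject contract described above (null or empty means all columns) can be sketched as a small filter. This is only an illustration of the documented behavior, not the transformer's actual code: the class `ColumnProjection` and its `select` method are hypothetical names, and whether key columns are always kept regardless of the projection is not stated in this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the columnsToProject contract.
final class ColumnProjection {
  private ColumnProjection() {}

  /** Returns the schema fields to mirror, preserving schema order. */
  static List<String> select(List<String> schemaFields, Set<String> columnsToProject) {
    // A null or empty projection means "include every column".
    if (columnsToProject == null || columnsToProject.isEmpty()) {
      return new ArrayList<>(schemaFields);
    }
    List<String> selected = new ArrayList<>();
    for (String field : schemaFields) {
      // Keep only the fields that were explicitly requested.
      if (columnsToProject.contains(field)) {
        selected.add(field);
      }
    }
    return selected;
  }
}
```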
  • Method Details

    • transform

      public DaVinciRecordTransformerResult<org.apache.avro.generic.GenericRecord> transform(Lazy<org.apache.avro.generic.GenericRecord> key, Lazy<org.apache.avro.generic.GenericRecord> value, int partitionId)
      Note: This always returns UNCHANGED because we are not modifying the record that is persisted in DaVinci.
      Specified by:
      transform in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
      Parameters:
      key - the key of the record to be transformed
      value - the value of the record to be transformed
      partitionId - what partition the record came from
      Returns:
      DaVinciRecordTransformerResult
    • processPut

      public void processPut(Lazy<org.apache.avro.generic.GenericRecord> key, Lazy<org.apache.avro.generic.GenericRecord> value, int partitionId)
      Stores a new/updated record in DuckDB when Venice receives a put event.
      Specified by:
      processPut in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
      Parameters:
      key - the key of the record to be put
      value - the value of the record to be put, either the original value or the transformed value
      partitionId - what partition the record came from
    • processDelete

      public void processDelete(Lazy<org.apache.avro.generic.GenericRecord> key, int partitionId)
      Deletes a record from DuckDB when Venice receives a delete event.
      Overrides:
      processDelete in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
      Parameters:
      key - the key of the record to be deleted
      partitionId - what partition the record is being deleted from
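The put and delete handling described above maps naturally onto two parameterized SQL statements. The sketch below shows one plausible shape for them; it is an assumption, not the statements this class actually issues. `MirrorSql` and its methods are hypothetical names, and DuckDB's `INSERT OR REPLACE` only works when the target table has a primary key or unique constraint, which is assumed here.

```java
import java.util.Collections;

// Hypothetical helper that builds the SQL text for mirroring put/delete events.
final class MirrorSql {
  private MirrorSql() {}

  /** Quotes an identifier, doubling any embedded double quotes. */
  static String quote(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  /** Upsert statement with one JDBC placeholder per column. */
  static String upsert(String table, int columnCount) {
    if (columnCount < 1) {
      throw new IllegalArgumentException("columnCount must be at least 1");
    }
    String placeholders = String.join(", ", Collections.nCopies(columnCount, "?"));
    return "INSERT OR REPLACE INTO " + quote(table) + " VALUES (" + placeholders + ")";
  }

  /** Delete statement keyed on a single key column (an assumed simplification). */
  static String delete(String table, String keyColumn) {
    return "DELETE FROM " + quote(table) + " WHERE " + quote(keyColumn) + " = ?";
  }
}
```

The resulting strings are intended for `java.sql.PreparedStatement`, with key and value fields bound to the placeholders in column order.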
    • onStartVersionIngestion

      public void onStartVersionIngestion(boolean isCurrentVersion)
      Called when DaVinci starts ingesting a Venice store version. Creates the SQL table for this version if it doesn't exist, or verifies the existing table structure is compatible. If this is the current version, it also creates a SQL view pointing to this table.
      Overrides:
      onStartVersionIngestion in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
      Parameters:
      isCurrentVersion - true if this is the active store version
      Throws:
      VeniceException - if table creation fails or structure is incompatible
      RuntimeException - if SQL operations fail
    • onEndVersionIngestion

      public void onEndVersionIngestion(int currentVersion)
      Called when DaVinci finishes ingesting all data for the store. Updates the main SQL view to point to the current version's table and removes the table of the previous, now retired, version.
      Overrides:
      onEndVersionIngestion in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
      Parameters:
      currentVersion - the version that is now active
      Throws:
      RuntimeException - if SQL operations fail
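The version swap described above (repoint the view, then drop the retired table) could be expressed with two DuckDB statements. This is a sketch under assumptions: the view name, the table names, and the helper `VersionSwapSql` are all hypothetical, and the real class may order or phrase these statements differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper producing the statements for a version swap.
final class VersionSwapSql {
  private VersionSwapSql() {}

  /** Quotes an identifier, doubling any embedded double quotes. */
  static String quote(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  /**
   * Returns the statements in execution order: the view is repointed first so
   * that readers never see a view referencing a dropped table.
   */
  static List<String> swap(String viewName, String currentTable, String retiredTable) {
    return Arrays.asList(
        "CREATE OR REPLACE VIEW " + quote(viewName) + " AS SELECT * FROM " + quote(currentTable),
        "DROP TABLE IF EXISTS " + quote(retiredTable));
  }
}
```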
    • useUniformInputValueSchema

      public boolean useUniformInputValueSchema()
      Indicates this transformer works with consistent record schemas.
      Overrides:
      useUniformInputValueSchema in class DaVinciRecordTransformer<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>
      Returns:
      true (requires all records to have the same structure)
    • getDuckDBUrl

      public String getDuckDBUrl()
      Gets the connection URL for the DuckDB database.
      Returns:
      DuckDB JDBC connection URL
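Once the URL is available, the mirrored data can be read with plain JDBC. The following is a minimal sketch, assuming the DuckDB JDBC driver is on the classpath and that the view name is known to the caller (this page does not state how the view is named). `DuckDBViewQuery` is a hypothetical helper, not part of Venice.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper that counts rows in the view kept up to date by the transformer.
final class DuckDBViewQuery {
  private DuckDBViewQuery() {}

  /** Builds a row-count query against a quoted view name. */
  static String countQuery(String viewName) {
    return "SELECT COUNT(*) FROM \"" + viewName.replace("\"", "\"\"") + "\"";
  }

  /** Opens a connection on the given URL, e.g. the value returned by getDuckDBUrl(). */
  static long countRows(String duckDBUrl, String viewName) throws SQLException {
    // try-with-resources closes the result set, statement, and connection in reverse order.
    try (Connection connection = DriverManager.getConnection(duckDBUrl);
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery(countQuery(viewName))) {
      resultSet.next();
      return resultSet.getLong(1);
    }
  }
}
```

A caller would pass the string returned by `getDuckDBUrl()` as the first argument of `countRows`.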
    • buildStoreNameWithVersion

      public String buildStoreNameWithVersion(int version)
      Creates a versioned table name by combining store name with version number.
      Parameters:
      version - the store version number
      Returns:
      table name in the format "storeName_v" followed by the version number
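As a quick illustration of the naming rule, the sketch below assumes the version number is appended directly after the "_v" suffix; the source text only shows the "storeName_v" prefix, so the exact format is an assumption. `VersionedTableName` is a hypothetical stand-in for the real method.

```java
// Hypothetical stand-in for buildStoreNameWithVersion.
final class VersionedTableName {
  private VersionedTableName() {}

  /** Combines the store name and version, e.g. store "my_store" and version 3. */
  static String build(String storeName, int version) {
    // Assumed format: <storeName>_v<version>
    return storeName + "_v" + version;
  }
}
```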
    • close

      public void close()
      Cleans up database connections and resources. This is called automatically when the transformer is closed.