Class SparkChunkAssembler

java.lang.Object
com.linkedin.venice.spark.chunk.SparkChunkAssembler
All Implemented Interfaces:
Serializable

public class SparkChunkAssembler extends Object implements Serializable
Spark adapter for ChunkAssembler that handles chunked values and RMDs. Converts Spark Rows to the format expected by ChunkAssembler, assembles chunks, and returns the result as a Spark Row.
  • Constructor Details

    • SparkChunkAssembler

      public SparkChunkAssembler(boolean isRmdChunkingEnabled)
    • SparkChunkAssembler

      public SparkChunkAssembler(boolean isRmdChunkingEnabled, boolean isTTLFilteringEnabled, VeniceProperties filterProperties)
  • Method Details

    • assembleChunks

      public org.apache.spark.sql.Row assembleChunks(byte[] keyBytes, Iterator<org.apache.spark.sql.Row> rows)
      Assemble chunks for a single key. If TTL filtering is enabled, also filters the assembled record.
      Parameters:
      keyBytes - The key, as a byte array
      rows - Iterator over the rows for this key; MUST be sorted by offset in descending order (highest offset first)
      Returns:
      Assembled row with the DEFAULT_SCHEMA_WITH_SCHEMA_ID schema, or null if the record is a DELETE, its chunks are incomplete, or it is filtered out by TTL
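
The ordering precondition on rows can be illustrated with a minimal, library-free sketch. OffsetRecord and OffsetOrdering are hypothetical names invented for this example; they stand in for Spark Rows and are not part of the Venice or Spark APIs. The sketch only shows how a caller might arrange records so that the highest offset comes first before building the iterator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Spark Row; only the offset matters in this sketch.
final class OffsetRecord {
  final long offset;
  final String payload;

  OffsetRecord(long offset, String payload) {
    this.offset = offset;
    this.payload = payload;
  }
}

// Hypothetical helper, not part of Venice: orders records the way assembleChunks expects.
final class OffsetOrdering {
  private OffsetOrdering() {
  }

  // Returns an iterator over a sorted copy of the input, highest offset first.
  // The input list itself is left unmodified.
  static Iterator<OffsetRecord> highestOffsetFirst(List<OffsetRecord> records) {
    List<OffsetRecord> sorted = new ArrayList<>(records);
    sorted.sort(Comparator.comparingLong((OffsetRecord r) -> r.offset).reversed());
    return sorted.iterator();
  }
}
```

In a real job the rows would be org.apache.spark.sql.Row instances, and the resulting iterator would be passed as the rows argument of assembleChunks(keyBytes, rows). How the offset is read from each Row depends on the input schema, which this page does not describe.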