Class KMESchemaReaderForKafkaInputFormat

java.lang.Object
com.linkedin.venice.hadoop.input.kafka.KMESchemaReaderForKafkaInputFormat
All Implemented Interfaces:
SchemaReader, Closeable, AutoCloseable

public class KMESchemaReaderForKafkaInputFormat extends Object implements SchemaReader
A specialized SchemaReader implementation designed for Kafka input format processing in Hadoop MapReduce jobs. This class provides schema reading capabilities specifically for Kafka Message Envelope (KME) schemas used during data ingestion from Kafka topics.

This implementation combines newer KME schemas provided at runtime with schemas loaded from resources to create a comprehensive schema registry for deserializing Kafka messages. It is primarily used by KafkaInputUtils to configure Kafka value serializers with the appropriate schema reader for processing Venice data stored in Kafka topics.

Key features:

  • Merges runtime-provided KME schemas with resource-based schemas
  • Provides thread-safe access to schema data using AtomicReference
  • Uses default key schema for Venice system operations
  • Supports schema evolution by handling multiple schema versions
  • Constructor Details

    • KMESchemaReaderForKafkaInputFormat

      public KMESchemaReaderForKafkaInputFormat(Map<Integer,String> newerKmeSchemas)
      Constructs a new KMESchemaReaderForKafkaInputFormat with the provided newer KME schemas.

      This constructor initializes the schema reader by:

      1. Loading all existing KME schemas from resources
      2. Adding the provided newer schemas to the schema data
      3. Merging resource-based schemas with the newer schemas
      4. Setting up the default key schema for Venice operations

      The newer schemas take precedence over resource-based schemas when there are schema ID conflicts, allowing for schema evolution and updates.

      Parameters:
      newerKmeSchemas - A map of schema ID to schema string containing newer KME schemas that should be available for deserialization. Can be empty but not null.
      Throws:
      IllegalArgumentException - if newerKmeSchemas is null
      RuntimeException - if there are issues parsing the default key schema
  • Method Details

    • getKeySchema

      public org.apache.avro.Schema getKeySchema()
      Returns the key schema used for Venice operations.
      Specified by:
      getKeySchema in interface SchemaReader
      Returns:
      The default key schema for Venice system operations
    • getValueSchema

      public org.apache.avro.Schema getValueSchema(int id)
      Retrieves the value schema for the specified schema ID.
      Specified by:
      getValueSchema in interface SchemaReader
      Parameters:
      id - The schema ID to look up
      Returns:
      The Avro schema corresponding to the given ID
      Throws:
      IllegalArgumentException - if the schema ID is not found
    • getValueSchemaId

      public int getValueSchemaId(org.apache.avro.Schema schema)
      Finds the schema ID for the given Avro schema.
      Specified by:
      getValueSchemaId in interface SchemaReader
      Parameters:
      schema - The Avro schema to find the ID for
      Returns:
      The schema ID corresponding to the given schema, or -1 if not found
    • getLatestValueSchema

      public org.apache.avro.Schema getLatestValueSchema()
      Returns the latest (highest ID) value schema available.
      Specified by:
      getLatestValueSchema in interface SchemaReader
      Returns:
      The most recent value schema
    • getLatestValueSchemaId

      public Integer getLatestValueSchemaId()
      Returns the ID of the latest (highest ID) value schema.
      Specified by:
      getLatestValueSchemaId in interface SchemaReader
      Returns:
      The schema ID of the most recent value schema
    • getUpdateSchema

      public org.apache.avro.Schema getUpdateSchema(int valueSchemaId)
      Update schemas are not supported for KME schema reading in Kafka input format.
      Specified by:
      getUpdateSchema in interface SchemaReader
      Parameters:
      valueSchemaId - The value schema ID (ignored)
      Returns:
      Never returns - always throws UnsupportedOperationException
      Throws:
      VeniceUnsupportedOperationException - Always thrown as update schemas are not supported
    • getLatestUpdateSchema

      public DerivedSchemaEntry getLatestUpdateSchema()
      Latest update schemas are not supported for KME schema reading in Kafka input format.
      Specified by:
      getLatestUpdateSchema in interface SchemaReader
      Returns:
      Never returns - always throws UnsupportedOperationException
      Throws:
      VeniceUnsupportedOperationException - Always thrown as update schemas are not supported
    • close

      public void close()
      Closes the schema reader. This implementation is a no-op as there are no resources to clean up.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable