Class KMESchemaReaderForKafkaInputFormat
java.lang.Object
com.linkedin.venice.hadoop.input.kafka.KMESchemaReaderForKafkaInputFormat
- All Implemented Interfaces:
SchemaReader
,Closeable
,AutoCloseable
A specialized
SchemaReader
implementation designed for Kafka input format processing
in Hadoop MapReduce jobs. This class provides schema reading capabilities specifically for
Kafka Message Envelope (KME) schemas used during data ingestion from Kafka topics.
This implementation combines newer KME schemas provided at runtime with schemas loaded
from resources to create a comprehensive schema registry for deserializing Kafka messages.
It is primarily used by KafkaInputUtils
to configure Kafka value serializers with
the appropriate schema reader for processing Venice data stored in Kafka topics.
Key features:
- Merges runtime-provided KME schemas with resource-based schemas
- Provides thread-safe access to schema data using
AtomicReference
- Uses default key schema for Venice system operations
- Supports schema evolution by handling multiple schema versions
-
Constructor Summary
ConstructorsConstructorDescriptionKMESchemaReaderForKafkaInputFormat
(Map<Integer, String> newerKmeSchemas) Constructs a new KMESchemaReaderForKafkaInputFormat with the provided newer KME schemas. -
Method Summary
Modifier and TypeMethodDescriptionvoid
close()
Closes the schema reader.org.apache.avro.Schema
Returns the key schema used for Venice operations.Latest update schemas are not supported for KME schema reading in Kafka input format.org.apache.avro.Schema
Returns the latest (highest ID) value schema available.Returns the ID of the latest (highest ID) value schema.org.apache.avro.Schema
getUpdateSchema
(int valueSchemaId) Update schemas are not supported for KME schema reading in Kafka input format.org.apache.avro.Schema
getValueSchema
(int id) Retrieves the value schema for the specified schema ID.int
getValueSchemaId
(org.apache.avro.Schema schema) Finds the schema ID for the given Avro schema.
-
Constructor Details
-
KMESchemaReaderForKafkaInputFormat
Constructs a new KMESchemaReaderForKafkaInputFormat with the provided newer KME schemas.This constructor initializes the schema reader by:
- Loading all existing KME schemas from resources
- Adding the provided newer schemas to the schema data
- Merging resource-based schemas with the newer schemas
- Setting up the default key schema for Venice operations
The newer schemas take precedence over resource-based schemas when there are schema ID conflicts, allowing for schema evolution and updates.
- Parameters:
newerKmeSchemas
- A map of schema ID to schema string containing newer KME schemas that should be available for deserialization. Can be empty but not null.- Throws:
IllegalArgumentException
- if newerKmeSchemas is nullRuntimeException
- if there are issues parsing the default key schema
-
-
Method Details
-
getKeySchema
public org.apache.avro.Schema getKeySchema()Returns the key schema used for Venice operations.- Specified by:
getKeySchema
in interfaceSchemaReader
- Returns:
- The default key schema for Venice system operations
-
getValueSchema
public org.apache.avro.Schema getValueSchema(int id) Retrieves the value schema for the specified schema ID.- Specified by:
getValueSchema
in interfaceSchemaReader
- Parameters:
id
- The schema ID to look up- Returns:
- The Avro schema corresponding to the given ID
- Throws:
IllegalArgumentException
- if the schema ID is not found
-
getValueSchemaId
public int getValueSchemaId(org.apache.avro.Schema schema) Finds the schema ID for the given Avro schema.- Specified by:
getValueSchemaId
in interfaceSchemaReader
- Parameters:
schema
- The Avro schema to find the ID for- Returns:
- The schema ID corresponding to the given schema, or -1 if not found
-
getLatestValueSchema
public org.apache.avro.Schema getLatestValueSchema()Returns the latest (highest ID) value schema available.- Specified by:
getLatestValueSchema
in interfaceSchemaReader
- Returns:
- The most recent value schema
-
getLatestValueSchemaId
Returns the ID of the latest (highest ID) value schema.- Specified by:
getLatestValueSchemaId
in interfaceSchemaReader
- Returns:
- The schema ID of the most recent value schema
-
getUpdateSchema
public org.apache.avro.Schema getUpdateSchema(int valueSchemaId) Update schemas are not supported for KME schema reading in Kafka input format.- Specified by:
getUpdateSchema
in interfaceSchemaReader
- Parameters:
valueSchemaId
- The value schema ID (ignored)- Returns:
- Never returns - always throws UnsupportedOperationException
- Throws:
VeniceUnsupportedOperationException
- Always thrown as update schemas are not supported
-
getLatestUpdateSchema
Latest update schemas are not supported for KME schema reading in Kafka input format.- Specified by:
getLatestUpdateSchema
in interfaceSchemaReader
- Returns:
- Never returns - always throws UnsupportedOperationException
- Throws:
VeniceUnsupportedOperationException
- Always thrown as update schemas are not supported
-
close
public void close()Closes the schema reader. This implementation is a no-op as there are no resources to clean up.- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in interfaceCloseable
-