Package com.linkedin.venice.hadoop.input.kafka
Class descriptions

Zstd dict trainer for Kafka Repush.

Based on ideas borrowed from the open-source attic-crunch library: https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaInputFormat.java This InputFormat implementation is used to read data off a Kafka topic.

This class is a Combiner, a feature of the MR framework that lets us plug in a Reducer implementation to be executed within the Mapper task, on its output.

This class is used to support secondary sorting for KafkaInput Repush.

This class is used for KafkaInput Repush, and it only considers the key part of the composed key (ignoring the offset).

Based on ideas borrowed from the open-source attic-crunch library: https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaRecordReader.java This class is used to read data off a Kafka topic partition.

Based on ideas borrowed from the open-source attic-crunch library: https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaInputSplit.java InputSplit that represents retrieving data from a single TopicPartition between the specified start and end offsets.

This class, together with KafkaInputKeyComparator, supports secondary sorting of KafkaInput Repush.

This class is designed specifically for KafkaInputFormat; right now, it does a simple pass-through.

This class is designed specifically for KafkaInputFormat; right now, it picks the latest entry according to the associated offset and produces it to Kafka.
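
The secondary sorting and latest-entry selection described above can be illustrated with a minimal, framework-free sketch. It is not the Venice implementation and does not use the Hadoop API: the class SecondarySortSketch, the Entry type, and the method latestPerKey are hypothetical names introduced only for illustration, and ascending offset order is an assumption. The idea is that the sort order covers the whole composed key (key, then offset), grouping looks at the key part only, and the last entry of each group is therefore the one with the highest offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
  /** Hypothetical composed key plus value: record key, Kafka offset, payload. */
  public static final class Entry {
    public final String key;
    public final long offset;
    public final String value;

    public Entry(String key, long offset, String value) {
      this.key = key;
      this.offset = offset;
      this.value = value;
    }
  }

  /** Sort order over the full composed key: by key, then by offset ascending (assumed). */
  public static final Comparator<Entry> SORT_ORDER =
      Comparator.comparing((Entry e) -> e.key).thenComparingLong(e -> e.offset);

  /** Grouping order: only the key part is compared; the offset is ignored. */
  public static final Comparator<Entry> GROUPING = Comparator.comparing((Entry e) -> e.key);

  /** Sorts the entries, then keeps the highest-offset entry of each key group. */
  public static List<Entry> latestPerKey(List<Entry> input) {
    List<Entry> sorted = new ArrayList<>(input);
    sorted.sort(SORT_ORDER);
    List<Entry> result = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
      // An entry closes its group when the next entry has a different key.
      boolean lastInGroup = i == sorted.size() - 1
          || GROUPING.compare(sorted.get(i), sorted.get(i + 1)) != 0;
      if (lastInGroup) {
        result.add(sorted.get(i));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Entry> input = Arrays.asList(
        new Entry("k1", 5, "v1-old"),
        new Entry("k2", 7, "v2"),
        new Entry("k1", 9, "v1-new"));
    for (Entry e : latestPerKey(input)) {
      System.out.println(e.key + " -> " + e.value + " @ offset " + e.offset);
    }
  }
}
```

In a real MapReduce job the two comparators would typically be registered separately as the sort comparator and the grouping comparator, so that the reducer sees all values for one key in offset order within a single call.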