Package com.linkedin.venice.hadoop.input.kafka
Classes

Description

Zstd dict trainer for Kafka Repush.
We borrowed some ideas from the open-sourced attic-crunch lib: https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaInputFormat.java This InputFormat implementation is used to read data off a Kafka topic.
This class is a Combiner, which is a feature of the MR framework that lets us plug in a Reducer implementation to be executed within the Mapper task, on its output.
This class is used to support secondary sorting for KafkaInput Repush.
This class is used for KafkaInput Repush, and it only considers the key part of the composed key (ignoring the offset).
Reads data from a Kafka-backed PubSub topic partition and converts each message into KafkaInputMapperKey/KafkaInputMapperValue.
We borrowed some ideas from the open-sourced attic-crunch lib: https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaInputSplit.java InputSplit that represents retrieving data from a single PubSubTopicPartition between the specified start and end offsets.
This class together with KafkaInputKeyComparator supports secondary sorting of KafkaInput Repush.
A specialized SchemaReader implementation designed for Kafka input format processing in Hadoop MapReduce jobs.
This class is designed specifically for KafkaInputFormat, and right now, it is doing simple pass-through.
This class is designed specifically for KafkaInputFormat, and right now, it will pick up the latest entry according to the associated offset, and produce it to Kafka.
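
The descriptions above mention a composed key (key plus offset), a comparator that supports secondary sorting, a comparator that ignores the offset, and a step that picks the latest entry by offset. The following is a minimal, framework-free sketch of how those pieces could fit together. It does not use the Hadoop or Venice APIs, and every name in it (SecondarySortSketch, Entry, SORT_COMPARATOR, GROUPING_COMPARATOR, latestPerKey) is hypothetical and chosen only for illustration. The ascending offset order is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
  /** Hypothetical stand-in for a composed key (key + offset) and its value. */
  static final class Entry {
    final String key;
    final long offset;
    final String value;

    Entry(String key, long offset, String value) {
      this.key = key;
      this.offset = offset;
      this.value = value;
    }
  }

  // Sort comparator: order by key first, then by offset (assumed ascending).
  static final Comparator<Entry> SORT_COMPARATOR =
      Comparator.comparing((Entry e) -> e.key).thenComparingLong(e -> e.offset);

  // Grouping comparator: looks only at the key part and ignores the offset.
  static final Comparator<Entry> GROUPING_COMPARATOR =
      Comparator.comparing((Entry e) -> e.key);

  /** Keeps, for each key, the value carried by the entry with the highest offset. */
  static Map<String, String> latestPerKey(List<Entry> entries) {
    List<Entry> sorted = new ArrayList<>(entries);
    sorted.sort(SORT_COMPARATOR);

    Map<String, String> result = new LinkedHashMap<>();
    Entry previous = null;
    for (Entry current : sorted) {
      // A change in the grouping key means the previous entry closed its group,
      // so it is the one with the highest offset for that key.
      if (previous != null && GROUPING_COMPARATOR.compare(previous, current) != 0) {
        result.put(previous.key, previous.value);
      }
      previous = current;
    }
    if (previous != null) {
      result.put(previous.key, previous.value);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Entry> entries = Arrays.asList(
        new Entry("k1", 5L, "v1-new"),
        new Entry("k2", 2L, "v2"),
        new Entry("k1", 1L, "v1-old"));
    System.out.println(latestPerKey(entries)); // prints {k1=v1-new, k2=v2}
  }
}
```

In a MapReduce job the framework performs the sort and grouping between the map and reduce phases; the sketch does both in memory only to show why sorting on the full composed key while grouping on the key alone leaves the latest entry at the end of each group.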