Class KafkaInputFormat

java.lang.Object
com.linkedin.venice.hadoop.input.kafka.KafkaInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,KafkaInputMapperValue>

public class KafkaInputFormat extends Object implements org.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,KafkaInputMapperValue>
This InputFormat implementation is used to read data off a Kafka topic. We borrowed some ideas from the open-source attic-crunch library: https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaInputFormat.java
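As a rough usage sketch (assuming the JobConf has already been populated with the Venice-specific Kafka settings this class reads; those property keys are not listed on this page), the class is wired into an old-API MapReduce job like any other InputFormat:

  import org.apache.hadoop.mapred.JobConf;
  import com.linkedin.venice.hadoop.input.kafka.KafkaInputFormat;

  public class KafkaInputJobSetup {
    public static JobConf withKafkaInput(JobConf job) {
      // Read the job's input from a Kafka topic instead of HDFS files.
      job.setInputFormat(KafkaInputFormat.class);
      // The topic name, broker address, and max-records-per-mapper settings that
      // KafkaInputFormat consults are Venice-specific and must already be present
      // on this JobConf; their keys are not documented on this page.
      return job;
    }
  }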
  • Constructor Details

    • KafkaInputFormat

      public KafkaInputFormat()
  • Method Details

    • getSplits

      public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
      Splits the topic according to the topic partition size and the allowed max record count per mapper. The numSplits parameter is not used in this function.
      Specified by:
      getSplits in interface org.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,KafkaInputMapperValue>
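      A minimal sketch of inspecting the generated splits (assuming a fully configured JobConf); the numSplits argument is passed as 0 here purely because it is ignored:

      import org.apache.hadoop.mapred.InputSplit;
      import org.apache.hadoop.mapred.JobConf;
      import com.linkedin.venice.hadoop.input.kafka.KafkaInputFormat;

      public class SplitInspection {
        public static void printSplitCount(JobConf job) throws Exception {
          KafkaInputFormat inputFormat = new KafkaInputFormat();
          // Splitting is driven by the topic's partition sizes and the configured
          // max record count per mapper; the second argument (numSplits) is ignored.
          InputSplit[] splits = inputFormat.getSplits(job, 0);
          System.out.println("Number of splits: " + splits.length);
        }
      }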
    • getSplits

      public KafkaInputSplit[] getSplits(VeniceProperties veniceProperties)
    • getRecordReader

      public org.apache.hadoop.mapred.RecordReader<KafkaInputMapperKey,KafkaInputMapperValue> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
      Specified by:
      getRecordReader in interface org.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,KafkaInputMapperValue>
    • getRecordReader

      public org.apache.hadoop.mapred.RecordReader<KafkaInputMapperKey,KafkaInputMapperValue> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter, PubSubConsumerAdapter consumer)
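      A minimal sketch of draining a single split through the record reader, using the standard org.apache.hadoop.mapred iteration pattern. The package of KafkaInputMapperKey/KafkaInputMapperValue is assumed here, and Reporter.NULL stands in for a real reporter; the four-argument overload above additionally accepts a PubSubConsumerAdapter, presumably so a specific consumer instance can be supplied.

      import org.apache.hadoop.mapred.InputSplit;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.RecordReader;
      import org.apache.hadoop.mapred.Reporter;
      import com.linkedin.venice.hadoop.input.kafka.KafkaInputFormat;
      // Assumed package for the key/value classes used by this InputFormat:
      import com.linkedin.venice.hadoop.input.kafka.avro.KafkaInputMapperKey;
      import com.linkedin.venice.hadoop.input.kafka.avro.KafkaInputMapperValue;

      public class SplitDrain {
        public static long countRecords(InputSplit split, JobConf job) throws Exception {
          RecordReader<KafkaInputMapperKey, KafkaInputMapperValue> reader =
              new KafkaInputFormat().getRecordReader(split, job, Reporter.NULL);
          try {
            KafkaInputMapperKey key = reader.createKey();
            KafkaInputMapperValue value = reader.createValue();
            long count = 0;
            // next() fills the reusable key/value objects and returns false once
            // the split is exhausted.
            while (reader.next(key, value)) {
              count++;
            }
            return count;
          } finally {
            reader.close();
          }
        }
      }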