Class KafkaInputFormat
java.lang.Object
com.linkedin.venice.hadoop.input.kafka.KafkaInputFormat
- All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,
KafkaInputMapperValue>
public class KafkaInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,KafkaInputMapperValue>
We borrowed some idea from the open-sourced attic-crunch lib:
https://github.com/apache/attic-crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/record/KafkaInputFormat.java
This
InputFormat
implementation is used to read data off a Kafka topic.-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionorg.apache.hadoop.mapred.RecordReader<KafkaInputMapperKey,
KafkaInputMapperValue> getRecordReader
(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) org.apache.hadoop.mapred.RecordReader<KafkaInputMapperKey,
KafkaInputMapperValue> getRecordReader
(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter, PubSubConsumerAdapter consumer) getSplits
(VeniceProperties veniceProperties) org.apache.hadoop.mapred.InputSplit[]
getSplits
(org.apache.hadoop.mapred.JobConf job, int numSplits) Split the topic according to the topic partition size and the allowed max record per mapper.
-
Constructor Details
-
KafkaInputFormat
public KafkaInputFormat()
-
-
Method Details
-
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) Split the topic according to the topic partition size and the allowed max record per mapper. is not being used in this function.- Specified by:
getSplits
in interfaceorg.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,
KafkaInputMapperValue>
-
getSplits
-
getRecordReader
public org.apache.hadoop.mapred.RecordReader<KafkaInputMapperKey,KafkaInputMapperValue> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) - Specified by:
getRecordReader
in interfaceorg.apache.hadoop.mapred.InputFormat<KafkaInputMapperKey,
KafkaInputMapperValue>
-
getRecordReader
public org.apache.hadoop.mapred.RecordReader<KafkaInputMapperKey,KafkaInputMapperValue> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter, PubSubConsumerAdapter consumer)
-