Class KafkaInputFormatCombiner

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>

    public class KafkaInputFormatCombiner
    extends AbstractDataWriterTask
    implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>, org.apache.hadoop.mapred.JobConfigurable
    This class is a Combiner, an MR framework facility that lets a Reducer implementation run inside the Mapper task, on the Mapper's output. This leaves the real Reducers with less work to do, since part of it has already been done in the Mappers. In the case of the KafkaInputFormat, the Combiner cannot do all of the work that the VeniceKafkaInputReducer does, since that includes producing to Kafka, but the part worth shifting to the Mappers is the compaction. We have observed that when input partitions are very large, Reducers can run out of memory. Shifting this work to the Mappers should remove that bottleneck and scale better, especially in combination with a low value for VenicePushJobConstants.KAFKA_INPUT_MAX_RECORDS_PER_MAPPER.
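    A minimal sketch of how a Combiner like this one could be wired into a mapred job is shown below. This is not taken from the Venice code base: the JobConf setup and the numeric value are illustrative assumptions, and only the constant name comes from the description above.

        // Hypothetical job wiring; the real setup lives inside VenicePushJob and may differ.
        org.apache.hadoop.mapred.JobConf conf = new org.apache.hadoop.mapred.JobConf();
        // Run the combine step inside each Mapper, on the Mapper's output.
        conf.setCombinerClass(KafkaInputFormatCombiner.class);
        // Cap how many records each Mapper handles, so a very large input partition is split
        // across many Mappers and no single combine/reduce step sees too much data.
        // The value is an arbitrary example, not a recommended setting.
        conf.set(VenicePushJobConstants.KAFKA_INPUT_MAX_RECORDS_PER_MAPPER, "5000000");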
    • Constructor Detail

      • KafkaInputFormatCombiner

        public KafkaInputFormatCombiner()
    • Method Detail

      • reduce

        public void reduce(org.apache.hadoop.io.BytesWritable key,
                           java.util.Iterator<org.apache.hadoop.io.BytesWritable> values,
                           org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output,
                           org.apache.hadoop.mapred.Reporter reporter)
                    throws java.io.IOException
        This is where the compaction described in the class comment happens, executed within the Mapper task; a hedged sketch of the idea is given after the method detail list below.
        Specified by:
        reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
        Throws:
        java.io.IOException
      • close

        public void close()
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
      • configure

        public void configure(org.apache.hadoop.mapred.JobConf job)
        Specified by:
        configure in interface org.apache.hadoop.mapred.JobConfigurable
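
    To make the compaction idea concrete, below is a hypothetical sketch of a combine step with the same signature as reduce(...) above. It is not the actual implementation: it assumes that the last value seen for a key is the one worth keeping (the real class would pick the winner from the data embedded in the serialized value, such as the Kafka offset), and it only illustrates that a single record per key is forwarded, which is what shrinks the Reducers' input.

        // Hypothetical compaction-style combine step; assumptions are noted above.
        public void reduce(org.apache.hadoop.io.BytesWritable key,
                           java.util.Iterator<org.apache.hadoop.io.BytesWritable> values,
                           org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output,
                           org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException {
          org.apache.hadoop.io.BytesWritable kept = null;
          while (values.hasNext()) {
            org.apache.hadoop.io.BytesWritable v = values.next();
            // The MR framework reuses Writable instances, so copy the bytes we intend to keep.
            kept = new org.apache.hadoop.io.BytesWritable(java.util.Arrays.copyOf(v.getBytes(), v.getLength()));
          }
          if (kept != null) {
            // Emit a single record per key instead of every occurrence.
            output.collect(key, kept);
          }
        }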