Class KafkaInputFormatCombiner

java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.input.kafka.KafkaInputFormatCombiner
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>

public class KafkaInputFormatCombiner extends AbstractDataWriterTask implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>, org.apache.hadoop.mapred.JobConfigurable
This class is a Combiner: a feature of the MapReduce (MR) framework that lets a Reducer implementation run inside the Mapper task, on that Mapper's output. The Reducer then has less work to do, since part of it has already been done in the Mappers. In the case of the KafkaInputFormat, the Combiner cannot do everything that the VeniceKafkaInputReducer does, since that includes producing to Kafka, but compaction is the part worth shifting to the Mappers. We have observed that Reducers can run out of memory when the input partitions are very large. Shifting compaction to the Mappers should remove this bottleneck and improve scaling, especially in combination with a low value for VenicePushJobConstants.KAFKA_INPUT_MAX_RECORDS_PER_MAPPER.
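The compaction idea described above can be illustrated with a minimal, framework-free sketch. It is not the code of this class: `OffsetValue` and `CompactionSketch` are hypothetical names, and the sketch assumes that compaction means "keep only the value with the highest Kafka offset for each key". The real Combiner works on serialized `BytesWritable` values and may handle additional cases that this sketch ignores.

```java
import java.util.Iterator;

/** Hypothetical stand-in for one mapper output value: a Kafka offset plus a payload. */
final class OffsetValue {
  final long offset;
  final byte[] payload;

  OffsetValue(long offset, byte[] payload) {
    this.offset = offset;
    this.payload = payload;
  }
}

/** Illustrative only; assumes "latest offset wins" is the compaction rule. */
final class CompactionSketch {
  private CompactionSketch() {
  }

  /**
   * Scans all values of a single key and returns the one with the highest offset,
   * or null if there are no values. Only one candidate is retained at a time,
   * so memory use does not grow with the number of values for the key.
   */
  static OffsetValue compact(Iterator<OffsetValue> values) {
    OffsetValue latest = null;
    while (values.hasNext()) {
      OffsetValue candidate = values.next();
      if (latest == null || candidate.offset > latest.offset) {
        latest = candidate;
      }
    }
    return latest;
  }
}
```

Running a step like this on each Mapper's output means that the Reducer receives at most one value per key from each Mapper, rather than every version of the record read from the input partition.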
  • Constructor Details

    • KafkaInputFormatCombiner

      public KafkaInputFormatCombiner()
  • Method Details

    • reduce

      public void reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter) throws IOException
      Specified by:
      reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
      Throws:
      IOException
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
    • configureTask

      protected void configureTask(VeniceProperties props)
      Description copied from class: AbstractDataWriterTask
      Allows implementations of this class to perform task-specific configuration.
      Specified by:
      configureTask in class AbstractDataWriterTask
      Parameters:
      props - the job props that the task was configured with.
    • configure

      public void configure(org.apache.hadoop.mapred.JobConf job)
      Specified by:
      configure in interface org.apache.hadoop.mapred.JobConfigurable