Class KafkaInputFormatCombiner
java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.input.kafka.KafkaInputFormatCombiner
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
public class KafkaInputFormatCombiner
extends AbstractDataWriterTask
implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>, org.apache.hadoop.mapred.JobConfigurable
This class is a Combiner, an MR framework feature that lets a Reducer implementation run within the Mapper task, on the Mapper's output. This leaves the Reducer with less work to do, since part of it was already done in the Mappers. In the case of the KafkaInputFormat, we cannot do all the work that the VeniceKafkaInputReducer does, since that includes producing to Kafka, but the part worth shifting to the Mappers is the compaction. We have observed that when the input partitions are very large, Reducers can run out of memory. By shifting this work to the Mappers, we should be able to remove that bottleneck and scale better, especially in combination with a low value for VenicePushJobConstants.KAFKA_INPUT_MAX_RECORDS_PER_MAPPER.
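For context on how a combiner gets attached to a job in the old mapred API, here is a minimal sketch. It is not the push job's actual configuration code; the class name and the example record cap are hypothetical.

import com.linkedin.venice.hadoop.input.kafka.KafkaInputFormatCombiner;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;

public class CombinerWiringSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // The mapper output key/value types must match the combiner's (and reducer's) input types.
    conf.setMapOutputKeyClass(BytesWritable.class);
    conf.setMapOutputValueClass(BytesWritable.class);

    // The combiner runs inside each Mapper task, compacting its output before
    // it is spilled, shuffled, and handed to the Reducers.
    conf.setCombinerClass(KafkaInputFormatCombiner.class);

    // Hypothetical value: a low record cap per mapper spreads the same input
    // across more Mapper tasks, which combines well with the in-mapper compaction.
    // conf.set(VenicePushJobConstants.KAFKA_INPUT_MAX_RECORDS_PER_MAPPER, "5000000");
  }
}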
Field Summary
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
Constructor Summary
Constructors
KafkaInputFormatCombiner()
Method Summary
void close()
void configure(org.apache.hadoop.mapred.JobConf job)
protected void configureTask(VeniceProperties props)
Allow implementations of this class to configure task-specific stuff.
void reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
Constructor Details
-
KafkaInputFormatCombiner
public KafkaInputFormatCombiner()
-
Method Details
-
reduce
public void reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter) throws IOException
Specified by:
reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
Throws:
IOException
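The actual compaction logic operates on opaque BytesWritable payloads and is specific to the Kafka input format. Purely as an illustration of the general contract a combiner's reduce follows (using Text/LongWritable rather than Venice's types, with a hypothetical class name), one might keep only the largest value per key so the Reducers receive a single record per key:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class KeepLargestCombinerSketch extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  @Override
  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
    // Collapse all values for this key down to one record, so the Reducer
    // reads far fewer records than the Mapper emitted.
    long max = Long.MIN_VALUE;
    while (values.hasNext()) {
      max = Math.max(max, values.next().get());
    }
    output.collect(key, new LongWritable(max));
  }
}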
-
close
public void close()
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
-
configureTask
protected void configureTask(VeniceProperties props)
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
Specified by:
configureTask in class AbstractDataWriterTask
Parameters:
props - the job props that the task was configured with.
-
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
-