Class KafkaInputFormatCombiner
java.lang.Object
  com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
    com.linkedin.venice.hadoop.input.kafka.KafkaInputFormatCombiner
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
public class KafkaInputFormatCombiner extends AbstractDataWriterTask implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>, org.apache.hadoop.mapred.JobConfigurable
This class is a Combiner, an MR framework feature that lets a Reducer implementation be plugged into the Mapper task and executed on the Mapper's output. This leaves the real Reducers with less work to do, since part of it was already done inside the Mappers. In the case of the KafkaInputFormat, we cannot do all of the work that the VeniceKafkaInputReducer does, since that includes producing to Kafka, but the part that is worth shifting to the Mappers is the compaction. We have observed that when input partitions are very large, the Reducers can run out of memory. By shifting the compaction work to the Mappers, we remove this bottleneck and scale better, especially when used in combination with a low value for VenicePushJobConstants.KAFKA_INPUT_MAX_RECORDS_PER_MAPPER.
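The push job wires this combiner up internally. Purely as an illustration, the sketch below shows how a Combiner of this shape plugs into the classic mapred API; the standalone CombinerWiringSketch class is hypothetical and not code from Venice:

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;

import com.linkedin.venice.hadoop.input.kafka.KafkaInputFormatCombiner;

public class CombinerWiringSketch {
  // Configures a mapred job to run the combiner inside each Mapper task.
  public static void wireCombiner(JobConf conf) {
    // The Combiner runs on Mapper output, before the shuffle, so the
    // Reducers receive already-compacted data.
    conf.setCombinerClass(KafkaInputFormatCombiner.class);

    // Map output types must match the Reducer's generic signature:
    // Reducer<BytesWritable, BytesWritable, BytesWritable, BytesWritable>.
    conf.setMapOutputKeyClass(BytesWritable.class);
    conf.setMapOutputValueClass(BytesWritable.class);
  }
}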
Field Summary

Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
Constructor Summary

KafkaInputFormatCombiner()
Method Summary

void close()

void configure(org.apache.hadoop.mapred.JobConf job)

protected void configureTask(VeniceProperties props)
Allow implementations of this class to configure task-specific settings.

void reduce(org.apache.hadoop.io.BytesWritable key, java.util.Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)

Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
Method Detail
reduce

public void reduce(org.apache.hadoop.io.BytesWritable key, java.util.Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException

Specified by:
reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
Throws:
java.io.IOException
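The exact selection logic is internal to Venice, but the general shape of a compacting combiner's reduce is to emit a single surviving value per key. The following sketch is purely illustrative: it keeps the last value seen, as a stand-in for whatever policy the real combiner applies to the serialized values.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative only: compacts each key's values down to one record,
// so the shuffle and the downstream Reducer handle far less data.
public class CompactingReduceSketch {
  public void reduce(
      BytesWritable key,
      Iterator<BytesWritable> values,
      OutputCollector<BytesWritable, BytesWritable> output,
      Reporter reporter) throws IOException {
    BytesWritable last = null;
    while (values.hasNext()) {
      // Hadoop reuses the same BytesWritable instance across next() calls,
      // so this reference ends up holding the final value; any policy other
      // than "keep last" would need to copy the bytes before moving on.
      last = values.next();
    }
    if (last != null) {
      output.collect(key, last);
    }
  }
}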
close

public void close()

Specified by:
close in interface java.lang.AutoCloseable
Specified by:
close in interface java.io.Closeable
configureTask

protected void configureTask(VeniceProperties props)

Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific settings.

Specified by:
configureTask in class AbstractDataWriterTask
Parameters:
props - the job props that the task was configured with.
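As a sketch of this hook (assuming configureTask is the only abstract method a subclass must supply, as this page suggests), a hypothetical subclass might read its own settings as follows; the MyWriterTask class and the my.task.verbose key are invented for illustration:

import com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask;
import com.linkedin.venice.utils.VeniceProperties;

// Hypothetical subclass, only to show where configureTask fits in.
public class MyWriterTask extends AbstractDataWriterTask {
  private boolean verbose;

  @Override
  protected void configureTask(VeniceProperties props) {
    // Called once the framework has handed the task its job props.
    // "my.task.verbose" is a made-up key for this example.
    this.verbose = props.getBoolean("my.task.verbose", false);
  }
}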
configure

public void configure(org.apache.hadoop.mapred.JobConf job)

Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable