Class VeniceReducer
java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
- Direct Known Subclasses:
VeniceKafkaInputReducer
public class VeniceReducer
extends AbstractPartitionWriter
implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
VeniceReducer is in charge of producing messages to the Kafka broker.
Since VeniceMRPartitioner uses the same logic as DefaultVenicePartitioner, all the messages in the same reducer belong to the same topic partition.
The reason for introducing a reduce phase is that BDB-JE benefits from sorted input in the following ways:
1. BDB-JE won't generate as many BINDeltas, since it won't touch a lot of BINs at a time;
2. The overall BDB-JE insert rate improves significantly, since disk usage is greatly reduced (the BINDeltas are much smaller than before).
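To make the job wiring concrete, here is a minimal sketch, assuming an old-style (mapred) MapReduce setup; the ReducerWiringSketch class, its buildJobConf method, and the storePartitionCount parameter are hypothetical, and the actual configuration performed by the Venice push job may differ.

// Minimal sketch (not the actual push-job setup) of plugging VeniceReducer
// into an old-API MapReduce job.
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;

import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

public class ReducerWiringSketch {
  public static JobConf buildJobConf(int storePartitionCount) {
    JobConf conf = new JobConf();
    // Keys and values reach the reducer as raw bytes, matching the
    // Reducer<BytesWritable, BytesWritable, BytesWritable, BytesWritable> signature.
    conf.setMapOutputKeyClass(BytesWritable.class);
    conf.setMapOutputValueClass(BytesWritable.class);
    conf.setReducerClass(VeniceReducer.class);
    // One reducer per Venice partition, so each reducer instance only ever
    // produces to a single topic partition.
    conf.setNumReduceTasks(storePartitionCount);
    return conf;
  }
}

-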
Nested Class Summary
Nested classes/interfaces inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
AbstractPartitionWriter.ChildWriterProducerCallback, AbstractPartitionWriter.DuplicateKeyPrinter, AbstractPartitionWriter.PartitionWriterProducerCallback, AbstractPartitionWriter.VeniceWriterMessage
-
Field Summary
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
-
Constructor Summary
VeniceReducer()
-
Method Summary
Modifier and Type | Method | Description
void | configure(org.apache.hadoop.mapred.JobConf job)
protected void | configureTask(VeniceProperties props) | Allow implementations of this class to configure task-specific stuff.
protected AbstractVeniceWriter<byte[], byte[], byte[]> | createBasicVeniceWriter
protected PubSubProducerCallback | getCallback
protected DataWriterTaskTracker | getDataWriterTaskTracker
protected boolean | getExceedQuotaFlag()
protected org.apache.hadoop.mapred.JobConf | getJobConf()
protected long | getTotalIncomingDataSizeInBytes() | Return the size of serialized key and serialized value in bytes across the entire dataset.
protected boolean | hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
void | reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)
protected void | setExceedQuota(boolean exceedQuota)
protected void | setHadoopJobClientProvider(HadoopJobClientProvider hadoopJobClientProvider)
protected void | setVeniceWriter(AbstractVeniceWriter veniceWriter)
protected void | setVeniceWriterFactory
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
close, extract, getDerivedValueSchemaId, initDuplicateKeyPrinter, isEnableWriteCompute, logMessageProgress, processValuesForKey, recordMessageErrored
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
Field Details
-
MAP_REDUCE_JOB_ID_PROP
- See Also:
-
-
Constructor Details
-
VeniceReducer
public VeniceReducer()
-
-
Method Details
-
getJobConf
protected org.apache.hadoop.mapred.JobConf getJobConf()
-
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
-
reduce
public void reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)
- Specified by:
reduce
in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
-
getDataWriterTaskTracker
- Overrides:
getDataWriterTaskTracker
in class AbstractPartitionWriter
-
getExceedQuotaFlag
protected boolean getExceedQuotaFlag()
- Overrides:
getExceedQuotaFlag
in class AbstractPartitionWriter
-
setVeniceWriter
- Overrides:
setVeniceWriter
in class AbstractPartitionWriter
-
setExceedQuota
protected void setExceedQuota(boolean exceedQuota)
- Overrides:
setExceedQuota
in class AbstractPartitionWriter
-
hasReportedFailure
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
- Overrides:
hasReportedFailure
in class AbstractPartitionWriter
-
getCallback
- Overrides:
getCallback
in class AbstractPartitionWriter
-
configureTask
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
- Overrides:
configureTask
in class AbstractPartitionWriter
- Parameters:
props
- the job props that the task was configured with.
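As an illustration of this hook, here is a sketch of a hypothetical subclass; the VerboseVeniceReducer class and the "custom.verbose.logging" property are made up for the example, and it assumes VeniceProperties (com.linkedin.venice.utils) exposes a getBoolean(key, default) accessor.

// Hypothetical subclass showing where task-specific configuration hooks in.
import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;
import com.linkedin.venice.utils.VeniceProperties;

public class VerboseVeniceReducer extends VeniceReducer {
  private boolean verbose;

  @Override
  protected void configureTask(VeniceProperties props) {
    // Let the parent run its normal setup first.
    super.configureTask(props);
    // Then read an illustrative, task-specific setting from the same job props.
    this.verbose = props.getBoolean("custom.verbose.logging", false);
  }
}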
-
getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()
Description copied from class: AbstractPartitionWriter
Return the size of serialized key and serialized value in bytes across the entire dataset. This is an optimization to skip writing the data to Kafka and reduce the load on Kafka and Venice storage nodes. Not all engines can support fetching this information during the execution of the job (e.g. Spark), but we can live with it for now. The quota is checked again in the Driver after the completion of the DataWriter job, and it will kill the VenicePushJob soon after.
- Overrides:
getTotalIncomingDataSizeInBytes
in class AbstractPartitionWriter
- Returns:
- the size of serialized key and serialized value in bytes across the entire dataset
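As a sketch of the short-circuit this value enables (not the real Venice logic), a subclass could compare it against the store quota and flip the exceed-quota flag; the QuotaAwareReducer class, its checkQuota method, and the storeQuotaBytes parameter are assumptions for illustration.

// Hypothetical quota short-circuit built on the protected hooks documented above.
import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

public class QuotaAwareReducer extends VeniceReducer {
  /** Flags the task when the whole dataset is already over the store quota. */
  protected void checkQuota(long storeQuotaBytes) {
    // Serialized key + value bytes across the entire dataset.
    long totalInputBytes = getTotalIncomingDataSizeInBytes();
    if (storeQuotaBytes > 0 && totalInputBytes > storeQuotaBytes) {
      // Skip producing to Kafka; the driver re-checks quota after the
      // DataWriter job completes and will fail the push anyway.
      setExceedQuota(true);
    }
  }
}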
-
setHadoopJobClientProvider
-
createBasicVeniceWriter
- Overrides:
createBasicVeniceWriter
in class AbstractPartitionWriter
-
setVeniceWriterFactory
- Overrides:
setVeniceWriterFactory
in class AbstractPartitionWriter
-