Class VeniceReducer
java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer
- All Implemented Interfaces:
- Closeable, AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
- Direct Known Subclasses:
- VeniceKafkaInputReducer
public class VeniceReducer
extends AbstractPartitionWriter
implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.BytesWritable>
VeniceReducer is in charge of producing the messages to the Kafka broker.
Since VeniceMRPartitioner uses the same logic as DefaultVenicePartitioner,
all the messages handled by the same reducer belong to the same topic partition.
The reduce phase was introduced because BDB-JE benefits from sorted input in the following ways:
1. BDB-JE won't generate as many BINDeltas, since it won't touch many BINs at a time;
2. The overall BDB-JE insert rate improves considerably, since disk usage is greatly reduced (each BINDelta is much smaller than before).
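For orientation, here is a minimal sketch (not part of the Venice codebase) of how a reducer like this is wired into an old-API (org.apache.hadoop.mapred) MapReduce job. The class name, job name, and input/output paths are placeholders; a real Venice push job performs this setup internally and adds many Venice-specific properties that are omitted here.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

    public class VeniceReducerJobSketch {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(VeniceReducerJobSketch.class);
        job.setJobName("venice-data-writer-sketch"); // placeholder name

        // Keys and values flow through the shuffle as raw bytes.
        job.setMapOutputKeyClass(BytesWritable.class);
        job.setMapOutputValueClass(BytesWritable.class);

        // Each reducer receives one topic partition's records, already sorted.
        job.setReducerClass(VeniceReducer.class);

        FileInputFormat.setInputPaths(job, new Path("/placeholder/input"));
        FileOutputFormat.setOutputPath(job, new Path("/placeholder/output"));

        JobClient.runJob(job);
      }
    }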
Nested Class Summary

Nested classes/interfaces inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter:
AbstractPartitionWriter.ChildWriterProducerCallback, AbstractPartitionWriter.DuplicateKeyPrinter, AbstractPartitionWriter.PartitionWriterProducerCallback, AbstractPartitionWriter.VeniceRecordWithMetadata, AbstractPartitionWriter.VeniceWriterMessage

Field Summary

Fields:
MAP_REDUCE_JOB_ID_PROP

Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
TASK_ID_NOT_SET

Constructor Summary

Constructors:
VeniceReducer()

Method Summary

Modifier and Type | Method | Description
void | configure(org.apache.hadoop.mapred.JobConf job) |
protected void | configureTask(VeniceProperties props) | Allow implementations of this class to configure task-specific stuff.
protected AbstractVeniceWriter<byte[],byte[],byte[]> | createBasicVeniceWriter() |
protected PubSubProducerCallback | getCallback() |
protected DataWriterTaskTracker | getDataWriterTaskTracker() |
protected boolean | getExceedQuotaFlag() |
protected org.apache.hadoop.mapred.JobConf | getJobConf() |
protected long | getTotalIncomingDataSizeInBytes() | Return the size of serialized key and serialized value in bytes across the entire dataset.
protected boolean | hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed) |
void | reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter) |
protected void | setExceedQuota(boolean exceedQuota) |
protected void | setHadoopJobClientProvider(HadoopJobClientProvider hadoopJobClientProvider) |
protected void | setVeniceWriter(AbstractVeniceWriter veniceWriter) |
protected void | setVeniceWriterFactory(VeniceWriterFactory veniceWriterFactory) |

Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter:
close, extract, getDerivedValueSchemaId, getRmdSchema, getVeniceWriterFactory, initDuplicateKeyPrinter, isEnableWriteCompute, logMessageProgress, processValuesForKey, recordMessageErrored

Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
- 
Field Details

MAP_REDUCE_JOB_ID_PROP
- See Also:
- Constant Field Values
 
 
- 
- 
Constructor Details

VeniceReducer
public VeniceReducer()
 
- 
- 
Method Details
getJobConf
protected org.apache.hadoop.mapred.JobConf getJobConf()
- 
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
- configure in interface org.apache.hadoop.mapred.JobConfigurable
 
- 
reduce
public void reduce(org.apache.hadoop.io.BytesWritable key, Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)
- Specified by:
- reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
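For illustration, here is the shape of a reduce(...) invocation as a test-style sketch. It assumes the reducer was already configured with the full set of Venice job properties (omitted here); the class name and key/value bytes are placeholders, and Reporter.NULL plus a no-op OutputCollector lambda are stand-ins from the old MapReduce API.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.Iterator;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapred.Reporter;
    import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

    public class ReduceInvocationSketch {
      // Assumes `reducer` has been configured beforehand.
      static void invoke(VeniceReducer reducer) throws IOException {
        BytesWritable key = new BytesWritable("key".getBytes(StandardCharsets.UTF_8));
        Iterator<BytesWritable> values = Collections.singletonList(
            new BytesWritable("value".getBytes(StandardCharsets.UTF_8))).iterator();
        // All values for one key arrive in a single call; the reducer produces
        // them to the topic partition this reduce task owns.
        reducer.reduce(key, values, (k, v) -> { /* no-op collector */ }, Reporter.NULL);
      }
    }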
 
- 
getDataWriterTaskTracker
protected DataWriterTaskTracker getDataWriterTaskTracker()
- Overrides:
- getDataWriterTaskTracker in class AbstractPartitionWriter
 
- 
getExceedQuotaFlag
protected boolean getExceedQuotaFlag()
- Overrides:
- getExceedQuotaFlag in class AbstractPartitionWriter
 
- 
setVeniceWriter
protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
- Overrides:
- setVeniceWriter in class AbstractPartitionWriter
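Because setVeniceWriter(AbstractVeniceWriter) is protected, a unit test can inject a stub writer through a small subclass. A sketch, assuming Mockito is on the test classpath and that AbstractVeniceWriter lives in com.linkedin.venice.writer:

    import static org.mockito.Mockito.mock;
    import com.linkedin.venice.writer.AbstractVeniceWriter; // package assumed
    import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

    // Hypothetical test-only subclass used to reach the protected hook.
    class TestableVeniceReducer extends VeniceReducer {
      void injectWriter(AbstractVeniceWriter writer) {
        setVeniceWriter(writer); // protected setter documented above
      }
    }

    class WriterInjectionSketch {
      static TestableVeniceReducer newReducerWithMockWriter() {
        TestableVeniceReducer reducer = new TestableVeniceReducer();
        reducer.injectWriter(mock(AbstractVeniceWriter.class));
        return reducer;
      }
    }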
 
- 
setExceedQuota
protected void setExceedQuota(boolean exceedQuota)
- Overrides:
- setExceedQuota in class AbstractPartitionWriter
 
- 
hasReportedFailure
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
- Overrides:
- hasReportedFailure in class AbstractPartitionWriter
 
- 
getCallback
protected PubSubProducerCallback getCallback()
- Overrides:
- getCallback in class AbstractPartitionWriter
 
- 
configureTask
protected void configureTask(VeniceProperties props)
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
- Overrides:
- configureTask in class AbstractPartitionWriter
- Parameters:
- props - the job props that the task was configured with.
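A sketch of what overriding this hook can look like in a hypothetical subclass; the property key is invented for illustration, and a getBoolean(key, default) accessor on VeniceProperties is assumed:

    import com.linkedin.venice.utils.VeniceProperties; // package assumed

    // Hypothetical subclass; the property key below is made up.
    class CustomReducer extends VeniceReducer {
      private boolean verbose;

      @Override
      protected void configureTask(VeniceProperties props) {
        super.configureTask(props); // keep the parent's task setup
        this.verbose = props.getBoolean("example.verbose.logging", false);
      }
    }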
 
- 
getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()
Description copied from class: AbstractPartitionWriter
Return the size of the serialized keys and serialized values in bytes across the entire dataset. This is an optimization to skip writing the data to Kafka and reduce the load on Kafka and the Venice storage nodes. Not all engines can support fetching this information during the execution of the job (e.g., Spark), but we can live with that for now. The quota is checked again in the Driver after the completion of the DataWriter job, and it will kill the VenicePushJob soon after.
- Overrides:
- getTotalIncomingDataSizeInBytes in class AbstractPartitionWriter
- Returns:
- the size of the serialized keys and serialized values in bytes across the entire dataset
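To make the quota interplay concrete, here is a simplified illustration using only the hooks documented on this page; STORAGE_QUOTA_BYTES is an assumed stand-in for the store's real quota, which comes from the job configuration rather than from anything shown here:

    import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

    // Illustrative only: not how Venice itself performs the check.
    class QuotaIllustrationReducer extends VeniceReducer {
      private static final long STORAGE_QUOTA_BYTES = 1L << 30; // 1 GiB, assumed

      void checkQuota() {
        // Total serialized key+value size across the whole dataset (see above).
        if (getTotalIncomingDataSizeInBytes() > STORAGE_QUOTA_BYTES) {
          setExceedQuota(true); // documented hook: stop producing to Kafka
        }
      }
    }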
 
- 
setHadoopJobClientProvider
protected void setHadoopJobClientProvider(HadoopJobClientProvider hadoopJobClientProvider)
- 
createBasicVeniceWriter
protected AbstractVeniceWriter<byte[],byte[],byte[]> createBasicVeniceWriter()
- Overrides:
- createBasicVeniceWriter in class AbstractPartitionWriter
 
- 
setVeniceWriterFactory
protected void setVeniceWriterFactory(VeniceWriterFactory veniceWriterFactory)
- Overrides:
- setVeniceWriterFactory in class AbstractPartitionWriter
 
 