Class VeniceReducer
java.lang.Object
  com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
    com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
      com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer

All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>

Direct Known Subclasses:
VeniceKafkaInputReducer
public class VeniceReducer extends AbstractPartitionWriter implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
VeniceReducer is in charge of producing the messages to the Kafka broker. Since VeniceMRPartitioner uses the same logic as DefaultVenicePartitioner, all the messages in the same reducer belong to the same topic partition. The reason to introduce a reduce phase is that BDB-JE benefits from sorted input in the following ways:
1. BDB-JE won't generate as many BINDeltas, since it won't touch a lot of BINs at a time.
2. The overall BDB-JE insert rate will improve a lot, since disk usage will be reduced significantly (BINDeltas will be much smaller than before).
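Example usage (illustrative only): in practice the DataWriter MapReduce job is assembled by VenicePushJob, so the sketch below merely shows how VeniceReducer slots into an old-style org.apache.hadoop.mapred job. The wrapper class name, job name, and reduce-task count are hypothetical, and the Venice-specific properties that configure(JobConf) expects are omitted.

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;

import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

public class VeniceReducerJobWiring {
  public static JobConf buildJobConf() {
    JobConf conf = new JobConf();
    conf.setJobName("venice-data-writer"); // hypothetical job name

    // The map phase emits serialized (key, value) byte pairs; the reduce phase
    // hands each sorted key group to VeniceReducer, which produces to the Kafka
    // topic partition chosen by VeniceMRPartitioner.
    conf.setMapOutputKeyClass(BytesWritable.class);
    conf.setMapOutputValueClass(BytesWritable.class);
    conf.setOutputKeyClass(BytesWritable.class);
    conf.setOutputValueClass(BytesWritable.class);

    conf.setReducerClass(VeniceReducer.class);
    // Hypothetical partition count: one reducer per Venice partition keeps each
    // reducer's output confined to a single topic partition.
    conf.setNumReduceTasks(16);
    return conf;
  }
}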
Nested Class Summary
Nested classes/interfaces inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
AbstractPartitionWriter.DuplicateKeyPrinter, AbstractPartitionWriter.PartitionWriterProducerCallback, AbstractPartitionWriter.VeniceWriterMessage
Field Summary
static java.lang.String MAP_REDUCE_JOB_ID_PROP

Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
Constructor Summary
VeniceReducer()
Method Summary
void configure(org.apache.hadoop.mapred.JobConf job)
protected void configureTask(VeniceProperties props)
    Allow implementations of this class to configure task-specific stuff.
protected PubSubProducerCallback getCallback()
protected DataWriterTaskTracker getDataWriterTaskTracker()
protected boolean getExceedQuotaFlag()
protected org.apache.hadoop.mapred.JobConf getJobConf()
protected long getTotalIncomingDataSizeInBytes()
    Return the size of serialized key and serialized value in bytes across the entire dataset.
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
void reduce(org.apache.hadoop.io.BytesWritable key, java.util.Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)
protected void setExceedQuota(boolean exceedQuota)
protected void setHadoopJobClientProvider(HadoopJobClientProvider hadoopJobClientProvider)
protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
close, extract, getDerivedValueSchemaId, initDuplicateKeyPrinter, isEnableWriteCompute, logMessageProgress, processValuesForKey, recordMessageErrored
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
Field Detail
MAP_REDUCE_JOB_ID_PROP
public static final java.lang.String MAP_REDUCE_JOB_ID_PROP
See Also:
Constant Field Values
Method Detail
getJobConf
protected org.apache.hadoop.mapred.JobConf getJobConf()

configure
public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable

reduce
public void reduce(org.apache.hadoop.io.BytesWritable key, java.util.Iterator<org.apache.hadoop.io.BytesWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> output, org.apache.hadoop.mapred.Reporter reporter)
Specified by:
reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
getDataWriterTaskTracker
protected DataWriterTaskTracker getDataWriterTaskTracker()
Overrides:
getDataWriterTaskTracker in class AbstractPartitionWriter

getExceedQuotaFlag
protected boolean getExceedQuotaFlag()
Overrides:
getExceedQuotaFlag in class AbstractPartitionWriter

setVeniceWriter
protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
Overrides:
setVeniceWriter in class AbstractPartitionWriter

setExceedQuota
protected void setExceedQuota(boolean exceedQuota)
Overrides:
setExceedQuota in class AbstractPartitionWriter

hasReportedFailure
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
Overrides:
hasReportedFailure in class AbstractPartitionWriter

getCallback
protected PubSubProducerCallback getCallback()
Overrides:
getCallback in class AbstractPartitionWriter
configureTask
protected void configureTask(VeniceProperties props)
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
Overrides:
configureTask in class AbstractPartitionWriter
Parameters:
props - the job props that the task was configured with.
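As a minimal sketch of this hook (not taken from the Venice codebase), a hypothetical subclass could let the parent finish its setup and then read its own property. The class name and property key below are made up, and it is assumed that VeniceProperties (com.linkedin.venice.utils) exposes a getString(key, defaultValue) accessor.

import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;
import com.linkedin.venice.utils.VeniceProperties;

// Hypothetical subclass for illustration only.
public class AuditingVeniceReducer extends VeniceReducer {
  private String auditTag;

  @Override
  protected void configureTask(VeniceProperties props) {
    // Let VeniceReducer / AbstractPartitionWriter wire up the writer and task state first.
    super.configureTask(props);
    // Then pick up an extra, task-specific property (key and default value are made up).
    this.auditTag = props.getString("custom.audit.tag", "none");
  }
}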
getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()
Description copied from class: AbstractPartitionWriter
Return the size of serialized key and serialized value in bytes across the entire dataset. This is an optimization to skip writing the data to Kafka and reduce the load on Kafka and the Venice storage nodes. Not all engines can support fetching this information during the execution of the job (e.g. Spark), but we can live with it for now. The quota is checked again in the Driver after the completion of the DataWriter job, and it will kill the VenicePushJob soon after.
Overrides:
getTotalIncomingDataSizeInBytes in class AbstractPartitionWriter
Returns:
the size of serialized key and serialized value in bytes across the entire dataset
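As an illustration of how this value feeds the quota optimization, the sketch below shows a hypothetical subclass comparing it against a made-up storage quota and flipping the exceed-quota flag via setExceedQuota(boolean). This is not the actual VenicePushJob enforcement logic; the quota constant and the helper method are invented for the example.

import com.linkedin.venice.hadoop.mapreduce.datawriter.reduce.VeniceReducer;

// Hypothetical subclass for illustration only.
public class QuotaAwareVeniceReducer extends VeniceReducer {

  // Made-up quota: 100 GiB of serialized key + value bytes.
  private static final long HYPOTHETICAL_STORAGE_QUOTA_BYTES = 100L * 1024 * 1024 * 1024;

  protected void checkQuotaBeforeWriting() {
    long totalInputSizeInBytes = getTotalIncomingDataSizeInBytes();
    if (totalInputSizeInBytes > HYPOTHETICAL_STORAGE_QUOTA_BYTES) {
      // Stop producing to Kafka for this task; the Driver re-checks the quota after
      // the DataWriter job completes and kills the VenicePushJob if it is exceeded.
      setExceedQuota(true);
    }
  }
}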
setHadoopJobClientProvider
protected void setHadoopJobClientProvider(HadoopJobClientProvider hadoopJobClientProvider)