Class AbstractPartitionWriter
- java.lang.Object
-
- com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
-
- com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
-
- All Implemented Interfaces:
  java.io.Closeable, java.lang.AutoCloseable
- Direct Known Subclasses:
  SparkPartitionWriter, VeniceReducer
public abstract class AbstractPartitionWriter extends AbstractDataWriterTask implements java.io.Closeable
An abstraction of the task that processes all key/value pairs, checks for duplicates, and emits the final key/value pairs to Venice's PubSub.
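As a rough illustration of the flow this class abstracts, consider the per-key processing pattern: for each key, the writer walks the iterator of values, emits the first value, and counts any further values as duplicates. The sketch below is a minimal stand-in under those assumptions; the class name, `Result` holder, and method body are illustrative and are not the Venice API.

```java
import java.util.Arrays;
import java.util.Iterator;

// Stand-in sketch of the "process all values for a key, detect duplicates,
// emit one final value" flow that AbstractPartitionWriter abstracts.
// None of these names come from the Venice API.
public class PartitionWriterSketch {
    // Small holder for the emitted value and the number of duplicates seen.
    static final class Result {
        final byte[] value;
        final int duplicates;
        Result(byte[] value, int duplicates) {
            this.value = value;
            this.duplicates = duplicates;
        }
    }

    // Emits the first value for the key; every further value is a duplicate.
    static Result processValuesForKey(byte[] key, Iterator<byte[]> values) {
        byte[] first = values.next(); // the value that will be written
        int dups = 0;
        while (values.hasNext()) {    // remaining values are duplicates
            values.next();
            dups++;
        }
        return new Result(first, dups);
    }

    public static void main(String[] args) {
        Iterator<byte[]> values = Arrays.asList(
            "v1".getBytes(), "v2".getBytes(), "v3".getBytes()).iterator();
        Result r = processValuesForKey("k".getBytes(), values);
        System.out.println(new String(r.value)); // v1
        System.out.println(r.duplicates);        // 2
    }
}
```

In the real class, this per-key step is `processValuesForKey(byte[], Iterator<byte[]>, DataWriterTaskTracker)`, and the emitted message is produced via `extract(...)` as an `AbstractPartitionWriter.VeniceWriterMessage`.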
-
-
Nested Class Summary
Nested Classes
- static class AbstractPartitionWriter.DuplicateKeyPrinter
  Using the Avro JSON encoder to print duplicate keys. In case there are tons of duplicate keys, only the first AbstractPartitionWriter.DuplicateKeyPrinter.MAX_NUM_OF_LOG of them are printed, so that they won't pollute the Reducer's log.
- class AbstractPartitionWriter.PartitionWriterProducerCallback
- static class AbstractPartitionWriter.VeniceWriterMessage
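The log-capping idea behind `DuplicateKeyPrinter` can be sketched as follows. This is a hedged stand-in: the class name, the `onDuplicateKey` method, and the value of the cap are assumptions for illustration; only the capping behavior and the `MAX_NUM_OF_LOG` constant name come from the summary above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the capping pattern behind DuplicateKeyPrinter: log at most
// MAX_NUM_OF_LOG duplicate keys so a heavily skewed dataset cannot flood
// the Reducer's log. The cap's actual value is an assumption.
public class DuplicateKeyLogCap {
    static final int MAX_NUM_OF_LOG = 10; // assumed value, for illustration
    private int logged = 0;
    final List<String> logLines = new ArrayList<>();

    void onDuplicateKey(String key) {
        if (logged < MAX_NUM_OF_LOG) { // only the first N duplicates are logged
            logLines.add("Duplicate key: " + key);
            logged++;
        }                              // further duplicates are silently dropped
    }

    public static void main(String[] args) {
        DuplicateKeyLogCap printer = new DuplicateKeyLogCap();
        for (int i = 0; i < 100; i++) {
            printer.onDuplicateKey("key-" + i);
        }
        System.out.println(printer.logLines.size()); // 10
    }
}
```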
-
Field Summary
-
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
-
-
Constructor Summary
Constructors
- AbstractPartitionWriter()
-
Method Summary
All Methods · Instance Methods · Concrete Methods
- void close()
- protected void configureTask(VeniceProperties props)
  Allow implementations of this class to configure task-specific stuff.
- protected AbstractPartitionWriter.VeniceWriterMessage extract(byte[] keyBytes, java.util.Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker)
- protected PubSubProducerCallback getCallback()
- protected DataWriterTaskTracker getDataWriterTaskTracker()
- protected int getDerivedValueSchemaId()
- protected boolean getExceedQuotaFlag()
- protected long getTotalIncomingDataSizeInBytes()
  Return the size of serialized key and serialized value in bytes across the entire dataset.
- protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
- protected AbstractPartitionWriter.DuplicateKeyPrinter initDuplicateKeyPrinter(VeniceProperties props)
- protected boolean isEnableWriteCompute()
- protected void logMessageProgress()
- void processValuesForKey(byte[] key, java.util.Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker)
- protected void recordMessageErrored(java.lang.Exception e)
- protected void setExceedQuota(boolean exceedQuota)
- protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
-
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
-
-
-
Method Detail
-
processValuesForKey
public void processValuesForKey(byte[] key, java.util.Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker)
-
getDataWriterTaskTracker
protected DataWriterTaskTracker getDataWriterTaskTracker()
-
getCallback
protected PubSubProducerCallback getCallback()
-
getDerivedValueSchemaId
protected int getDerivedValueSchemaId()
-
isEnableWriteCompute
protected boolean isEnableWriteCompute()
-
extract
protected AbstractPartitionWriter.VeniceWriterMessage extract(byte[] keyBytes, java.util.Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker)
-
hasReportedFailure
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
-
getExceedQuotaFlag
protected boolean getExceedQuotaFlag()
-
close
public void close() throws java.io.IOException
- Specified by:
  close in interface java.lang.AutoCloseable
- Specified by:
  close in interface java.io.Closeable
- Throws:
  java.io.IOException
-
initDuplicateKeyPrinter
protected AbstractPartitionWriter.DuplicateKeyPrinter initDuplicateKeyPrinter(VeniceProperties props)
-
configureTask
protected void configureTask(VeniceProperties props)
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
- Specified by:
  configureTask in class AbstractDataWriterTask
- Parameters:
  props - the job props that the task was configured with.
-
getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()
Return the size of serialized key and serialized value in bytes across the entire dataset. This is an optimization to skip writing the data to Kafka and reduce the load on Kafka and the Venice storage nodes. Not all engines can support fetching this information during the execution of the job (e.g. Spark), but we can live with that for now. The quota is checked again in the Driver after the completion of the DataWriter job, and it will kill the VenicePushJob soon after.
- Returns:
  the size of serialized key and serialized value in bytes across the entire dataset
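The quota short-circuit described above amounts to comparing the total incoming data size against the store's storage quota before writing. The sketch below illustrates only that comparison; the method and parameter names (`exceedsQuota`, `storageQuotaBytes`) and the "negative quota means unlimited" convention are assumptions, not Venice's API.

```java
// Sketch of the quota check described above: if the total serialized
// key+value size exceeds the store's storage quota, the push should be
// flagged (and ultimately killed by the Driver) instead of writing the
// data to Kafka. All names here are illustrative.
public class QuotaCheckSketch {
    static boolean exceedsQuota(long totalIncomingDataSizeInBytes, long storageQuotaBytes) {
        // Assumption for this sketch: a negative quota means "unlimited".
        if (storageQuotaBytes < 0) {
            return false;
        }
        return totalIncomingDataSizeInBytes > storageQuotaBytes;
    }

    public static void main(String[] args) {
        System.out.println(exceedsQuota(2_000_000_000L, 1_000_000_000L)); // true
        System.out.println(exceedsQuota(500L, 1_000L));                   // false
        System.out.println(exceedsQuota(500L, -1L));                      // false
    }
}
```

In the real class, the result of such a check would correspond to the flag exposed via `getExceedQuotaFlag()` and set via `setExceedQuota(boolean)`.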
-
recordMessageErrored
protected void recordMessageErrored(java.lang.Exception e)
-
logMessageProgress
protected void logMessageProgress()
-
setVeniceWriter
protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
-
setExceedQuota
protected void setExceedQuota(boolean exceedQuota)
-
-