Class AbstractPartitionWriter
java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
- All Implemented Interfaces:
Closeable
,AutoCloseable
- Direct Known Subclasses:
SparkPartitionWriter
,VeniceReducer
An abstraction of the task that processes all key/value pairs, checks for duplicates and emits the final key/value
pairs to Venice's PubSub.
-
Nested Class Summary
Modifier and TypeClassDescriptionclass
static class
Using Avro Json encoder to print duplicate keys in case there are tons of duplicate keys, only print firstAbstractPartitionWriter.DuplicateKeyPrinter.MAX_NUM_OF_LOG
of them so that it won't pollute Reducer's log.class
static class
-
Field Summary
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
close()
protected void
configureTask
(VeniceProperties props) Allow implementations of this class to configure task-specific stuff.protected AbstractVeniceWriter<byte[],
byte[], byte[]> extract
(byte[] keyBytes, Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker) protected PubSubProducerCallback
protected DataWriterTaskTracker
protected int
protected boolean
protected long
Return the size of serialized key and serialized value in bytes across the entire dataset.protected boolean
hasReportedFailure
(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed) protected boolean
protected void
void
processValuesForKey
(byte[] key, Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker) protected void
protected void
setExceedQuota
(boolean exceedQuota) protected void
setVeniceWriter
(AbstractVeniceWriter veniceWriter) protected void
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
Constructor Details
-
AbstractPartitionWriter
public AbstractPartitionWriter()
-
-
Method Details
-
processValuesForKey
public void processValuesForKey(byte[] key, Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker) -
setVeniceWriterFactory
-
getDataWriterTaskTracker
-
getCallback
-
getDerivedValueSchemaId
protected int getDerivedValueSchemaId() -
isEnableWriteCompute
protected boolean isEnableWriteCompute() -
extract
protected AbstractPartitionWriter.VeniceWriterMessage extract(byte[] keyBytes, Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker) -
hasReportedFailure
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed) -
createBasicVeniceWriter
-
getExceedQuotaFlag
protected boolean getExceedQuotaFlag() -
close
- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in interfaceCloseable
- Throws:
IOException
-
initDuplicateKeyPrinter
protected AbstractPartitionWriter.DuplicateKeyPrinter initDuplicateKeyPrinter(VeniceProperties props) -
configureTask
Description copied from class:AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.- Specified by:
configureTask
in classAbstractDataWriterTask
- Parameters:
props
- the job props that the task was configured with.
-
getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()Return the size of serialized key and serialized value in bytes across the entire dataset. This is an optimization to skip writing the data to Kafka and reduce the load on Kafka and Venice storage nodes. Not all engines can support fetching this information during the execution of the job (eg Spark), but we can live with it for now. The quota is checked again in the Driver after the completion of the DataWriter job, and it will kill the VenicePushJob soon after.- Returns:
- the size of serialized key and serialized value in bytes across the entire dataset
-
recordMessageErrored
-
logMessageProgress
protected void logMessageProgress() -
setVeniceWriter
-
setExceedQuota
protected void setExceedQuota(boolean exceedQuota)
-