Class AbstractPartitionWriter
java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
SparkPartitionWriter, VeniceReducer
An abstraction of the task that processes all key/value pairs, checks for duplicates and emits the final key/value
pairs to Venice's PubSub.
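For orientation, here is a minimal sketch of what an engine-specific subclass might look like, in the spirit of the known subclasses SparkPartitionWriter and VeniceReducer. Everything named Hypothetical* is invented for illustration, the package locations of DataWriterTaskTracker and VeniceProperties are assumptions, and any additional abstract hooks the real base classes require are omitted.

// Hypothetical sketch of an engine-specific subclass; not the actual
// SparkPartitionWriter/VeniceReducer implementation. Package locations of
// DataWriterTaskTracker and VeniceProperties are assumed, and any further
// abstract hooks required by the real base classes are omitted.
import com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter;
import com.linkedin.venice.hadoop.task.datawriter.DataWriterTaskTracker;
import com.linkedin.venice.utils.VeniceProperties;

import java.io.IOException;
import java.util.Iterator;

public class HypotheticalPartitionWriter extends AbstractPartitionWriter {

  @Override
  protected void configureTask(VeniceProperties props) {
    // Read any engine-specific settings from the job props here.
  }

  /**
   * Called once per key by the (hypothetical) surrounding engine after it has
   * grouped all records for that key. Duplicate-key checking and the final
   * write to Venice's PubSub happen inside processValuesForKey().
   */
  public void writeGroupedRecords(
      byte[] key,
      Iterator<AbstractPartitionWriter.VeniceRecordWithMetadata> valuesForKey,
      DataWriterTaskTracker tracker) {
    processValuesForKey(key, valuesForKey, tracker);
  }

  /** Called once by the engine when the partition is fully processed. */
  public void finish() throws IOException {
    close();
  }
}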
-
Nested Class Summary
Nested Classes
- static class AbstractPartitionWriter.DuplicateKeyPrinter
  Using Avro Json encoder to print duplicate keys. In case there are tons of duplicate keys, only print the first AbstractPartitionWriter.DuplicateKeyPrinter.MAX_NUM_OF_LOG of them so that it won't pollute Reducer's log.
- static class AbstractPartitionWriter.VeniceRecordWithMetadata
- static class AbstractPartitionWriter.VeniceWriterMessage -
Field Summary
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET -
Constructor Summary
Constructors
- AbstractPartitionWriter() -
Method Summary
- void close()
- protected void configureTask(VeniceProperties props)
  Allow implementations of this class to configure task-specific stuff.
- protected AbstractVeniceWriter<byte[],byte[],byte[]> createBasicVeniceWriter
- protected AbstractPartitionWriter.VeniceWriterMessage extract(byte[] keyBytes, Iterator<AbstractPartitionWriter.VeniceRecordWithMetadata> values, DataWriterTaskTracker dataWriterTaskTracker)
- protected PubSubProducerCallback getCallback
- protected DataWriterTaskTracker getDataWriterTaskTracker
- protected int getDerivedValueSchemaId()
- protected boolean getExceedQuotaFlag()
- protected org.apache.avro.Schema getRmdSchema()
- protected long getTotalIncomingDataSizeInBytes()
  Return the size of serialized key and serialized value in bytes across the entire dataset.
- getVeniceWriterFactory
- protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
- protected AbstractPartitionWriter.DuplicateKeyPrinter initDuplicateKeyPrinter(VeniceProperties props)
- protected boolean isEnableWriteCompute()
- protected void logMessageProgress()
- void processValuesForKey(byte[] key, Iterator<AbstractPartitionWriter.VeniceRecordWithMetadata> values, DataWriterTaskTracker dataWriterTaskTracker)
- protected void recordMessageErrored
- protected void setExceedQuota(boolean exceedQuota)
- protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
- protected void setVeniceWriterFactory
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
Constructor Details
-
AbstractPartitionWriter
public AbstractPartitionWriter()
-
-
Method Details
-
processValuesForKey
public void processValuesForKey(byte[] key, Iterator<AbstractPartitionWriter.VeniceRecordWithMetadata> values, DataWriterTaskTracker dataWriterTaskTracker) -
setVeniceWriterFactory
-
getVeniceWriterFactory
-
getDataWriterTaskTracker
-
getCallback
-
getDerivedValueSchemaId
protected int getDerivedValueSchemaId() -
isEnableWriteCompute
protected boolean isEnableWriteCompute() -
getRmdSchema
protected org.apache.avro.Schema getRmdSchema() -
extract
protected AbstractPartitionWriter.VeniceWriterMessage extract(byte[] keyBytes, Iterator<AbstractPartitionWriter.VeniceRecordWithMetadata> values, DataWriterTaskTracker dataWriterTaskTracker) -
hasReportedFailure
protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed) -
createBasicVeniceWriter
-
getExceedQuotaFlag
protected boolean getExceedQuotaFlag() -
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
initDuplicateKeyPrinter
protected AbstractPartitionWriter.DuplicateKeyPrinter initDuplicateKeyPrinter(VeniceProperties props) -
configureTask
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
- Specified by:
configureTask in class AbstractDataWriterTask
- Parameters:
props - the job props that the task was configured with.
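As a hedged illustration, a subclass might pull its own settings out of the job props here; the property key below is invented, and the getString(key, default) accessor on VeniceProperties is assumed rather than confirmed by this page.

// Inside HypotheticalPartitionWriter (see the sketch near the top of this page):
@Override
protected void configureTask(VeniceProperties props) {
  // Hypothetical property key; getString(key, default) is assumed to exist on VeniceProperties.
  String flushIntervalMs = props.getString("hypothetical.flush.interval.ms", "1000");
  // ...apply the setting to engine-specific state...
}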
-
getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()
Return the size of serialized key and serialized value in bytes across the entire dataset. This is an optimization to skip writing the data to Kafka and reduce the load on Kafka and Venice storage nodes. Not all engines can support fetching this information during the execution of the job (e.g. Spark), but we can live with it for now. The quota is checked again in the Driver after the completion of the DataWriter job, and it will kill the VenicePushJob soon after.
- Returns:
the size of serialized key and serialized value in bytes across the entire dataset
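As a sketch only: an engine that tracks input sizes could surface that number by overriding this method. The counter field below is hypothetical, and what the base implementation returns when the size is unknown is not specified on this page.

// Hypothetical override inside an engine-specific subclass: totalSerializedKeyValueBytes
// is assumed to be populated by the engine while it reads and serializes the input data.
@Override
protected long getTotalIncomingDataSizeInBytes() {
  return totalSerializedKeyValueBytes;
}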
-
recordMessageErrored
-
logMessageProgress
protected void logMessageProgress() -
setVeniceWriter
-
setExceedQuota
protected void setExceedQuota(boolean exceedQuota)
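A hedged sketch of how setExceedQuota and getExceedQuotaFlag could pair up in an engine-specific subclass; the onQuotaCheck hook and its quota inputs are invented for illustration.

// Hypothetical hook inside an AbstractPartitionWriter subclass: latch the quota
// verdict that the engine computed from its own byte counters.
protected void onQuotaCheck(long totalIncomingBytes, long storageQuotaBytes) {
  setExceedQuota(totalIncomingBytes > storageQuotaBytes);
  if (getExceedQuotaFlag()) {
    // A subclass could fail fast here; per the docs above, the Driver re-checks
    // the quota after the DataWriter job completes and kills the VenicePushJob.
  }
}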
-