Class SparkPartitionWriter

java.lang.Object
  com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
    com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
      com.linkedin.venice.spark.datawriter.writer.SparkPartitionWriter

All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable

public class SparkPartitionWriter extends AbstractPartitionWriter
Nested Class Summary

Nested classes/interfaces inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter:
AbstractPartitionWriter.DuplicateKeyPrinter, AbstractPartitionWriter.PartitionWriterProducerCallback, AbstractPartitionWriter.VeniceWriterMessage
Field Summary

Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
TASK_ID_NOT_SET
Constructor Summary

SparkPartitionWriter(java.util.Properties jobProperties, DataWriterAccumulators accumulators)
Method Summary

protected long getTotalIncomingDataSizeInBytes()
    Return the size of the serialized keys and values in bytes across the entire dataset.
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter:
close, configureTask, extract, getCallback, getDataWriterTaskTracker, getDerivedValueSchemaId, getExceedQuotaFlag, hasReportedFailure, initDuplicateKeyPrinter, isEnableWriteCompute, logMessageProgress, processValuesForKey, recordMessageErrored, setExceedQuota, setVeniceWriter

Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
Constructor Detail

SparkPartitionWriter
public SparkPartitionWriter(java.util.Properties jobProperties, DataWriterAccumulators accumulators)
Method Detail

getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()

Description copied from class: AbstractPartitionWriter
Return the size of the serialized keys and values in bytes across the entire dataset. This is an optimization used to skip writing the data to Kafka, reducing the load on Kafka and the Venice storage nodes. Not all engines can fetch this information while the job is executing (e.g., Spark), but we can live with that for now. The quota is checked again in the Driver after the DataWriter job completes, and the VenicePushJob is killed soon after if the quota is exceeded.

Overrides:
getTotalIncomingDataSizeInBytes in class AbstractPartitionWriter

Returns:
the size of the serialized keys and values in bytes across the entire dataset
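The quota optimization described above can be sketched in plain Java. The `SerializedRecord` type, the `storageQuotaBytes` threshold, and the class name below are illustrative assumptions for this sketch, not part of the Venice API:

```java
import java.util.List;

// Minimal sketch of the dataset-size quota check described above.
public class QuotaCheckSketch {
  // A serialized key/value pair, as it would be produced to Kafka.
  record SerializedRecord(byte[] key, byte[] value) {}

  // Sum of serialized key and value sizes across the whole dataset,
  // mirroring what getTotalIncomingDataSizeInBytes() reports.
  static long totalIncomingDataSizeInBytes(List<SerializedRecord> dataset) {
    long total = 0;
    for (SerializedRecord r : dataset) {
      total += r.key().length + r.value().length;
    }
    return total;
  }

  public static void main(String[] args) {
    List<SerializedRecord> dataset = List.of(
        new SerializedRecord(new byte[8], new byte[100]),
        new SerializedRecord(new byte[8], new byte[200]));
    long storageQuotaBytes = 1024; // hypothetical store quota
    long total = totalIncomingDataSizeInBytes(dataset);
    // When the total exceeds the quota, the writer can skip producing to
    // Kafka; the Driver re-checks the quota after the DataWriter job ends.
    System.out.println(total + " exceedsQuota=" + (total > storageQuotaBytes));
  }
}
```

Note that Spark cannot report this total during execution, which is why the Driver performs the authoritative quota check after the DataWriter job finishes.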