Class SparkPartitionWriter

java.lang.Object
  com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
    com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
      com.linkedin.venice.spark.datawriter.writer.SparkPartitionWriter

All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable

public class SparkPartitionWriter extends AbstractPartitionWriter
Nested Class Summary

Nested classes/interfaces inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter:
AbstractPartitionWriter.DuplicateKeyPrinter, AbstractPartitionWriter.PartitionWriterProducerCallback, AbstractPartitionWriter.VeniceWriterMessage
Field Summary

Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
TASK_ID_NOT_SET
Constructor Summary

SparkPartitionWriter(java.util.Properties jobProperties, DataWriterAccumulators accumulators)
Method Summary

protected long getTotalIncomingDataSizeInBytes()
    Return the size of the serialized keys and values in bytes across the entire dataset.
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter:
close, configureTask, extract, getCallback, getDataWriterTaskTracker, getDerivedValueSchemaId, getExceedQuotaFlag, hasReportedFailure, initDuplicateKeyPrinter, isEnableWriteCompute, logMessageProgress, processValuesForKey, recordMessageErrored, setExceedQuota, setVeniceWriter

Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
Constructor Detail

SparkPartitionWriter
public SparkPartitionWriter(java.util.Properties jobProperties, DataWriterAccumulators accumulators)
Method Detail

getTotalIncomingDataSizeInBytes
protected long getTotalIncomingDataSizeInBytes()

Description copied from class: AbstractPartitionWriter
Return the size of the serialized keys and values in bytes across the entire dataset. This is an optimization used to skip writing the data to Kafka, reducing the load on Kafka and the Venice storage nodes. Not all engines can fetch this information while the job is executing (e.g., Spark), but we can live with that for now. The quota is checked again in the Driver after the DataWriter job completes, and the VenicePushJob is killed soon after if the quota is exceeded.

Overrides:
getTotalIncomingDataSizeInBytes in class AbstractPartitionWriter

Returns:
the size of the serialized keys and values in bytes across the entire dataset
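The quota optimization described above can be sketched in plain Java. The `SerializedRecord` type, the `storageQuotaBytes` threshold, and the class name below are illustrative assumptions for this sketch, not part of the Venice API:

```java
import java.util.List;

// Minimal sketch of the dataset-size quota check described above.
public class QuotaCheckSketch {
  // A serialized key/value pair, as it would be produced to Kafka.
  record SerializedRecord(byte[] key, byte[] value) {}

  // Sum of serialized key and value sizes across the whole dataset,
  // mirroring what getTotalIncomingDataSizeInBytes() reports.
  static long totalIncomingDataSizeInBytes(List<SerializedRecord> dataset) {
    long total = 0;
    for (SerializedRecord r : dataset) {
      total += r.key().length + r.value().length;
    }
    return total;
  }

  public static void main(String[] args) {
    List<SerializedRecord> dataset = List.of(
        new SerializedRecord(new byte[8], new byte[100]),
        new SerializedRecord(new byte[8], new byte[200]));
    long storageQuotaBytes = 1024; // hypothetical store quota
    long total = totalIncomingDataSizeInBytes(dataset);
    // When the total exceeds the quota, the writer can skip producing to
    // Kafka; the Driver re-checks the quota after the DataWriter job ends.
    System.out.println(total + " exceedsQuota=" + (total > storageQuotaBytes));
  }
}
```

Note that Spark cannot report this total during execution, which is why the Driver performs the authoritative quota check after the DataWriter job finishes.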