Class AbstractPartitionWriter

java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractPartitionWriter
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
SparkPartitionWriter, VeniceReducer

public abstract class AbstractPartitionWriter extends AbstractDataWriterTask implements Closeable
An abstraction of the task that processes all key/value pairs, checks for duplicates and emits the final key/value pairs to Venice's PubSub.
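The duplicate check described above can be illustrated with a minimal, self-contained sketch. It is not the Venice implementation: the class `DuplicateKeySketch`, its `Result` type, and the `scan` method are hypothetical names introduced here, and the policy shown (emit the first value, count identical and conflicting repeats) is an assumption about what such a check might look like.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Venice. */
final class DuplicateKeySketch {
  /** Outcome of scanning every value that arrived for a single key. */
  static final class Result {
    final byte[] valueToEmit;        // the value this sketch would emit
    final int identicalDuplicates;   // repeats whose bytes equal the first value
    final int conflictingDuplicates; // repeats whose bytes differ from the first value

    Result(byte[] valueToEmit, int identicalDuplicates, int conflictingDuplicates) {
      this.valueToEmit = valueToEmit;
      this.identicalDuplicates = identicalDuplicates;
      this.conflictingDuplicates = conflictingDuplicates;
    }
  }

  /** Keeps the first value and classifies every later value for the same key. */
  static Result scan(Iterator<byte[]> values) {
    if (!values.hasNext()) {
      throw new IllegalArgumentException("No values for key");
    }
    byte[] first = values.next();
    int identical = 0;
    int conflicting = 0;
    while (values.hasNext()) {
      if (Arrays.equals(first, values.next())) {
        identical++;
      } else {
        conflicting++;
      }
    }
    return new Result(first, identical, conflicting);
  }

  public static void main(String[] args) {
    List<byte[]> values = Arrays.asList(new byte[] {1}, new byte[] {1}, new byte[] {2});
    Result result = scan(values.iterator());
    // prints "identical=1, conflicting=1"
    System.out.println(
        "identical=" + result.identicalDuplicates + ", conflicting=" + result.conflictingDuplicates);
  }
}
```

Whether a conflicting duplicate fails the job or is tolerated is a policy decision; the `isDuplicateKeyAllowed` parameter of `hasReportedFailure` suggests the real class makes this configurable, but the sketch does not model that.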
  • Constructor Details

    • AbstractPartitionWriter

      public AbstractPartitionWriter()
  • Method Details

    • processValuesForKey

      public void processValuesForKey(byte[] key, Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker)
    • setVeniceWriterFactory

      protected void setVeniceWriterFactory(VeniceWriterFactory factory)
    • getDataWriterTaskTracker

      protected DataWriterTaskTracker getDataWriterTaskTracker()
    • getCallback

      protected PubSubProducerCallback getCallback()
    • getDerivedValueSchemaId

      protected int getDerivedValueSchemaId()
    • isEnableWriteCompute

      protected boolean isEnableWriteCompute()
    • extract

      protected AbstractPartitionWriter.VeniceWriterMessage extract(byte[] keyBytes, Iterator<byte[]> values, DataWriterTaskTracker dataWriterTaskTracker)
    • hasReportedFailure

      protected boolean hasReportedFailure(DataWriterTaskTracker dataWriterTaskTracker, boolean isDuplicateKeyAllowed)
    • createBasicVeniceWriter

      protected AbstractVeniceWriter<byte[],byte[],byte[]> createBasicVeniceWriter()
    • getExceedQuotaFlag

      protected boolean getExceedQuotaFlag()
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
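Because the class implements `Closeable`, an instance can be managed with try-with-resources so that `close()` runs even when a write fails. The sketch below demonstrates this with a hypothetical stand-in, `RecordingWriter`, which is not a Venice class; how concrete subclasses such as `SparkPartitionWriter` are actually created and closed by their engines is not covered here.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical illustration only; RecordingWriter is not a Venice class. */
final class CloseSketch {
  /** Minimal Closeable that remembers whether close() was called. */
  static final class RecordingWriter implements Closeable {
    boolean closed = false;

    void write(String record) throws IOException {
      if (record.isEmpty()) {
        throw new IOException("Empty record");
      }
    }

    @Override
    public void close() throws IOException {
      closed = true;
    }
  }

  /** Triggers a failing write and reports whether the writer was still closed. */
  static boolean closedAfterFailure() {
    RecordingWriter writer = new RecordingWriter();
    try (RecordingWriter w = writer) {
      w.write(""); // throws IOException
    } catch (IOException e) {
      // The resource has already been closed by the time this block runs.
    }
    return writer.closed;
  }

  public static void main(String[] args) {
    System.out.println(closedAfterFailure()); // prints "true"
  }
}
```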
    • initDuplicateKeyPrinter

      protected AbstractPartitionWriter.DuplicateKeyPrinter initDuplicateKeyPrinter(VeniceProperties props)
    • configureTask

      protected void configureTask(VeniceProperties props)
      Description copied from class: AbstractDataWriterTask
      Allows implementations of this class to apply task-specific configuration.
      Specified by:
      configureTask in class AbstractDataWriterTask
      Parameters:
      props - the job properties that the task was configured with.
    • getTotalIncomingDataSizeInBytes

      protected long getTotalIncomingDataSizeInBytes()
      Returns the total size, in bytes, of the serialized keys and serialized values across the entire dataset. This value supports an optimization: it lets the task skip writing data to Kafka, which reduces the load on Kafka and on the Venice storage nodes. Not all engines can fetch this information while the job is running (e.g., Spark), which is acceptable for now. The Driver checks the quota again after the DataWriter job completes and kills the VenicePushJob soon afterward.
      Returns:
      the size of serialized key and serialized value in bytes across the entire dataset
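The early quota check described above can be sketched as a simple comparison. Everything in this block is an assumption made for illustration: the class `QuotaCheckSketch`, the `UNLIMITED` sentinel, and the convention that a negative size means "the engine could not report it" are hypothetical and are not taken from the Venice code.

```java
/** Hypothetical illustration only; not the Venice quota logic. */
final class QuotaCheckSketch {
  /** Assumed sentinel meaning that no quota is enforced. */
  static final long UNLIMITED = -1L;

  /**
   * Returns true only when the dataset size is known and larger than the quota.
   * A negative size is treated as "unknown", in which case the decision is
   * deferred to the later check in the driver.
   */
  static boolean exceedsQuota(long totalIncomingBytes, long quotaBytes) {
    if (quotaBytes == UNLIMITED || totalIncomingBytes < 0) {
      return false;
    }
    return totalIncomingBytes > quotaBytes;
  }

  public static void main(String[] args) {
    System.out.println(exceedsQuota(2_000L, 1_000L));    // prints "true"
    System.out.println(exceedsQuota(500L, 1_000L));      // prints "false"
    System.out.println(exceedsQuota(-1L, 1_000L));       // prints "false" (size unknown)
    System.out.println(exceedsQuota(2_000L, UNLIMITED)); // prints "false" (no quota)
  }
}
```

A task that finds the quota exceeded could record the fact, for example through `setExceedQuota(true)`, and skip the write path; the exact control flow in the real class is not shown in this page.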
    • recordMessageErrored

      protected void recordMessageErrored(Exception e)
    • logMessageProgress

      protected void logMessageProgress()
    • setVeniceWriter

      protected void setVeniceWriter(AbstractVeniceWriter veniceWriter)
    • setExceedQuota

      protected void setExceedQuota(boolean exceedQuota)