Class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>

java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
Type Parameters:
INPUT_KEY - type of the input key read from InputFormat
INPUT_VALUE - type of the input value read from InputFormat
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
AbstractVeniceMapper, SparkInputRecordProcessor

public abstract class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE> extends AbstractDataWriterTask implements Closeable
An abstraction of the task that processes each record from the input and emits serialized, and potentially compressed, Avro key/value pairs.

  • Constructor Details

    • AbstractInputRecordProcessor

      public AbstractInputRecordProcessor()
  • Method Details

    • processRecord

      protected final void processRecord(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, TriConsumer<byte[],byte[],Long> recordEmitter, DataWriterTaskTracker dataWriterTaskTracker)
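      A hedged sketch of how a concrete task (for example, a Hadoop mapper subclass) might drive this method, assuming TriConsumer is a lambda-compatible functional interface; the emitted list, inputKey/inputValue, logicalTimestamp, and dataWriterTaskTracker instances are illustrative assumptions, and imports are omitted:

      // Inside a hypothetical subclass of AbstractInputRecordProcessor:
      List<byte[][]> emitted = new ArrayList<>();
      TriConsumer<byte[], byte[], Long> recordEmitter =
          (keyBytes, valueBytes, timestamp) -> emitted.add(new byte[][] { keyBytes, valueBytes });
      // Serialize (and potentially compress) one input record and collect whatever it emits.
      processRecord(inputKey, inputValue, logicalTimestamp, recordEmitter, dataWriterTaskTracker);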
    • process

      protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef, AtomicReference<Long> timestampRef, DataWriterTaskTracker dataWriterTaskTracker)
      This function compresses the record and checks whether its uncompressed size exceeds the maximum allowed size. Regardless of the configuration, it tracks uncompressed record size violations in the DataWriterTaskTracker. If enableUncompressedMaxRecordSizeLimit is enabled, any record that exceeds the limit will be dropped from further processing.

      The metrics collected by this function will be exposed in the PushJobDetails system store. Downstream, the trackUncompressedRecordTooLargeFailure metric is used to verify that the job does not violate the maximum uncompressed record size constraint.

      If the trackUncompressedRecordTooLargeFailure count is non-zero and enableUncompressedMaxRecordSizeLimit is enabled, the job will throw a VeniceException in VenicePushJob.runJobAndUpdateStatus(), using the output of VenicePushJob.updatePushJobDetailsWithJobDetails(DataWriterTaskTracker).

      When enableUncompressedMaxRecordSizeLimit is enabled, records that exceed the limit are dropped before AbstractPartitionWriter.processValuesForKey(byte[], Iterator, Iterator, DataWriterTaskTracker) runs, so no such record is ever produced to Kafka.
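      For illustration only, a subclass could layer extra filtering on top of this logic by delegating to the base implementation. The tombstone check in the sketch below is an assumption, not documented Venice behavior, and it assumes a true return value means the record should be emitted:

      @Override
      protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp,
          AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef,
          AtomicReference<Long> timestampRef, DataWriterTaskTracker dataWriterTaskTracker) {
        // Let the base class serialize, compress, and apply the uncompressed-size checks.
        boolean keep = super.process(inputKey, inputValue, timestamp, keyRef, valueRef, timestampRef, dataWriterTaskTracker);
        // Hypothetical extra filter: also drop records whose serialized value came back null.
        return keep && valueRef.get() != null;
      }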

    • getRecordReader

      protected abstract AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> getRecordReader(VeniceProperties props)
      A method for child classes to set up the veniceRecordReader.
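      A minimal sketch of such a child class; MyInputRecordProcessor, MyRecordReader, and the use of Hadoop Text as the input key/value types are assumptions for illustration, and imports are omitted:

      // Hypothetical concrete processor over Hadoop Text input records.
      public class MyInputRecordProcessor extends AbstractInputRecordProcessor<Text, Text> {
        @Override
        protected AbstractVeniceRecordReader<Text, Text> getRecordReader(VeniceProperties props) {
          // MyRecordReader is an assumed AbstractVeniceRecordReader implementation that turns
          // Text keys/values into serialized Avro bytes.
          return new MyRecordReader(props);
        }
      }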
    • configureTask

      protected void configureTask(VeniceProperties props)
      Description copied from class: AbstractDataWriterTask
      Allows implementations of this class to perform task-specific configuration.
      Specified by:
      configureTask in class AbstractDataWriterTask
      Parameters:
      props - the job props that the task was configured with.
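      A hedged sketch of a subclass override; the property name and the sampleThreshold field are assumptions, and it assumes VeniceProperties exposes a getLong(key, default) accessor:

      @Override
      protected void configureTask(VeniceProperties props) {
        super.configureTask(props); // let AbstractInputRecordProcessor set up its own state first
        // Hypothetical task-specific setting read from the job properties.
        this.sampleThreshold = props.getLong("my.hypothetical.sample.threshold", 0L);
      }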
    • readDictionaryFromKafka

      protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props)
      This function exists so that it can be mocked in tests. Since mocking this function on a real object in AbstractTestVeniceMapper#getMapper(int, int, Consumer) always ended up invoking the original implementation, an override was added in TestVeniceAvroMapperClass.
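      A hedged sketch of that testing pattern; the choice of VeniceAvroMapper as the parent class and the canned bytes are assumptions for illustration:

      // Test-only subclass that short-circuits the Kafka read with a canned dictionary,
      // so unit tests never need a live Kafka cluster.
      public class TestVeniceAvroMapperClass extends VeniceAvroMapper {
        @Override
        protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props) {
          return ByteBuffer.wrap(new byte[] { 1, 2, 3 }); // fake compression dictionary bytes for tests
        }
      }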
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable