Class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>

java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
Type Parameters:
INPUT_KEY - type of the input key read from InputFormat
INPUT_VALUE - type of the input value read from InputFormat
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
AbstractVeniceMapper, SparkInputRecordProcessor

public abstract class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE> extends AbstractDataWriterTask implements Closeable
An abstraction of the task that processes each record from the input and emits serialized, and potentially compressed, Avro key/value pairs.

  • Constructor Details

    • AbstractInputRecordProcessor

      public AbstractInputRecordProcessor()
  • Method Details

    • processRecord

      protected final void processRecord(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, TriConsumer<byte[],byte[],Long> recordEmitter, DataWriterTaskTracker dataWriterTaskTracker)
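      A hedged sketch of how a concrete task (for example, a Hadoop mapper subclass) might drive this method, assuming TriConsumer is a lambda-compatible functional interface; the emitted list, inputKey/inputValue, logicalTimestamp, and dataWriterTaskTracker instances are illustrative assumptions, and imports are omitted:

      // Inside a hypothetical subclass of AbstractInputRecordProcessor:
      List<byte[][]> emitted = new ArrayList<>();
      TriConsumer<byte[], byte[], Long> recordEmitter =
          (keyBytes, valueBytes, timestamp) -> emitted.add(new byte[][] { keyBytes, valueBytes });
      // Serialize (and potentially compress) one input record and collect whatever it emits.
      processRecord(inputKey, inputValue, logicalTimestamp, recordEmitter, dataWriterTaskTracker);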
    • process

      protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef, AtomicReference<Long> timestampRef, DataWriterTaskTracker dataWriterTaskTracker)
      This function compresses the record and checks whether its uncompressed size exceeds the maximum allowed size. Regardless of the configuration, it tracks uncompressed record size violations in the DataWriterTaskTracker. If enableUncompressedMaxRecordSizeLimit is enabled, any record that exceeds the limit will be dropped from further processing.

      The metrics collected by this function will be exposed in the PushJobDetails system store. Downstream, the trackUncompressedRecordTooLargeFailure metric is used to verify that the job does not violate the maximum uncompressed record size constraint.

      If the trackUncompressedRecordTooLargeFailure count is non-zero and enableUncompressedMaxRecordSizeLimit is enabled, the job will throw a VeniceException in VenicePushJob.runJobAndUpdateStatus(), using the output of VenicePushJob.updatePushJobDetailsWithJobDetails(DataWriterTaskTracker).

      When enableUncompressedMaxRecordSizeLimit is enabled, records that exceed the limit are dropped before AbstractPartitionWriter.processValuesForKey(byte[], Iterator, Iterator, DataWriterTaskTracker) runs, so no such record is ever produced to Kafka.
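      For illustration only, a subclass could layer extra filtering on top of this logic by delegating to the base implementation. The tombstone check in the sketch below is an assumption, not documented Venice behavior, and it assumes a true return value means the record should be emitted:

      @Override
      protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp,
          AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef,
          AtomicReference<Long> timestampRef, DataWriterTaskTracker dataWriterTaskTracker) {
        // Let the base class serialize, compress, and apply the uncompressed-size checks.
        boolean keep = super.process(inputKey, inputValue, timestamp, keyRef, valueRef, timestampRef, dataWriterTaskTracker);
        // Hypothetical extra filter: also drop records whose serialized value came back null.
        return keep && valueRef.get() != null;
      }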

    • getRecordReader

      protected abstract AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> getRecordReader(VeniceProperties props)
      A method for child classes to set up the veniceRecordReader.
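      A minimal sketch of such a child class; MyInputRecordProcessor, MyRecordReader, and the use of Hadoop Text as the input key/value types are assumptions for illustration, and imports are omitted:

      // Hypothetical concrete processor over Hadoop Text input records.
      public class MyInputRecordProcessor extends AbstractInputRecordProcessor<Text, Text> {
        @Override
        protected AbstractVeniceRecordReader<Text, Text> getRecordReader(VeniceProperties props) {
          // MyRecordReader is an assumed AbstractVeniceRecordReader implementation that turns
          // Text keys/values into serialized Avro bytes.
          return new MyRecordReader(props);
        }
      }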
    • configureTask

      protected void configureTask(VeniceProperties props)
      Description copied from class: AbstractDataWriterTask
      Allows implementations of this class to perform task-specific configuration.
      Specified by:
      configureTask in class AbstractDataWriterTask
      Parameters:
      props - the job props that the task was configured with.
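      A hedged sketch of a subclass override; the property name and the sampleThreshold field are assumptions, and it assumes VeniceProperties exposes a getLong(key, default) accessor:

      @Override
      protected void configureTask(VeniceProperties props) {
        super.configureTask(props); // let AbstractInputRecordProcessor set up its own state first
        // Hypothetical task-specific setting read from the job properties.
        this.sampleThreshold = props.getLong("my.hypothetical.sample.threshold", 0L);
      }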
    • readDictionaryFromKafka

      protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props)
      This function exists so that it can be mocked in tests. Since mocking this function on a real object in AbstractTestVeniceMapper#getMapper(int, int, Consumer) always ended up invoking the original implementation, an override was added in TestVeniceAvroMapperClass.
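      A hedged sketch of that testing pattern; the choice of VeniceAvroMapper as the parent class and the canned bytes are assumptions for illustration:

      // Test-only subclass that short-circuits the Kafka read with a canned dictionary,
      // so unit tests never need a live Kafka cluster.
      public class TestVeniceAvroMapperClass extends VeniceAvroMapper {
        @Override
        protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props) {
          return ByteBuffer.wrap(new byte[] { 1, 2, 3 }); // fake compression dictionary bytes for tests
        }
      }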
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable