Class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
java.lang.Object
com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
com.linkedin.venice.hadoop.task.datawriter.AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
- Type Parameters:
INPUT_KEY - the type of the input key read from the InputFormat
INPUT_VALUE - the type of the input value read from the InputFormat
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
AbstractVeniceMapper, SparkInputRecordProcessor
public abstract class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
extends AbstractDataWriterTask
implements Closeable
An abstraction of the task that processes each record from the input and returns serialized, and potentially compressed, Avro key/value pairs.
Field Summary
Fields
static final byte[] EMPTY_BYTES
protected AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> veniceRecordReader
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
Constructor Summary
Constructors
AbstractInputRecordProcessor()
Method Summary
void close()
protected void configureTask(VeniceProperties props)
Allow implementations of this class to configure task-specific stuff.
protected abstract AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> getRecordReader(VeniceProperties props)
A method for child classes to set up veniceRecordReader.
protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, byte[] rmd, AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef, AtomicReference<byte[]> rmdRef, DataWriterTaskTracker dataWriterTaskTracker)
This function compresses the record and checks whether its uncompressed size exceeds the maximum allowed size.
protected final void processRecord(INPUT_KEY inputKey, INPUT_VALUE inputValue, byte[] inputRmd, TriConsumer<byte[], byte[], byte[]> recordEmitter, DataWriterTaskTracker dataWriterTaskTracker)
protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props)
This function is added to allow it to be mocked for tests.
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
Field Details
EMPTY_BYTES
public static final byte[] EMPTY_BYTES
veniceRecordReader
protected AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> veniceRecordReader
Constructor Details
AbstractInputRecordProcessor
public AbstractInputRecordProcessor()
Method Details
processRecord
protected final void processRecord(INPUT_KEY inputKey, INPUT_VALUE inputValue, byte[] inputRmd, TriConsumer<byte[], byte[], byte[]> recordEmitter, DataWriterTaskTracker dataWriterTaskTracker)
process
protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, byte[] rmd, AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef, AtomicReference<byte[]> rmdRef, DataWriterTaskTracker dataWriterTaskTracker)
This function compresses the record and checks whether its uncompressed size exceeds the maximum allowed size. Regardless of the configuration, it tracks uncompressed record size violations in the DataWriterTaskTracker. If enableUncompressedMaxRecordSizeLimit is enabled, any record that exceeds the limit will be dropped from further processing.
The metrics collected by this function will be exposed in the PushJobDetails system store. Downstream, the trackUncompressedRecordTooLargeFailure metric is used to verify that the job does not violate the maximum uncompressed record size constraint.
If trackUncompressedRecordTooLargeFailure is non-zero and enableUncompressedMaxRecordSizeLimit is enabled, the job will throw a VeniceException in VenicePushJob.runJobAndUpdateStatus(), using the output of VenicePushJob.updatePushJobDetailsWithJobDetails(DataWriterTaskTracker).
When enableUncompressedMaxRecordSizeLimit is enabled, no records will be produced to Kafka in AbstractPartitionWriter#processValuesForKey(byte[], Iterator, Iterator, DataWriterTaskTracker).
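As a minimal, self-contained sketch of the size-limit behavior described above: violations are always counted, but the record is only dropped when enforcement is enabled. The class, limit, counter, and compressor below are hypothetical stand-ins, not the Venice implementation.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.zip.Deflater;

// Hypothetical stand-in for the process() size-limit logic; not Venice code.
class RecordSizeLimitSketch {
    static final int MAX_UNCOMPRESSED_SIZE = 1024;                  // stand-in for the configured limit
    static final AtomicInteger tooLargeCount = new AtomicInteger(); // stand-in for the tracker metric

    // Compresses the value, but checks the *uncompressed* size against the limit.
    // The violation is counted regardless of configuration; the record is only
    // dropped (returns false) when enforcement is enabled, mirroring
    // enableUncompressedMaxRecordSizeLimit.
    static boolean process(byte[] value, boolean enforceLimit, AtomicReference<byte[]> valueRef) {
        valueRef.set(compress(value));
        if (value.length > MAX_UNCOMPRESSED_SIZE) {
            tooLargeCount.incrementAndGet(); // tracked regardless of configuration
            if (enforceLimit) {
                return false; // drop the record from further processing
            }
        }
        return true;
    }

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length + 64];
        int n = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, n);
    }
}
```

The key design point this mirrors is that tracking and enforcement are decoupled: a push job can observe how many records would violate the limit before turning enforcement on.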
getRecordReader
protected abstract AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> getRecordReader(VeniceProperties props)
A method for child classes to set up veniceRecordReader.
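This hook follows a template-method pattern: the base class drives configuration and caches the reader the subclass supplies. A self-contained analogue of that shape (all class and method names below are illustrative, not Venice's):

```java
// Illustrative analogue of the getRecordReader() hook; names are hypothetical.
interface RecordReader {
    String read();
}

abstract class BaseTask {
    protected RecordReader reader; // plays the role of veniceRecordReader

    // The base class owns configuration and caches the subclass-supplied reader.
    final void configure(String props) {
        this.reader = getRecordReader(props);
    }

    // Subclasses provide the concrete reader, as getRecordReader(VeniceProperties) does.
    protected abstract RecordReader getRecordReader(String props);
}

class AvroTask extends BaseTask {
    @Override
    protected RecordReader getRecordReader(String props) {
        return () -> "avro-reader configured with " + props;
    }
}
```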
configureTask
protected void configureTask(VeniceProperties props)
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
Specified by:
configureTask in class AbstractDataWriterTask
Parameters:
props - the job props that the task was configured with.
readDictionaryFromKafka
protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props)
This function is added to allow it to be mocked for tests. Since mocking this function on an actual object in AbstractTestVeniceMapper#getMapper(int, int, Consumer) always ended up hitting the original function, an override was added in TestVeniceAvroMapperClass.
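The note above describes a common alternative to mocking: overriding a protected hook in a test subclass. A self-contained sketch of that pattern (the classes and methods below are hypothetical, not the Venice or test code):

```java
// Illustrative "override a protected hook in a test subclass" pattern; hypothetical names.
class DictionaryFetcher {
    // In the real class this would fetch a compression dictionary from Kafka.
    protected byte[] readDictionary(String topicName) {
        throw new UnsupportedOperationException("would contact Kafka for " + topicName);
    }

    int dictionarySize(String topicName) {
        return readDictionary(topicName).length;
    }
}

// Test subclass replaces the Kafka call with a canned dictionary, so callers of
// dictionarySize() can be exercised without any external dependency.
class StubDictionaryFetcher extends DictionaryFetcher {
    @Override
    protected byte[] readDictionary(String topicName) {
        return new byte[] {1, 2, 3};
    }
}
```

Unlike a mock, the override is a real method on a real object, so polymorphic dispatch guarantees the stub is hit even when the method is called internally.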
close
public void close()
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable