Class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
java.lang.Object
  com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
    com.linkedin.venice.hadoop.task.datawriter.AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
- Type Parameters:
  INPUT_KEY - type of the input key read from InputFormat
  INPUT_VALUE - type of the input value read from InputFormat
- All Implemented Interfaces:
  Closeable, AutoCloseable
- Direct Known Subclasses:
  AbstractVeniceMapper, SparkInputRecordProcessor

public abstract class AbstractInputRecordProcessor<INPUT_KEY,INPUT_VALUE>
extends AbstractDataWriterTask
implements Closeable
An abstraction of the task that processes each record from the input and returns serialized, and potentially compressed, Avro key/value pairs.
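For orientation, a minimal concrete subclass only needs to supply a record reader via getRecordReader(VeniceProperties) and any task-specific setup via configureTask(VeniceProperties). The sketch below is illustrative rather than taken from Venice: the Hadoop key/value types, the MyRecordReader class, and the assumption that configureTask should chain to super are all placeholders.

import com.linkedin.venice.hadoop.task.datawriter.AbstractInputRecordProcessor;
import com.linkedin.venice.utils.VeniceProperties;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
// (import for AbstractVeniceRecordReader omitted: its package is not shown on this page)

// Sketch only. MyRecordReader is a hypothetical AbstractVeniceRecordReader
// implementation that turns (LongWritable, Text) pairs into Avro key/value bytes.
public class MyInputRecordProcessor extends AbstractInputRecordProcessor<LongWritable, Text> {

  @Override
  protected AbstractVeniceRecordReader<LongWritable, Text> getRecordReader(VeniceProperties props) {
    return new MyRecordReader(props); // hypothetical reader; backs veniceRecordReader
  }

  @Override
  protected void configureTask(VeniceProperties props) {
    super.configureTask(props); // assumption: preserve the base class's setup
    // Task-specific configuration (e.g. reading custom job props) goes here.
  }
}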
-
Field Summary
Fields
protected AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> veniceRecordReader

Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
TASK_ID_NOT_SET
-
Constructor Summary
Constructors:
AbstractInputRecordProcessor()
-
Method Summary
void
close()

protected void
configureTask(VeniceProperties props)
Allow implementations of this class to configure task-specific stuff.

protected abstract AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE>
getRecordReader(VeniceProperties props)
A method for child classes to set up veniceRecordReader.

protected boolean
process(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef, AtomicReference<Long> timestampRef, DataWriterTaskTracker dataWriterTaskTracker)
This function compresses the record and checks whether its uncompressed size exceeds the maximum allowed size.

protected final void
processRecord(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, TriConsumer<byte[], byte[], Long> recordEmitter, DataWriterTaskTracker dataWriterTaskTracker)

protected ByteBuffer
readDictionaryFromKafka(String topicName, VeniceProperties props)
This function is added to allow it to be mocked for tests.

Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask:
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
Field Details
-
veniceRecordReader
protected AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> veniceRecordReader
-
-
Constructor Details
-
AbstractInputRecordProcessor
public AbstractInputRecordProcessor()
-
-
Method Details
-
processRecord
protected final void processRecord(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, TriConsumer<byte[], byte[], Long> recordEmitter, DataWriterTaskTracker dataWriterTaskTracker)
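As a hedged sketch of how an engine-specific subclass might drive this method: the recordEmitter receives the serialized, possibly compressed, key/value bytes plus the timestamp. The Sink interface below is a hypothetical stand-in for the engine's output channel (a Hadoop OutputCollector, a Spark row emitter, etc.); the TriConsumer package is not shown on this page.

// Hypothetical output channel; not part of the Venice API.
interface Sink {
  void emit(byte[] key, byte[] value, Long timestamp);
}

// Inside a subclass of AbstractInputRecordProcessor<LongWritable, Text>:
void writeOne(LongWritable inputKey, Text inputValue, Long timestamp,
              Sink collector, DataWriterTaskTracker tracker) {
  TriConsumer<byte[], byte[], Long> recordEmitter = (keyBytes, valueBytes, ts) ->
      collector.emit(keyBytes, valueBytes, ts); // serialized, possibly compressed Avro bytes
  processRecord(inputKey, inputValue, timestamp, recordEmitter, tracker);
}
-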
process
protected boolean process(INPUT_KEY inputKey, INPUT_VALUE inputValue, Long timestamp, AtomicReference<byte[]> keyRef, AtomicReference<byte[]> valueRef, AtomicReference<Long> timestampRef, DataWriterTaskTracker dataWriterTaskTracker)

This function compresses the record and checks whether its uncompressed size exceeds the maximum allowed size. Regardless of the configuration, it tracks uncompressed record size violations in the DataWriterTaskTracker. If enableUncompressedMaxRecordSizeLimit is enabled, any record that exceeds the limit will be dropped from further processing.

The metrics collected by this function will be exposed in the PushJobDetails system store. Downstream, the trackUncompressedRecordTooLargeFailure metric is used to verify that the job does not violate the maximum uncompressed record size constraint.

If trackUncompressedRecordTooLargeFailure is non-zero and enableUncompressedMaxRecordSizeLimit is enabled, the job will throw a VeniceException in VenicePushJob.runJobAndUpdateStatus(), using the output of VenicePushJob.updatePushJobDetailsWithJobDetails(DataWriterTaskTracker).

When enableUncompressedMaxRecordSizeLimit is enabled, records that violate the limit will not be produced to Kafka in AbstractPartitionWriter.processValuesForKey(byte[], Iterator, Iterator, DataWriterTaskTracker).
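Schematically, the behavior described above has roughly this shape. This is not Venice's implementation: the maxRecordSizeBytes and enableUncompressedMaxRecordSizeLimit fields are illustrative, and treating trackUncompressedRecordTooLargeFailure as a DataWriterTaskTracker method is an assumption based on the metric name.

// Illustrative shape only; field and method names are assumptions, not Venice code.
boolean processSketch(byte[] uncompressedValueBytes, DataWriterTaskTracker tracker) {
  if (uncompressedValueBytes.length > maxRecordSizeBytes) { // hypothetical limit field
    tracker.trackUncompressedRecordTooLargeFailure();       // violation tracked regardless of config
    if (enableUncompressedMaxRecordSizeLimit) {             // hypothetical flag field
      return false; // record is dropped: nothing reaches the emitter or Kafka
    }
  }
  // Otherwise compress the value, publish key/value/timestamp via the
  // AtomicReference out-parameters, and return true so the record is emitted.
  return true;
}
-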
getRecordReader
protected abstract AbstractVeniceRecordReader<INPUT_KEY,INPUT_VALUE> getRecordReader(VeniceProperties props)

A method for child classes to set up veniceRecordReader.
-
configureTask
protected void configureTask(VeniceProperties props)

Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
- Specified by:
  configureTask in class AbstractDataWriterTask
- Parameters:
  props - the job props that the task was configured with.
-
readDictionaryFromKafka
protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props)

This function is added to allow it to be mocked for tests. Since mocking this function on an actual object in AbstractTestVeniceMapper#getMapper(int, int, Consumer) always ended up hitting the original function, an override for it was added in TestVeniceAvroMapperClass.
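Roughly what that workaround amounts to: a test-only subclass overrides the method so tests never touch Kafka. The class below is illustrative (the real override lives in TestVeniceAvroMapperClass), extending VeniceAvroMapper is an assumption about where the override sits, and the canned bytes are arbitrary.

import java.nio.ByteBuffer;
import com.linkedin.venice.utils.VeniceProperties;

// Illustrative test-only override, mirroring the TestVeniceAvroMapperClass
// workaround described above.
public class CannedDictionaryMapper extends VeniceAvroMapper {
  @Override
  protected ByteBuffer readDictionaryFromKafka(String topicName, VeniceProperties props) {
    // Skip Kafka entirely: return a fixed compression dictionary for tests.
    return ByteBuffer.wrap(new byte[] { 0x01, 0x02, 0x03 });
  }
}
-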
close
public void close()
- Specified by:
  close in interface AutoCloseable
- Specified by:
  close in interface Closeable