Package com.linkedin.venice.hadoop
Class ValidateSchemaAndBuildDictMapper
- java.lang.Object
-
- com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
-
- com.linkedin.venice.hadoop.ValidateSchemaAndBuildDictMapper
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable,org.apache.avro.mapred.AvroWrapper<org.apache.avro.specific.SpecificRecord>,org.apache.hadoop.io.NullWritable>
public class ValidateSchemaAndBuildDictMapper extends AbstractDataWriterTask implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable,org.apache.avro.mapred.AvroWrapper<org.apache.avro.specific.SpecificRecord>,org.apache.hadoop.io.NullWritable>, org.apache.hadoop.mapred.JobConfigurable
Mapper-only MR job that validates the schema, builds a compression dictionary if needed, and persists some data (total file size and compression dictionary) in HDFS for use by the VPJ Driver.
Note: all files in this split are processed sequentially. If this significantly increases mapper time or causes timeouts, it should be revisited and done via a thread pool.
TODO: Ideally, this class should not extend AbstractDataWriterTask. This MR job is planned for removal soon, with its handling moved to the VPJ driver.
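The note about sequential processing versus a thread pool can be illustrated with a small sketch. This is not Venice code: `SplitProcessingSketch`, `processSequentially`, and `processWithThreadPool` are hypothetical names, and the `IntPredicate` stands in for whatever per-file work the mapper performs. It only contrasts the two dispatch styles under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

/** Hypothetical illustration only; not part of the Venice code base. */
final class SplitProcessingSketch {
  /** Processes file indices in order on the calling thread and stops at the first failure. */
  static boolean processSequentially(int fileCount, IntPredicate processFile) {
    for (int fileIdx = 0; fileIdx < fileCount; fileIdx++) {
      if (!processFile.test(fileIdx)) {
        return false;
      }
    }
    return true;
  }

  /** Submits every file index to a fixed-size pool and reports whether all of them succeeded. */
  static boolean processWithThreadPool(int fileCount, IntPredicate processFile, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int i = 0; i < fileCount; i++) {
        final int fileIdx = i;
        Callable<Boolean> task = () -> processFile.test(fileIdx);
        results.add(pool.submit(task));
      }
      boolean allSucceeded = true;
      for (Future<Boolean> result : results) {
        // Wait for every task so that no work is left running when we return.
        allSucceeded &= result.get();
      }
      return allSucceeded;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while processing files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A file-processing task failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

In a pooled variant, any state shared across files (such as a running total size or collected dictionary samples) would need to be made thread-safe, which the sequential form avoids.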
-
-
Field Summary
Fields
Modifier and Type
Field
Description
protected boolean
hasReportedFailure
protected InputDataInfoProvider.InputDataInfo
inputDataInfo
protected DefaultInputDataInfoProvider
inputDataInfoProvider
protected java.lang.String
inputDirectory
protected java.lang.Long
inputFileDataSize
protected boolean
isZstdDictCreationRequired
protected PushJobSetting
pushJobSetting
-
Fields inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
TASK_ID_NOT_SET
-
-
Constructor Summary
Constructors
Constructor
Description
ValidateSchemaAndBuildDictMapper()
-
Method Summary
Modifier and Type
Method
Description
protected void
checkLastModificationTimeAndLogError(java.lang.Exception e, java.lang.String errorString)
protected void
checkLastModificationTimeAndLogError(java.lang.Exception exception, java.lang.String errorString, org.apache.hadoop.mapred.Reporter reporter)
protected void
checkLastModificationTimeAndLogError(java.lang.String errorString, org.apache.hadoop.mapred.Reporter reporter)
void
close()
void
configure(org.apache.hadoop.mapred.JobConf job)
protected void
configureTask(VeniceProperties props)
Allow implementations of this class to configure task-specific stuff.
protected void
initInputData(VeniceProperties props)
void
map(org.apache.hadoop.io.IntWritable inputKey, org.apache.hadoop.io.NullWritable inputValue, org.apache.hadoop.mapred.OutputCollector<org.apache.avro.mapred.AvroWrapper<org.apache.avro.specific.SpecificRecord>,org.apache.hadoop.io.NullWritable> output, org.apache.hadoop.mapred.Reporter reporter)
protected boolean
processInput(int fileIdx, org.apache.hadoop.mapred.Reporter reporter)
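Validates this file's schema against the first file's schema, calculates the total file size, and collects a dictionary sample from this file if enabled.
-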
Methods inherited from class com.linkedin.venice.hadoop.task.datawriter.AbstractDataWriterTask
configure, getEngineTaskConfigProvider, getPartitionCount, getTaskId, isChunkingEnabled, isRmdChunkingEnabled, setChunkingEnabled
-
-
-
-
Field Detail
-
inputDataInfoProvider
protected DefaultInputDataInfoProvider inputDataInfoProvider
-
inputDataInfo
protected InputDataInfoProvider.InputDataInfo inputDataInfo
-
pushJobSetting
protected PushJobSetting pushJobSetting
-
isZstdDictCreationRequired
protected boolean isZstdDictCreationRequired
-
hasReportedFailure
protected boolean hasReportedFailure
-
inputDirectory
protected java.lang.String inputDirectory
-
inputFileDataSize
protected java.lang.Long inputFileDataSize
-
-
Method Detail
-
map
public void map(org.apache.hadoop.io.IntWritable inputKey, org.apache.hadoop.io.NullWritable inputValue, org.apache.hadoop.mapred.OutputCollector<org.apache.avro.mapred.AvroWrapper<org.apache.avro.specific.SpecificRecord>,org.apache.hadoop.io.NullWritable> output, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
- Specified by:
map
in interface org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable,org.apache.avro.mapred.AvroWrapper<org.apache.avro.specific.SpecificRecord>,org.apache.hadoop.io.NullWritable>
- Throws:
java.io.IOException
-
processInput
protected boolean processInput(int fileIdx, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
1. Validates this file's schema against the first file's schema.
2. Calculates the total file size.
3. Collects a sample for the dictionary from this file, if enabled.
- Throws:
java.io.IOException
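The three steps listed for processInput can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: `InputFileSketch` and `ProcessInputSketch` are hypothetical stand-ins, schemas are compared as plain strings, and treating the boolean return value as "schema matched" is an assumption rather than something the documentation states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one input file; not a Venice class. */
final class InputFileSketch {
  final String schema;
  final long sizeInBytes;
  final byte[] sampleRecord;

  InputFileSketch(String schema, long sizeInBytes, byte[] sampleRecord) {
    this.schema = schema;
    this.sizeInBytes = sizeInBytes;
    this.sampleRecord = sampleRecord;
  }
}

/** Hypothetical accumulator mirroring the three steps described for processInput. */
final class ProcessInputSketch {
  private final List<InputFileSketch> files;
  private final boolean dictCreationRequired;
  private final List<byte[]> dictSamples = new ArrayList<>();
  private long totalSize = 0L;

  ProcessInputSketch(List<InputFileSketch> files, boolean dictCreationRequired) {
    this.files = files;
    this.dictCreationRequired = dictCreationRequired;
  }

  /** Assumed semantics: returns false when the file's schema differs from the first file's. */
  boolean processInput(int fileIdx) {
    InputFileSketch file = files.get(fileIdx);
    // Step 1: validate this file's schema against the first file's schema.
    if (!files.get(0).schema.equals(file.schema)) {
      return false;
    }
    // Step 2: add this file's size to the running total.
    totalSize += file.sizeInBytes;
    // Step 3: collect a dictionary sample from this file only when enabled.
    if (dictCreationRequired) {
      dictSamples.add(file.sampleRecord);
    }
    return true;
  }

  long getTotalSize() {
    return totalSize;
  }

  int getSampleCount() {
    return dictSamples.size();
  }
}
```

In this sketch a mismatched file contributes neither to the total size nor to the samples; whether the real mapper behaves the same way is not stated in this page.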
-
checkLastModificationTimeAndLogError
protected void checkLastModificationTimeAndLogError(java.lang.Exception e, java.lang.String errorString) throws java.io.IOException
- Throws:
java.io.IOException
-
checkLastModificationTimeAndLogError
protected void checkLastModificationTimeAndLogError(java.lang.String errorString, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
- Throws:
java.io.IOException
-
checkLastModificationTimeAndLogError
protected void checkLastModificationTimeAndLogError(java.lang.Exception exception, java.lang.String errorString, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
- Throws:
java.io.IOException
-
initInputData
protected void initInputData(VeniceProperties props) throws java.lang.Exception
- Throws:
java.lang.Exception
-
configureTask
protected void configureTask(VeniceProperties props)
Description copied from class: AbstractDataWriterTask
Allow implementations of this class to configure task-specific stuff.
- Specified by:
configureTask
in class AbstractDataWriterTask
- Parameters:
props
- the job props that the task was configured with.
-
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
-
close
public void close()
- Specified by:
close
in interface java.lang.AutoCloseable
- Specified by:
close
in interface java.io.Closeable
-
-