Package com.linkedin.venice.hadoop
Class VeniceFileInputFormat
- java.lang.Object
  - com.linkedin.venice.hadoop.VeniceFileInputFormat
- All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
public class VeniceFileInputFormat extends java.lang.Object implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
Custom input format, with the following specs, to be used for the feature PushJobSetting.useMapperToBuildDict with ValidateSchemaAndBuildDictMapper:
1. Only 1 split for the input directory => only 1 mapper.
2. Each file inside the split (i.e. the input directory) is considered to be a separate record: n files => n records.
3. A sentinel record is added at the end to build the dictionary if needed: n files => n+1 records.
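The three specs above can be illustrated with a small plain-Java model that has no Hadoop dependency. This is only a sketch under stated assumptions, not the Venice implementation: the class name `SingleSplitSketch`, the helper methods, the use of the file index as the record key, and the sentinel value `-1` are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java model of the record layout described above.
 * Every name here is hypothetical; none of it is taken from the Venice code base.
 */
class SingleSplitSketch {
  /** Assumed marker for the trailing sentinel record; the real key may differ. */
  static final int SENTINEL_KEY = -1;

  /** Spec 1: one split covers the whole input directory, so exactly one mapper runs. */
  static List<List<String>> getSplits(List<String> filesInInputDir, int numSplitsHint) {
    // The hint is deliberately ignored, mirroring the unused numSplits parameter.
    List<List<String>> splits = new ArrayList<>();
    splits.add(new ArrayList<>(filesInInputDir));
    return splits;
  }

  /** Specs 2 and 3: one key per file, then one sentinel key (n files => n + 1 records). */
  static List<Integer> recordKeys(List<String> split) {
    List<Integer> keys = new ArrayList<>();
    for (int i = 0; i < split.size(); i++) {
      keys.add(i); // assumption: the key is the index of the file within the split
    }
    keys.add(SENTINEL_KEY); // trailing record that would trigger the dictionary build
    return keys;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("part-0.avro", "part-1.avro", "part-2.avro");
    List<List<String>> splits = getSplits(files, 8);
    System.out.println("splits = " + splits.size());          // splits = 1
    System.out.println("keys = " + recordKeys(splits.get(0))); // keys = [0, 1, 2, -1]
  }
}
```

In this model, a mapper consuming the single split would see one record per input file followed by the sentinel, which is the point at which a dictionary could be built from whatever was collected while processing the files.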
Constructor Summary
Constructors
- VeniceFileInputFormat()
Method Summary
Modifier and Type, Method, Description:
- org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
- org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
Number of splits is always set to 1, which invokes only 1 mapper to handle all the files.
Method Detail
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws java.io.IOException
Number of splits is always set to 1, which invokes only 1 mapper to handle all the files.
- Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
- Parameters:
job - MR job configuration
numSplits - not used in this function
- Returns:
Detail of the 1 split
- Throws:
java.io.IOException
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
- Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
- Throws:
java.io.IOException