Package com.linkedin.venice.hadoop
Class VeniceFileInputFormat
java.lang.Object
com.linkedin.venice.hadoop.VeniceFileInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
public class VeniceFileInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
Custom InputFormat with the following specs, used for the PushJobSetting.useMapperToBuildDict feature together with ValidateSchemaAndBuildDictMapper:
1. Only 1 split for the input directory => only 1 mapper.
2. Each file inside the split (i.e. the input directory) is considered a separate record: n files => n records.
3. A sentinel record is added at the end to build the dictionary if needed: n files => n+1 records.
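The n files => n+1 records layout above can be sketched outside Hadoop as a plain index-based key list. This is a hypothetical illustration only: the actual key values and the sentinel value used by Venice are not shown in this page, so the SENTINEL_KEY below is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the record layout for a single split: each of the
// n input files becomes one record (keyed here by its file index), and one
// extra sentinel record is appended to trigger dictionary building.
public class SentinelRecordSketch {
    // Assumption for illustration: an out-of-range index marks the sentinel.
    static final int SENTINEL_KEY = -1;

    static List<Integer> recordKeys(int numFiles) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < numFiles; i++) {
            keys.add(i); // one record per input file
        }
        keys.add(SENTINEL_KEY); // n files => n + 1 records
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(recordKeys(3).size()); // 4 records for 3 files
    }
}
```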
Constructor Summary
VeniceFileInputFormat()
Method Summary
Modifier and TypeMethodDescriptionorg.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,
org.apache.hadoop.io.NullWritable> getRecordReader
(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) org.apache.hadoop.mapred.InputSplit[]
getSplits
(org.apache.hadoop.mapred.JobConf job, int numSplits) Number of splits is set to be always 1: which will invoke only 1 mapper to handle all the files.
Constructor Details

VeniceFileInputFormat
public VeniceFileInputFormat()

Method Details
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws IOException
Number of splits is always set to 1, which invokes only 1 mapper to handle all the files.
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
Parameters:
job - MR job configuration
numSplits - not used in this function
Returns:
Detail of the 1 split
Throws:
IOException
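The getSplits contract described above can be sketched, Hadoop-free, as a function that ignores the framework's split-count hint and always returns a single split covering the whole input directory. The class and path names below are hypothetical, not the actual implementation:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the single-split contract: the numSplits hint is
// deliberately ignored, so exactly one split (and hence one mapper) covers
// the entire input directory.
public class SingleSplitContract {
    static List<String> getSplits(String inputDir, int numSplits) {
        // numSplits is ignored: one split => one mapper
        return Collections.singletonList(inputDir);
    }

    public static void main(String[] args) {
        // Regardless of the hint (1, 8, 100...), one split comes back.
        System.out.println(getSplits("/tmp/input", 100).size()); // 1
    }
}
```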

getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
Throws:
IOException
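A job would plug this format in through the classic org.apache.hadoop.mapred API. The sketch below is a configuration fragment, not tested here (it needs Hadoop mapred jars on the classpath), and the input path is illustrative:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

// Configuration sketch: wiring VeniceFileInputFormat into a classic
// MapReduce job so that one mapper processes all files under the input dir.
public class JobWiringSketch {
    static JobConf configure() {
        JobConf conf = new JobConf();
        conf.setInputFormat(com.linkedin.venice.hadoop.VeniceFileInputFormat.class);
        // Hypothetical input directory, for illustration only.
        FileInputFormat.setInputPaths(conf, new Path("/user/venice/push-job-input"));
        return conf;
    }
}
```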