Class VeniceFileInputFormat

java.lang.Object
com.linkedin.venice.hadoop.VeniceFileInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>

public class VeniceFileInputFormat extends Object implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
Custom input format used by the PushJobSetting.useMapperToBuildDict feature together with ValidateSchemaAndBuildDictMapper. It has the following properties:
1. The input directory yields exactly 1 split, so only 1 mapper runs.
2. Each file inside the split (i.e. the input directory) is treated as a separate record: n files => n records.
3. A sentinel record is added at the end to build the dictionary if needed: n files => n+1 records.
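The record layout described above can be illustrated with a minimal, Hadoop-free sketch. The class RecordLayoutSketch, the method recordKeys, and the use of the file index as the record key are illustrative assumptions, not part of the Venice API. In particular, the key that the real record reader assigns to the sentinel record is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the record layout; this is not the Venice implementation.
class RecordLayoutSketch {
    // Returns one integer key per record for an input directory holding fileCount files.
    // Assumption: keys 0..n-1 identify the files and key n stands in for the sentinel.
    static List<Integer> recordKeys(int fileCount) {
        if (fileCount < 0) {
            throw new IllegalArgumentException("fileCount must be non-negative");
        }
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            keys.add(i); // one record per input file
        }
        keys.add(fileCount); // trailing sentinel record: n files => n+1 records
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(recordKeys(3)); // prints [0, 1, 2, 3]
    }
}
```

Because the values are NullWritable in the real class, only the keys carry information, which is why the sketch models each record as a single integer.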
  • Constructor Summary

    Constructors
    Constructor
    Description
    VeniceFileInputFormat()
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
    getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
     
    org.apache.hadoop.mapred.InputSplit[]
    getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
    Always returns exactly 1 split, so a single mapper handles all the files.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • VeniceFileInputFormat

      public VeniceFileInputFormat()
  • Method Details

    • getSplits

      public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws IOException
      Always returns exactly 1 split, so a single mapper handles all the files.
      Specified by:
      getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
      Parameters:
      job - the MapReduce job configuration
      numSplits - ignored by this implementation
      Returns:
      an array containing the single split
      Throws:
      IOException
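As a rough illustration of the single-split behaviour, the sketch below ignores the requested split count and always returns one element. In the org.apache.hadoop.mapred.InputFormat interface, numSplits is only a hint, so an implementation is free to disregard it. The class SingleSplitSketch, the use of a String as a stand-in for InputSplit, and the path /data/input are hypothetical and chosen only to keep the example free of Hadoop dependencies.

```java
// Hypothetical stand-in for the getSplits contract; this is not the Venice implementation.
class SingleSplitSketch {
    // A split is modeled as the path it covers (assumption made for illustration).
    static String[] getSplits(String inputDir, int numSplits) {
        // numSplits is deliberately ignored: the whole input directory is one split.
        return new String[] { inputDir };
    }

    public static void main(String[] args) {
        String[] splits = getSplits("/data/input", 8);
        System.out.println(splits.length); // prints 1
    }
}
```

Since the framework schedules one map task per split, returning a single split is what limits the job to a single mapper.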
    • getRecordReader

      public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException
      Specified by:
      getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
      Throws:
      IOException