Class VeniceFileInputFormat

  • All Implemented Interfaces:
    org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>

    public class VeniceFileInputFormat
    extends java.lang.Object
    implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
    Custom input format used by the PushJobSetting.useMapperToBuildDict feature together with ValidateSchemaAndBuildDictMapper. It has the following properties:
    1. The input directory yields exactly 1 split, so only 1 mapper runs.
    2. Each file inside the split (i.e. the input directory) is treated as a separate record: n files => n records.
    3. A sentinel record is appended at the end to build the dictionary if needed: n files => n+1 records.
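    The record sequence described above can be modeled in plain Java without any Hadoop dependency. This is only a sketch under stated assumptions: the class name SentinelRecordSketch, the use of the file index as the record key, and the sentinel value -1 are all illustrative and are not taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "n files => n+1 records". It does not use Hadoop types.
final class SentinelRecordSketch {
    // Hypothetical sentinel key; the value the library really uses is not stated on this page.
    static final int SENTINEL_KEY = -1;

    // Returns the keys a single mapper would see for a directory holding fileCount files:
    // one key per file (assumed here to be the file index), followed by one sentinel key.
    static List<Integer> recordKeys(int fileCount) {
        if (fileCount < 0) {
            throw new IllegalArgumentException("fileCount must be non-negative");
        }
        List<Integer> keys = new ArrayList<>(fileCount + 1);
        for (int i = 0; i < fileCount; i++) {
            keys.add(i);
        }
        keys.add(SENTINEL_KEY);
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(recordKeys(3)); // prints [0, 1, 2, -1]
    }
}
```

    In this model an empty directory still produces one record, the sentinel, which is the point at which a mapper could decide whether a dictionary needs to be built.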
    • Method Summary

      Modifier and Type Method Description
      org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
      org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
      Always returns exactly 1 split, so a single mapper handles all the files.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • VeniceFileInputFormat

        public VeniceFileInputFormat()
    • Method Detail

      • getSplits

        public org.apache.hadoop.mapred.InputSplit[] getSplits​(org.apache.hadoop.mapred.JobConf job,
                                                               int numSplits)
                                                        throws java.io.IOException
        Always returns exactly 1 split, so a single mapper handles all the files.
        Specified by:
        getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
        Parameters:
        job - MapReduce job configuration
        numSplits - ignored by this method
        Returns:
        An array containing the single split
        Throws:
        java.io.IOException
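        The single-split contract of getSplits can be illustrated with a plain-Java stand-in. This is a hedged sketch, not the library's implementation: SingleSplitSketch and DirectorySplit are hypothetical names, and the real method returns Hadoop InputSplit objects built from the JobConf rather than from a directory string.

```java
// Plain-Java stand-in for the single-split behaviour. It does not use Hadoop's InputSplit.
final class SingleSplitSketch {
    // Hypothetical split descriptor: it only records the input directory it covers.
    static final class DirectorySplit {
        final String inputDirectory;

        DirectorySplit(String inputDirectory) {
            this.inputDirectory = inputDirectory;
        }
    }

    // The numSplits hint is deliberately ignored: the whole directory becomes one split,
    // so the framework would schedule exactly one mapper for it.
    static DirectorySplit[] getSplits(String inputDirectory, int numSplits) {
        return new DirectorySplit[] { new DirectorySplit(inputDirectory) };
    }

    public static void main(String[] args) {
        DirectorySplit[] splits = getSplits("/input/dir", 8);
        System.out.println(splits.length); // prints 1
    }
}
```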
      • getRecordReader

        public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,​org.apache.hadoop.io.NullWritable> getRecordReader​(org.apache.hadoop.mapred.InputSplit split,
                                                                                                                                               org.apache.hadoop.mapred.JobConf job,
                                                                                                                                               org.apache.hadoop.mapred.Reporter reporter)
                                                                                                                                        throws java.io.IOException
        Specified by:
        getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.NullWritable>
        Throws:
        java.io.IOException