Class ValidateSchemaAndBuildDictOutputFormat

java.lang.Object
org.apache.hadoop.mapred.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable>
org.apache.avro.mapred.AvroOutputFormat
com.linkedin.venice.hadoop.ValidateSchemaAndBuildDictOutputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.OutputFormat

public class ValidateSchemaAndBuildDictOutputFormat extends org.apache.avro.mapred.AvroOutputFormat
This class provides a way to:
1. Reuse the existing output directory and overwrite existing files (the parent class throws an exception in that case), keeping the output file path/name deterministic.
2. Set custom permissions on the output directory/files so that only the push job owner can access personally identifiable information (e.g., the compression dictionary).
3. Set the output path via FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path).
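As a rough illustration of point 1, the following sketch uses the local filesystem via java.nio as a stand-in for HDFS. It is not Venice's implementation, and the class name `DeterministicOutputSketch` and method `writeDeterministic` are hypothetical. It shows that writing again to the same deterministic file name reuses the directory and replaces the earlier file instead of failing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem sketch: reuse the output directory, overwrite the output file. */
final class DeterministicOutputSketch {

  /** Writes content to dir/fileName, reusing dir and replacing any existing file at that path. */
  static Path writeDeterministic(Path dir, String fileName, String content) throws IOException {
    // createDirectories does not throw if the directory already exists,
    // unlike a check that refuses an existing output directory.
    Files.createDirectories(dir);
    Path file = dir.resolve(fileName);
    // With no OpenOptions, writeString behaves as CREATE + TRUNCATE_EXISTING + WRITE,
    // so an existing file at the same deterministic path is overwritten.
    Files.writeString(file, content, StandardCharsets.UTF_8);
    return file;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("vpj-output-sketch");
    writeDeterministic(dir, "output.avro", "first run");
    Path file = writeDeterministic(dir, "output.avro", "second run");
    System.out.println(Files.readString(file)); // second run
  }
}
```

Calling `writeDeterministic` twice with the same directory and file name leaves a single file holding the latest content, which is what keeps the output path predictable across reruns.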
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileOutputFormat

    org.apache.hadoop.mapred.FileOutputFormat.Counter
  • Field Summary

    Fields inherited from class org.apache.avro.mapred.AvroOutputFormat

    DEFLATE_LEVEL_KEY, EXT, SYNC_INTERVAL_KEY, XZ_LEVEL_KEY, ZSTD_BUFFERPOOL_KEY, ZSTD_LEVEL_KEY
  • Constructor Summary

    Constructors
    Constructor                                  Description
    ValidateSchemaAndBuildDictOutputFormat()
  • Method Summary

    Modifier and Type                        Method and Description
    void                                     checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job)
    org.apache.hadoop.mapred.RecordWriter    getRecordWriter(org.apache.hadoop.fs.FileSystem ignore, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable prog)
                                             Modifies the output file name to be the MR job ID to keep it unique.
    protected static void                    setValidateSchemaAndBuildDictionaryOutputDirPath(org.apache.hadoop.mapred.JobConf job)
                                             Sets up the output directory: a parent directory accessible to every user/group (777) and a unique sub-directory for this VPJ accessible only to the user who triggers it (700).

    Methods inherited from class org.apache.avro.mapred.AvroOutputFormat

    setDeflateLevel, setSyncInterval

    Methods inherited from class org.apache.hadoop.mapred.FileOutputFormat

    getCompressOutput, getOutputCompressorClass, getOutputPath, getPathForCustomFile, getTaskOutputPath, getUniqueName, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath, setWorkOutputPath

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ValidateSchemaAndBuildDictOutputFormat

      public ValidateSchemaAndBuildDictOutputFormat()
  • Method Details

    • setValidateSchemaAndBuildDictionaryOutputDirPath

      protected static void setValidateSchemaAndBuildDictionaryOutputDirPath(org.apache.hadoop.mapred.JobConf job) throws IOException
      Sets up the output directory with the following permissions:
      1. The parent directory is accessible to every user/group (777).
      2. The unique sub-directory for this VPJ (Venice Push Job) is accessible only to the user who triggers it (700), to protect against unauthorized access to PII (e.g., the Zstd compression dictionary).
      Parameters:
      job - the mapred job configuration
      Throws:
      IOException
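The 777/700 layout described above can be sketched with POSIX permissions from java.nio. This is a local-filesystem stand-in, not the HDFS code: on HDFS the analogous call would be `FileSystem.setPermission(Path, FsPermission)`. It works on Linux/macOS only, since `setPosixFilePermissions` throws UnsupportedOperationException on filesystems without POSIX attributes. The names `RestrictedOutputDirSketch`, `toSymbolic`, `mkdirWithMode`, `prepareOutputDir`, and the directory names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical sketch: shared parent directory (777) with a private per-job sub-directory (700). */
final class RestrictedOutputDirSketch {

  /** Converts an octal mode such as 0777 or 0700 into rwx form, e.g. "rwx------". */
  static String toSymbolic(int mode) {
    StringBuilder sb = new StringBuilder(9);
    char[] rwx = {'r', 'w', 'x'};
    for (int shift = 6; shift >= 0; shift -= 3) { // owner, group, others
      int bits = (mode >> shift) & 7;
      for (int i = 0; i < 3; i++) {
        sb.append((bits & (4 >> i)) != 0 ? rwx[i] : '-');
      }
    }
    return sb.toString();
  }

  /** Creates (or reuses) a directory, then sets its mode explicitly so the umask cannot narrow it. */
  static Path mkdirWithMode(Path dir, int mode) throws IOException {
    Files.createDirectories(dir);
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString(toSymbolic(mode));
    Files.setPosixFilePermissions(dir, perms);
    return dir;
  }

  /** Parent open to all users (777); job sub-directory private to the submitting user (700). */
  static Path prepareOutputDir(Path parent, String jobSubDir) throws IOException {
    mkdirWithMode(parent, 0777);
    return mkdirWithMode(parent.resolve(jobSubDir), 0700);
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("vpj-perm-sketch");
    Path jobDir = prepareOutputDir(base.resolve("shared"), "job-private");
    System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(jobDir))); // rwx------
  }
}
```

Permissions are set after creation rather than passed to the create call because directory creation normally has the process umask applied, which could silently narrow 777 to something like 755.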
    • checkOutputSpecs

      public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job) throws IOException
      Specified by:
      checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat
      Overrides:
      checkOutputSpecs in class org.apache.hadoop.mapred.FileOutputFormat
      Throws:
      IOException
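No description is given for this override, but point 1 of the class description implies it must not reject an existing output directory, which the parent FileOutputFormat.checkOutputSpecs does. The following is a simplified, hypothetical contrast of the two behaviors using java.nio paths instead of a JobConf; `LenientOutputSpecSketch`, `strictCheck`, and `lenientCheck` are illustrative names, and this is not the actual Venice code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical contrast between a strict and a lenient output-spec check. */
final class LenientOutputSpecSketch {

  /** Simplified default behavior: a missing path is an error, and so is an existing directory. */
  static void strictCheck(Path outputDir) throws IOException {
    if (outputDir == null) {
      throw new IOException("Output directory not set");
    }
    if (Files.exists(outputDir)) {
      throw new IOException("Output directory " + outputDir + " already exists");
    }
  }

  /** Lenient variant: still requires a path, but accepts an existing directory for reuse. */
  static void lenientCheck(Path outputDir) throws IOException {
    if (outputDir == null) {
      throw new IOException("Output directory not set");
    }
    // No existence check: an existing directory is reused and its files are overwritten later.
  }

  public static void main(String[] args) throws IOException {
    Path existing = Files.createTempDirectory("vpj-spec-sketch");
    lenientCheck(existing); // passes
    try {
      strictCheck(existing);
    } catch (IOException e) {
      System.out.println("strict check rejected: " + e.getMessage());
    }
  }
}
```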
    • getRecordWriter

      public org.apache.hadoop.mapred.RecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem ignore, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable prog) throws IOException
      Modifies the output file name to be the MR job ID to keep it unique. There is no need to set permissions on the output file explicitly, because its parent folder is already restricted.
      Specified by:
      getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat
      Overrides:
      getRecordWriter in class org.apache.avro.mapred.AvroOutputFormat
      Throws:
      IOException
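A small sketch of the naming rule: the file name is derived from the job ID rather than from the `name` argument, which the MapReduce framework typically passes as a `part-NNNNN` string. Appending Avro's `.avro` extension is an assumption, as is the job-ID pattern check (it accepts both cluster IDs like `job_1700000000000_0001` and local-runner IDs like `job_local123_0001`); `JobIdFileNameSketch` and `outputFileName` are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: derive the output file name from the MapReduce job ID. */
final class JobIdFileNameSketch {
  // Job IDs render as "job_<identifier>_<sequence>", e.g. job_1700000000000_0001 or job_local123_0001.
  private static final Pattern JOB_ID = Pattern.compile("job_[A-Za-z0-9]+_\\d+");

  // Assumption: the job-ID-based name keeps Avro's ".avro" extension.
  static final String EXT = ".avro";

  /** Returns jobId + ".avro", rejecting strings that do not look like a job ID. */
  static String outputFileName(String jobId) {
    if (!JOB_ID.matcher(jobId).matches()) {
      throw new IllegalArgumentException("Not a MapReduce job ID: " + jobId);
    }
    return jobId + EXT;
  }

  public static void main(String[] args) {
    System.out.println(outputFileName("job_1700000000000_0001")); // job_1700000000000_0001.avro
  }
}
```

Because each job submission gets its own ID, separate runs do not collide on the file name, while access control is left to the restricted parent directory.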