Package com.linkedin.venice.hadoop
Class ValidateSchemaAndBuildDictOutputFormat
- java.lang.Object
  - org.apache.hadoop.mapred.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<T>,org.apache.hadoop.io.NullWritable>
    - org.apache.avro.mapred.AvroOutputFormat
      - com.linkedin.venice.hadoop.ValidateSchemaAndBuildDictOutputFormat
- All Implemented Interfaces:
org.apache.hadoop.mapred.OutputFormat
public class ValidateSchemaAndBuildDictOutputFormat extends org.apache.avro.mapred.AvroOutputFormat
This class provides a way to:
1. Reuse an existing output directory and overwrite existing files, which would throw an exception in the parent class. This keeps the output file path and name deterministic.
2. Set custom permissions on the output directory and files so that only the push job owner can access personally identifiable information (e.g. the compression dictionary).
3. Set the output path through FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path).
Constructor Summary
Constructors
- ValidateSchemaAndBuildDictOutputFormat()
-
Method Summary
- void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job)
- org.apache.hadoop.mapred.RecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem ignore, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable prog)
  Modify the output file name to be the MR job id to keep it unique.
- protected static void setValidateSchemaAndBuildDictionaryOutputDirPath(org.apache.hadoop.mapred.JobConf job)
  The parent directory is accessible by every user/group (777); the unique sub-directory for this VPJ is accessible only by the user who triggers it (700).
Methods inherited from class org.apache.avro.mapred.AvroOutputFormat
setDeflateLevel, setSyncInterval
Method Detail
-
setValidateSchemaAndBuildDictionaryOutputDirPath
protected static void setValidateSchemaAndBuildDictionaryOutputDirPath(org.apache.hadoop.mapred.JobConf job)
1. The parent directory should be accessible by every user/group (777).
2. The unique sub-directory for this VPJ should be accessible only by the user who triggers it (700), to prevent unauthorized access to PII (e.g. the Zstd compression dictionary).
- Parameters:
job - mapred config
- Throws:
java.io.IOException
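The two-level permission layout described for this method can be illustrated on a local POSIX filesystem. The sketch below is an analogy built on java.nio.file, not the Venice implementation, which works against Hadoop's FileSystem. The class name DictOutputDirSketch, the helper prepareOutputDir, and the directory names are hypothetical. Permissions are applied after the directories are created because create-time permissions are filtered by the process umask.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Local-filesystem analogy of the 777 parent / 700 per-job layout (hypothetical, not Venice code). */
final class DictOutputDirSketch {
  // rwxrwxrwx (777): any user may create a sub-directory under the shared parent.
  static final Set<PosixFilePermission> PARENT_PERMS = PosixFilePermissions.fromString("rwxrwxrwx");
  // rwx------ (700): only the owner may list, read, or write the per-job directory.
  static final Set<PosixFilePermission> JOB_DIR_PERMS = PosixFilePermissions.fromString("rwx------");

  /** Creates parent/uniqueName if needed and applies both permission sets explicitly. */
  static Path prepareOutputDir(Path parent, String uniqueName) throws IOException {
    Files.createDirectories(parent);
    // chmod after creation so the result does not depend on the umask.
    Files.setPosixFilePermissions(parent, PARENT_PERMS);
    Path jobDir = parent.resolve(uniqueName);
    // createDirectories does not fail if the directory already exists, so a rerun can reuse it.
    Files.createDirectories(jobDir);
    Files.setPosixFilePermissions(jobDir, JOB_DIR_PERMS);
    return jobDir;
  }

  public static void main(String[] args) throws IOException {
    Path parent = Files.createTempDirectory("vpj-parent");
    Path jobDir = prepareOutputDir(parent, "hypothetical-push-job-dir");
    System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(parent))); // rwxrwxrwx
    System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(jobDir))); // rwx------
  }
}
```

With this layout, a file written inside the per-job directory is unreachable for other users even if the file itself has permissive bits, because they cannot traverse the 700 directory.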
-
checkOutputSpecs
public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job) throws java.io.IOException
- Specified by:
checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat
- Overrides:
checkOutputSpecs in class org.apache.hadoop.mapred.FileOutputFormat
- Throws:
java.io.IOException
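In Hadoop's mapred API, FileOutputFormat.checkOutputSpecs rejects a job whose output directory already exists, while the class description says this class reuses an existing directory. One plausible way to get that behavior is to run the strict check and tolerate only the "already exists" failure. The sketch below shows the idea on a local filesystem with hypothetical names (OutputSpecCheckSketch, strictCheck, lenientCheck); whether the actual override is written this way is an assumption.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Contrast between a strict output check and one that tolerates an existing directory (hypothetical). */
final class OutputSpecCheckSketch {
  /** Strict check: fails if the directory is not configured or already exists. */
  static void strictCheck(Path outputDir) throws IOException {
    if (outputDir == null) {
      throw new IOException("Output directory not set");
    }
    if (Files.exists(outputDir)) {
      throw new FileAlreadyExistsException(outputDir.toString());
    }
  }

  /** Lenient check: keeps every strict validation except the "already exists" failure. */
  static void lenientCheck(Path outputDir) throws IOException {
    try {
      strictCheck(outputDir);
    } catch (FileAlreadyExistsException e) {
      // An existing directory is expected on reruns, so it is reused rather than rejected.
    }
  }

  public static void main(String[] args) throws IOException {
    Path existing = Files.createTempDirectory("dict-output");
    lenientCheck(existing); // passes even though the directory exists
    try {
      strictCheck(existing);
    } catch (FileAlreadyExistsException e) {
      System.out.println("strict check rejected existing directory");
    }
  }
}
```

Catching only the one exception type keeps the other validations in place, such as the missing output directory case.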
-
getRecordWriter
public org.apache.hadoop.mapred.RecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem ignore, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable prog) throws java.io.IOException
Modifies the output file name to be the MR job id to keep it unique. There is no need to explicitly control the permissions of the output file, as its parent folder is already restricted.
- Specified by:
getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat
- Overrides:
getRecordWriter in class org.apache.avro.mapred.AvroOutputFormat
- Throws:
java.io.IOException
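The naming rule above can be sketched as a small helper. This is an illustration under assumptions, not the Venice code: the class DictOutputFileNameSketch and its methods are hypothetical, the job id is a made-up example, and any prefix the real implementation may add to the name is omitted. The ".avro" suffix matches the value of AvroOutputFormat.EXT, which the parent class appends to the name it is given.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Derives a deterministic output file name from the MR job id (hypothetical helper). */
final class DictOutputFileNameSketch {
  // Same value as org.apache.avro.mapred.AvroOutputFormat.EXT.
  static final String AVRO_EXT = ".avro";

  /** Returns "<mrJobId>.avro"; the job id is unique per MR job, so the file name is too. */
  static String outputFileName(String mrJobId) {
    if (mrJobId == null || mrJobId.isEmpty()) {
      throw new IllegalArgumentException("MR job id must be set");
    }
    return mrJobId + AVRO_EXT;
  }

  /** Resolves the output file inside the restricted per-job directory. */
  static Path outputFilePath(Path jobDir, String mrJobId) {
    return jobDir.resolve(outputFileName(mrJobId));
  }

  public static void main(String[] args) {
    String jobId = "job_1700000000000_0001"; // example id, not from a real job
    System.out.println(outputFileName(jobId)); // job_1700000000000_0001.avro
    System.out.println(outputFilePath(Paths.get("parent", "job-dir"), jobId).getFileName());
  }
}
```

Because the name depends only on the job id, a reader of the output can compute the file path without listing the directory.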