Package com.linkedin.venice.hadoop
Class DefaultInputDataInfoProvider
java.lang.Object
com.linkedin.venice.hadoop.DefaultInputDataInfoProvider
All Implemented Interfaces:
InputDataInfoProvider, Closeable, AutoCloseable
-
Nested Class Summary
Nested classes/interfaces inherited from interface com.linkedin.venice.hadoop.InputDataInfoProvider
InputDataInfoProvider.InputDataInfo
-
Field Summary
-
Constructor Summary
Constructor                                                                            Description
DefaultInputDataInfoProvider(PushJobSetting pushJobSetting, VeniceProperties props)
-
Method Summary
Modifier and Type                                                Method
void                                                             close()
org.apache.avro.Schema                                           extractAvroSubSchema(org.apache.avro.Schema origin, String fieldName)
protected Pair<org.apache.avro.Schema,org.apache.avro.Schema>    getAvroFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
long                                                             getInputLastModificationTime(String inputUri)
protected Pair<VsonSchema,VsonSchema>                            getVsonFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
void                                                             initZstdConfig(int numFiles)
byte[]                                                           trainZstdDictionary()
InputDataInfoProvider.InputDataInfo                              validateInputAndGetInfo(String inputUri)
    Description: checks whether the input is Vson or Avro, checks schema consistency, and populates the key and value schemas.
-
Field Details
-
pushJobZstdConfig
-
-
Constructor Details
-
DefaultInputDataInfoProvider
-
-
Method Details
-
validateInputAndGetInfo
public InputDataInfoProvider.InputDataInfo validateInputAndGetInfo(String inputUri) throws Exception
1. Check whether the input is Vson or Avro.
2. Check schema consistency.
3. Populate the key schema and value schema.
4. Load samples for dictionary compression, if enabled.
Specified by:
validateInputAndGetInfo in interface InputDataInfoProvider
Parameters:
inputUri -
Returns:
an InputDataInfoProvider.InputDataInfo that contains input data information
Throws:
Exception
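The four steps above can be sketched with JDK-only stand-ins. This is a simplified illustration under stated assumptions, not the Venice implementation: `InputValidationSketch`, `FileHeader`, and `InputInfo` are hypothetical names, schemas are plain strings rather than Avro or Vson schema objects, taking the format from the first file is an assumption, and step 4 (dictionary sampling) is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical JDK-only sketch of the validation flow; not the Venice implementation.
final class InputValidationSketch {
  /** Stand-in for one file's header: input format plus key and value schemas as strings. */
  static final class FileHeader {
    final boolean vson;
    final String keySchema;
    final String valueSchema;

    FileHeader(boolean vson, String keySchema, String valueSchema) {
      this.vson = vson;
      this.keySchema = keySchema;
      this.valueSchema = valueSchema;
    }
  }

  /** Stand-in for the returned input data information. */
  static final class InputInfo {
    final boolean vson;
    final String keySchema;
    final String valueSchema;
    final int fileCount;

    InputInfo(boolean vson, String keySchema, String valueSchema, int fileCount) {
      this.vson = vson;
      this.keySchema = keySchema;
      this.valueSchema = valueSchema;
      this.fileCount = fileCount;
    }
  }

  static InputInfo validate(List<FileHeader> headers) {
    if (headers.isEmpty()) {
      throw new IllegalArgumentException("no input files");
    }
    // Step 1: take the input format (Vson or Avro) from the first file (assumption).
    FileHeader first = headers.get(0);
    // Step 2: every file must agree with the first one on format and schemas.
    for (FileHeader h : headers) {
      boolean same = h.vson == first.vson
          && h.keySchema.equals(first.keySchema)
          && h.valueSchema.equals(first.valueSchema);
      if (!same) {
        throw new IllegalStateException("inconsistent schema across input files");
      }
    }
    // Step 3: populate the key and value schemas in the result.
    // Step 4 (sampling for dictionary compression) is omitted from this sketch.
    return new InputInfo(first.vson, first.keySchema, first.valueSchema, headers.size());
  }

  public static void main(String[] args) {
    List<FileHeader> headers = Arrays.asList(
        new FileHeader(false, "\"string\"", "\"int\""),
        new FileHeader(false, "\"string\"", "\"int\""));
    InputInfo info = validate(headers);
    System.out.println(info.fileCount + " files, key schema " + info.keySchema);
  }
}
```

Failing fast on the first mismatch keeps the sketch short; whether the real class reports every inconsistent file or only the first is not stated on this page.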
-
initZstdConfig
public void initZstdConfig(int numFiles)
Specified by:
initZstdConfig in interface InputDataInfoProvider
-
getVsonFileHeader
protected Pair<VsonSchema,VsonSchema> getVsonFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
-
trainZstdDictionary
public byte[] trainZstdDictionary()
Specified by:
trainZstdDictionary in interface InputDataInfoProvider
-
extractAvroSubSchema
public org.apache.avro.Schema extractAvroSubSchema(org.apache.avro.Schema origin, String fieldName)
Specified by:
extractAvroSubSchema in interface InputDataInfoProvider
-
getInputLastModificationTime
public long getInputLastModificationTime(String inputUri) throws IOException
Specified by:
getInputLastModificationTime in interface InputDataInfoProvider
Throws:
IOException
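As a rough illustration of what a "last modification time for an input URI" lookup looks like, the sketch below uses `java.nio.file` against a local `file:` URI. This is only a stand-in: how `getInputLastModificationTime` resolves the URI (for example through a Hadoop `FileSystem`) is not shown on this page, and `LastModifiedSketch` and `lastModifiedMillis` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical local-filesystem stand-in; the real class may use different APIs.
final class LastModifiedSketch {
  /** Returns the last modification time of a file: URI in epoch milliseconds. */
  static long lastModifiedMillis(String inputUri) throws IOException {
    Path path = Paths.get(URI.create(inputUri));
    // Throws an IOException subclass (NoSuchFileException) if the path is missing.
    return Files.getLastModifiedTime(path).toMillis();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("input", ".avro");
    try {
      System.out.println(lastModifiedMillis(tmp.toUri().toString()));
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Like the documented method, the sketch takes the location as a string and surfaces I/O failures as `IOException` rather than returning a sentinel value.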
-
getAvroFileHeader
protected Pair<org.apache.avro.Schema,org.apache.avro.Schema> getAvroFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
-
close
public void close()
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
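Because the class implements `Closeable` and `AutoCloseable`, it can be managed with try-with-resources. The sketch below shows the pattern with a hypothetical stand-in (`CloseableProviderSketch`) rather than the real class, since constructing a `DefaultInputDataInfoProvider` needs a `PushJobSetting` and `VeniceProperties` that are not described here.

```java
import java.io.Closeable;

// Hypothetical stand-in for a provider that holds resources until closed.
final class CloseableProviderSketch implements Closeable {
  private boolean closed;

  boolean isClosed() {
    return closed;
  }

  // Declared without a throws clause, mirroring the documented close() signature.
  @Override
  public void close() {
    closed = true;
  }

  public static void main(String[] args) {
    CloseableProviderSketch provider = new CloseableProviderSketch();
    try (CloseableProviderSketch resource = provider) {
      System.out.println("closed inside block: " + resource.isClosed());
    }
    System.out.println("closed after block: " + provider.isClosed());
  }
}
```

Since `close()` here declares no checked exception, the try-with-resources block needs no `catch` clause.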
-