Package com.linkedin.venice.hadoop
Class DefaultInputDataInfoProvider
- java.lang.Object
-
- com.linkedin.venice.hadoop.DefaultInputDataInfoProvider
-
- All Implemented Interfaces:
InputDataInfoProvider, java.io.Closeable, java.lang.AutoCloseable
public class DefaultInputDataInfoProvider extends java.lang.Object implements InputDataInfoProvider
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface com.linkedin.venice.hadoop.InputDataInfoProvider
InputDataInfoProvider.InputDataInfo
-
-
Field Summary
Fields
Modifier and Type    Field    Description
protected PushJobZstdConfig
pushJobZstdConfig
-
Constructor Summary
Constructors
Constructor    Description
DefaultInputDataInfoProvider(PushJobSetting pushJobSetting, VeniceProperties props)
-
Method Summary
All Methods    Instance Methods    Concrete Methods
Modifier and Type    Method    Description
void
close()
org.apache.avro.Schema
extractAvroSubSchema(org.apache.avro.Schema origin, java.lang.String fieldName)
protected Pair<org.apache.avro.Schema,org.apache.avro.Schema>
getAvroFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
long
getInputLastModificationTime(java.lang.String inputUri)
protected Pair<VsonSchema,VsonSchema>
getVsonFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
void
initZstdConfig(int numFiles)
byte[]
trainZstdDictionary()
InputDataInfoProvider.InputDataInfo
validateInputAndGetInfo(java.lang.String inputUri)
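Checks the input format and schema consistency, populates the key and value schemas, and loads samples for dictionary compression if enabled.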
-
-
-
Field Detail
-
pushJobZstdConfig
protected PushJobZstdConfig pushJobZstdConfig
-
-
Constructor Detail
-
DefaultInputDataInfoProvider
public DefaultInputDataInfoProvider(PushJobSetting pushJobSetting, VeniceProperties props)
-
-
Method Detail
-
validateInputAndGetInfo
public InputDataInfoProvider.InputDataInfo validateInputAndGetInfo(java.lang.String inputUri) throws java.lang.Exception
1. Check whether the input is Vson or Avro.
2. Check schema consistency.
3. Populate the key schema and value schema.
4. Load samples for dictionary compression, if enabled.
- Specified by:
validateInputAndGetInfo
in interface InputDataInfoProvider
- Parameters:
inputUri
- Returns:
an InputDataInfoProvider.InputDataInfo that contains input data information
- Throws:
java.lang.Exception
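The four steps above can be sketched roughly as follows. This is a minimal illustration, not the Venice implementation: `InputFile`, `InputInfo`, and `validate` are hypothetical stand-ins, schemas are modeled as plain strings, and dictionary sampling is reduced to taking the first record of each file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InputValidationSketch {
    // Hypothetical stand-in for one input file: its format, schemas, and records.
    static final class InputFile {
        final String format;
        final String keySchema;
        final String valueSchema;
        final List<byte[]> records;

        InputFile(String format, String keySchema, String valueSchema, List<byte[]> records) {
            this.format = format;
            this.keySchema = keySchema;
            this.valueSchema = valueSchema;
            this.records = records;
        }
    }

    // Hypothetical stand-in for the information handed back to the caller.
    static final class InputInfo {
        final String format;
        final String keySchema;
        final String valueSchema;
        final List<byte[]> samples;

        InputInfo(String format, String keySchema, String valueSchema, List<byte[]> samples) {
            this.format = format;
            this.keySchema = keySchema;
            this.valueSchema = valueSchema;
            this.samples = samples;
        }
    }

    static InputInfo validate(List<InputFile> files, boolean dictionaryCompressionEnabled) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("No input files found");
        }
        // Step 1: take the input format (e.g. "VSON" or "AVRO") from the first file.
        InputFile first = files.get(0);
        List<byte[]> samples = new ArrayList<>();
        for (InputFile file : files) {
            // Step 2: every file must agree with the first on format and schemas.
            if (!file.format.equals(first.format)
                    || !file.keySchema.equals(first.keySchema)
                    || !file.valueSchema.equals(first.valueSchema)) {
                throw new IllegalStateException("Inconsistent schema across input files");
            }
            // Step 4: collect samples only when dictionary compression is enabled.
            if (dictionaryCompressionEnabled && !file.records.isEmpty()) {
                samples.add(file.records.get(0));
            }
        }
        // Step 3: populate the key and value schemas in the result.
        return new InputInfo(first.format, first.keySchema, first.valueSchema, samples);
    }

    public static void main(String[] args) {
        List<InputFile> files = Arrays.asList(
            new InputFile("AVRO", "\"string\"", "\"long\"", Collections.singletonList(new byte[] {1})),
            new InputFile("AVRO", "\"string\"", "\"long\"", Collections.singletonList(new byte[] {2})));
        InputInfo info = validate(files, true);
        System.out.println(info.format + " key=" + info.keySchema + " samples=" + info.samples.size());
    }
}
```

Failing fast on the first inconsistent file keeps the sketch simple; a real provider may report which file disagreed and why.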
-
initZstdConfig
public void initZstdConfig(int numFiles)
- Specified by:
initZstdConfig
in interface InputDataInfoProvider
-
getVsonFileHeader
protected Pair<VsonSchema,VsonSchema> getVsonFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
-
trainZstdDictionary
public byte[] trainZstdDictionary()
- Specified by:
trainZstdDictionary
in interface InputDataInfoProvider
-
extractAvroSubSchema
public org.apache.avro.Schema extractAvroSubSchema(org.apache.avro.Schema origin, java.lang.String fieldName)
- Specified by:
extractAvroSubSchema
in interface InputDataInfoProvider
-
getInputLastModificationTime
public long getInputLastModificationTime(java.lang.String inputUri) throws java.io.IOException
- Specified by:
getInputLastModificationTime
in interface InputDataInfoProvider
- Throws:
java.io.IOException
-
getAvroFileHeader
protected Pair<org.apache.avro.Schema,org.apache.avro.Schema> getAvroFileHeader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean isZstdDictCreationRequired)
-
close
public void close()
- Specified by:
close
in interface java.lang.AutoCloseable
- Specified by:
close
in interface java.io.Closeable
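Because the class implements java.io.Closeable, an instance can be managed with try-with-resources so that close() runs even when validation throws. The sketch below shows the pattern with a hypothetical stand-in, `StubProvider`, rather than the real class, whose constructor needs a PushJobSetting and VeniceProperties that are not covered here. The input URI is likewise made up for illustration.

```java
import java.io.Closeable;

class CloseableProviderSketch {
    // Hypothetical stand-in for a provider; only the close() contract is modeled.
    static final class StubProvider implements Closeable {
        boolean closed = false;

        String validateInputAndGetInfo(String inputUri) {
            if (inputUri == null || inputUri.isEmpty()) {
                throw new IllegalArgumentException("inputUri must not be empty");
            }
            return "info for " + inputUri;
        }

        @Override
        public void close() {
            // A real provider would release its held resources here.
            closed = true;
        }
    }

    // Runs validation inside try-with-resources and returns the provider
    // so the caller can check that it was closed.
    static StubProvider validateAndClose(String inputUri) {
        StubProvider provider = new StubProvider();
        try (StubProvider p = provider) {
            p.validateInputAndGetInfo(inputUri);
        } catch (IllegalArgumentException e) {
            // close() has already run by the time this handler executes.
        }
        return provider;
    }

    public static void main(String[] args) {
        System.out.println(validateAndClose("hdfs://example/input").closed);
        System.out.println(validateAndClose("").closed);
    }
}
```

Both calls leave the provider closed, including the one where validation fails.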
-
-