Package com.linkedin.venice.hadoop
Class VenicePushJob
java.lang.Object
com.linkedin.venice.hadoop.VenicePushJob
- All Implemented Interfaces:
AutoCloseable
This class sets up the Hadoop job used to push data to Venice. The job reads its input data off HDFS and supports two kinds of input: Avro and binary JSON (Vson).
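A typical invocation configures the job through a property bag and then runs it. The sketch below shows the shape of that setup; the property key names are assumptions for illustration, not confirmed by this page, so consult the Venice configuration reference for the real names.

```java
import java.util.Properties;

public class PushJobExample {
    // Builds the property bag that would be handed to the VenicePushJob constructor.
    // All three keys below are hypothetical stand-ins.
    public static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("venice.discover.urls", "http://venice-controller:5555"); // assumed key
        props.setProperty("venice.store.name", "test-store");                       // assumed key
        props.setProperty("input.path", "/user/me/input");                          // assumed key
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        // VenicePushJob implements AutoCloseable, so try-with-resources fits:
        // try (VenicePushJob job = new VenicePushJob("my-push-job", props)) {
        //     job.run();
        // }
        System.out.println(props.getProperty("venice.store.name"));
    }
}
```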
Field Summary
protected final org.apache.hadoop.mapred.JobConf validateSchemaAndBuildDictJobConf

Constructor Summary
VenicePushJob(String jobId, Properties vanillaProps)

Method Summary
void cancel()
    A cancel method for graceful cancellation of the running job, to be invoked as a result of user actions.
void close()
protected InputDataInfoProvider constructInputDataInfoProvider()
protected static boolean evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
    Evaluates the config PushJobSetting.compressionMetricCollectionEnabled based on the input data and other configs, as an initial filter to disable it for cases where the information cannot be collected or would not be useful.
protected InputDataInfoProvider getInputDataInfoProvider()
protected String getInputURI(VeniceProperties props)
    Gets the input path from the properties and checks whether the input directory contains a sub-directory.
protected static PushJobDetailsStatus getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus)
    Transforms a per-colo ExecutionStatus into a per-colo PushJobDetailsStatus.
protected static String getValidateSchemaAndBuildDictionaryOutputFileName()
protected static String getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension()
protected ValidateSchemaAndBuildDictMapperOutputReader getValidateSchemaAndBuildDictMapperOutputReader(org.apache.hadoop.fs.Path outputDir, String fileName)
protected void initKIFRepushDetails()
static void main(String[] args)
void run()
protected void setControllerClient(ControllerClient controllerClient)
protected void setInputDataInfoProvider(InputDataInfoProvider inputDataInfoProvider)
void setJobClientWrapper(JobClientWrapper jobClientWrapper)
void setSentPushJobDetailsTracker(SentPushJobDetailsTracker sentPushJobDetailsTracker)
protected void setupInputFormatConfToValidateSchemaAndBuildDict(org.apache.hadoop.mapred.JobConf conf, PushJobSetting pushJobSetting, String inputDirectory)
protected void setValidateSchemaAndBuildDictMapperOutputReader(ValidateSchemaAndBuildDictMapperOutputReader validateSchemaAndBuildDictMapperOutputReader)
protected void setVeniceWriter(VeniceWriter<KafkaKey, byte[], byte[]> veniceWriter)
protected static boolean shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
    Decides whether a Zstd compression dictionary needs to be trained, based on the type of push, the configs, whether there are any input records, and whether PushJobSetting.compressionMetricCollectionEnabled is enabled.
protected void validateRemoteHybridSettings()
protected void validateRemoteHybridSettings
Field Details
validateSchemaAndBuildDictJobConf
protected final org.apache.hadoop.mapred.JobConf validateSchemaAndBuildDictJobConf

Constructor Details
VenicePushJob
Parameters:
jobId - id of the job
vanillaProps - Property bag for the job
Method Details
-
getPushJobSetting
-
getJobProperties
-
setControllerClient
-
setJobClientWrapper
-
setInputDataInfoProvider
-
setVeniceWriter
-
setSentPushJobDetailsTracker
-
setValidateSchemaAndBuildDictMapperOutputReader
protected void setValidateSchemaAndBuildDictMapperOutputReader(ValidateSchemaAndBuildDictMapperOutputReader validateSchemaAndBuildDictMapperOutputReader)
run
public void run()
Throws:
VeniceException
-
getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension
-
getValidateSchemaAndBuildDictionaryOutputFileName
-
shouldBuildZstdCompressionDictionary
protected static boolean shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This function decides whether a Zstd compression dictionary needs to be trained, based on the type of push, the configs, whether there are any input records, and whether PushJobSetting.compressionMetricCollectionEnabled is enabled.
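The decision described above can be sketched as a small predicate. The enum values and the exact rule order here are illustrative assumptions, not the real implementation:

```java
// Simplified sketch of the kind of decision shouldBuildZstdCompressionDictionary makes.
public class ZstdDictDecision {
    enum CompressionStrategy { NO_OP, GZIP, ZSTD_WITH_DICT } // stand-in enum

    static boolean shouldBuildDictionary(CompressionStrategy strategy,
                                         boolean isIncrementalPush,
                                         boolean compressionMetricCollectionEnabled,
                                         boolean inputFileHasRecords) {
        if (!inputFileHasRecords) {
            return false; // nothing to train a dictionary on
        }
        if (isIncrementalPush) {
            return false; // incremental pushes reuse the store's existing compression setup
        }
        // Train when the store is configured for dictionary compression,
        // or when metric collection needs a dictionary to measure compressibility.
        return strategy == CompressionStrategy.ZSTD_WITH_DICT || compressionMetricCollectionEnabled;
    }
}
```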
evaluateCompressionMetricCollectionEnabled
protected static boolean evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This function evaluates the config PushJobSetting.compressionMetricCollectionEnabled based on the input data and other configs, as an initial filter to disable it for cases where the information cannot be collected or is not meaningful, e.g. when there is no input data or for an incremental push.
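That filter amounts to demoting a configured true to false under certain conditions. A minimal sketch, with the two disabling conditions taken from the description above (the rules are assumptions, not the real implementation):

```java
// Sketch of the initial filter evaluateCompressionMetricCollectionEnabled applies.
public class MetricCollectionFilter {
    static boolean evaluate(boolean configuredValue, boolean inputFileHasRecords, boolean isIncrementalPush) {
        if (!configuredValue) {
            return false; // feature is off to begin with
        }
        if (!inputFileHasRecords || isIncrementalPush) {
            return false; // no data to measure, or incremental push: disable
        }
        return true;
    }
}
```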
constructInputDataInfoProvider
-
getInputDataInfoProvider
-
getValidateSchemaAndBuildDictMapperOutputReader
protected ValidateSchemaAndBuildDictMapperOutputReader getValidateSchemaAndBuildDictMapperOutputReader(org.apache.hadoop.fs.Path outputDir, String fileName) throws Exception
Throws:
Exception
-
initKIFRepushDetails
protected void initKIFRepushDetails()
getInputURI
Get the input path from the properties; check whether there is a sub-directory in the input directory.
Parameters:
props
Returns:
input URI
Throws:
Exception
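The sub-directory check can be illustrated with plain java.nio on a local directory; the real method inspects the HDFS input path via Hadoop's FileSystem API, so this is only an analogue of the idea:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Local-filesystem analogue of the sub-directory check described for getInputURI.
public class InputDirCheck {
    static boolean hasSubDirectory(Path inputDir) throws IOException {
        try (Stream<Path> entries = Files.list(inputDir)) {
            return entries.anyMatch(Files::isDirectory);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("venice-input");
        Files.createFile(dir.resolve("part-00000.avro"));
        System.out.println(hasSubDirectory(dir)); // false: only flat files so far
        Files.createDirectory(dir.resolve("nested"));
        System.out.println(hasSubDirectory(dir)); // true: contains a sub-directory
    }
}
```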
-
getPerColoPushJobDetailsStatusFromExecutionStatus
protected static PushJobDetailsStatus getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus)
Transforms a per-colo ExecutionStatus into a per-colo PushJobDetailsStatus.
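A status translation like this is typically a switch over the source enum. The enum constants below are stand-ins; the real ExecutionStatus and PushJobDetailsStatus have more values and a different mapping:

```java
// Sketch of a per-colo status translation in the style of
// getPerColoPushJobDetailsStatusFromExecutionStatus.
public class StatusMapping {
    enum ExecutionStatus { STARTED, COMPLETED, ERROR }          // stand-in
    enum PushJobDetailsStatus { STARTED, COMPLETED, ERROR, UNKNOWN } // stand-in

    static PushJobDetailsStatus fromExecutionStatus(ExecutionStatus status) {
        switch (status) {
            case STARTED:   return PushJobDetailsStatus.STARTED;
            case COMPLETED: return PushJobDetailsStatus.COMPLETED;
            case ERROR:     return PushJobDetailsStatus.ERROR;
            default:        return PushJobDetailsStatus.UNKNOWN;
        }
    }
}
```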
-
validateRemoteHybridSettings
protected void validateRemoteHybridSettings()
validateRemoteHybridSettings
-
setupInputFormatConfToValidateSchemaAndBuildDict
protected void setupInputFormatConfToValidateSchemaAndBuildDict(org.apache.hadoop.mapred.JobConf conf, PushJobSetting pushJobSetting, String inputDirectory)
cancel
public void cancel()
A cancel method for graceful cancellation of the running job, to be invoked as a result of user actions.
Throws:
Exception
-
getKafkaUrl
-
getIncrementalPushVersion
-
getTopicToMonitor
-
close
public void close()
Specified by:
close in interface AutoCloseable
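Because the class implements AutoCloseable, the natural usage pattern is try-with-resources, which guarantees close() runs even on failure. Demonstrated here with a stand-in resource rather than the real job class:

```java
// try-with-resources invokes close() automatically when the block exits.
public class CloseDemo {
    static boolean closed = false;

    static class FakePushJob implements AutoCloseable {
        void run() { /* push data */ }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        try (FakePushJob job = new FakePushJob()) {
            job.run();
        }
        System.out.println(closed); // true: close() ran when the block exited
    }
}
```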
-
main
-