Package com.linkedin.venice.hadoop
Class VenicePushJob
java.lang.Object
com.linkedin.venice.hadoop.VenicePushJob
- All Implemented Interfaces:
- AutoCloseable
This class sets up the Hadoop job used to push data to Venice.
 The job reads the input data off HDFS. It supports 2 kinds of
 input -- Avro / Binary Json (Vson).
- 
Constructor SummaryConstructors
- 
Method SummaryModifier and TypeMethodDescriptionvoidcancel()A cancel method for graceful cancellation of the running Job to be invoked as a result of user actions or due to the job exceeding bootstrapToOnlineTimeoutInHours.voidclose()protected InputDataInfoProviderprotected static booleanevaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords) This functions evaluates the configPushJobSetting.compressionMetricCollectionEnabledbased on the input data and other configs as an initial filter to disable this config for cases where we won't be able to collect this information or where it doesn't make sense to collect this information.protected InputDataInfoProviderprotected StringgetInputURI(VeniceProperties props) Get input path from the properties; Check whether there is sub-directory in the input directoryprotected static PushJobDetailsStatusgetPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus) Transform per coloExecutionStatusto per coloPushJobDetailsStatusprotected voidstatic voidvoidrun()protected voidsetControllerClient(ControllerClient controllerClient) protected voidsetInputDataInfoProvider(InputDataInfoProvider inputDataInfoProvider) voidsetJobClientWrapper(JobClientWrapper jobClientWrapper) voidsetSentPushJobDetailsTracker(SentPushJobDetailsTracker sentPushJobDetailsTracker) protected voidsetVeniceWriter(VeniceWriter<KafkaKey, byte[], byte[]> veniceWriter) protected static booleanshouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords) This functions decides whether Zstd compression dictionary needs to be trained or not, based on the type of push, configs and whether there are any input records or not, or whetherPushJobSetting.compressionMetricCollectionEnabledis enabled or not.protected voidprotected void
- 
Constructor Details- 
VenicePushJob- Parameters:
- jobId- id of the job
- vanillaProps- Property bag for the job
 
 
- 
- 
Method Details- 
getPushJobSetting
- 
getJobProperties
- 
setControllerClient
- 
setJobClientWrapper
- 
setInputDataInfoProvider
- 
setVeniceWriter
- 
setSentPushJobDetailsTracker
- 
runpublic void run()- Throws:
- VeniceException
 
- 
shouldBuildZstdCompressionDictionaryprotected static boolean shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords) This functions decides whether Zstd compression dictionary needs to be trained or not, based on the type of push, configs and whether there are any input records or not, or whetherPushJobSetting.compressionMetricCollectionEnabledis enabled or not.
- 
evaluateCompressionMetricCollectionEnabledprotected static boolean evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords) This functions evaluates the configPushJobSetting.compressionMetricCollectionEnabledbased on the input data and other configs as an initial filter to disable this config for cases where we won't be able to collect this information or where it doesn't make sense to collect this information. eg: When there are no data or for Incremental push.
- 
constructInputDataInfoProvider
- 
getInputDataInfoProvider
- 
initKIFRepushDetailsprotected void initKIFRepushDetails()
- 
getInputURIGet input path from the properties; Check whether there is sub-directory in the input directory- Parameters:
- props-
- Returns:
- input URI
- Throws:
- Exception
 
- 
getPerColoPushJobDetailsStatusFromExecutionStatusprotected static PushJobDetailsStatus getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus) Transform per coloExecutionStatusto per coloPushJobDetailsStatus
- 
validateRemoteHybridSettingsprotected void validateRemoteHybridSettings()
- 
validateRemoteHybridSettings
- 
cancelpublic void cancel()A cancel method for graceful cancellation of the running Job to be invoked as a result of user actions or due to the job exceeding bootstrapToOnlineTimeoutInHours.
- 
getKafkaUrl
- 
getIncrementalPushVersion
- 
getTopicToMonitor
- 
closepublic void close()- Specified by:
- closein interface- AutoCloseable
 
- 
main
 
-