Package com.linkedin.venice.hadoop
Class VenicePushJob
java.lang.Object
com.linkedin.venice.hadoop.VenicePushJob
- All Implemented Interfaces:
AutoCloseable
This class sets up the Hadoop job used to push data to Venice.
The job reads its input data from HDFS and supports two input
formats: Avro and binary JSON (Vson).
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
void | cancel() | A cancel method for graceful cancellation of the running job, invoked as a result of user actions or because the job exceeded bootstrapToOnlineTimeoutInHours.
void | close() |
protected InputDataInfoProvider | constructInputDataInfoProvider |
protected static boolean | evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords) | This function evaluates the config PushJobSetting.compressionMetricCollectionEnabled based on the input data and other configs, as an initial filter that disables this config in cases where the information cannot be collected or where collecting it does not make sense.
protected InputDataInfoProvider | getInputDataInfoProvider |
protected String | getInputURI(VeniceProperties props) | Get the input path from the properties and check whether the input directory contains a sub-directory.
protected static PushJobDetailsStatus | getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus) | Transform a per-colo ExecutionStatus to a per-colo PushJobDetailsStatus.
protected void | initKIFRepushDetails() |
static void | main |
void | run() |
protected void | setControllerClient(ControllerClient controllerClient) |
protected void | setInputDataInfoProvider(InputDataInfoProvider inputDataInfoProvider) |
void | setJobClientWrapper(JobClientWrapper jobClientWrapper) |
void | setSentPushJobDetailsTracker(SentPushJobDetailsTracker sentPushJobDetailsTracker) |
protected void | setVeniceWriter(VeniceWriter<KafkaKey, byte[], byte[]> veniceWriter) |
protected static boolean | shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords) | This function decides whether a Zstd compression dictionary needs to be trained, based on the type of push, the configs, whether there are any input records, and whether PushJobSetting.compressionMetricCollectionEnabled is enabled.
protected void | validateRemoteHybridSettings() |
protected void | validateRemoteHybridSettings |
-
Constructor Details
-
VenicePushJob
- Parameters:
jobId - id of the job
vanillaProps - Property bag for the job
-
-
Method Details
-
getPushJobSetting
-
getJobProperties
-
setControllerClient
-
setJobClientWrapper
-
setInputDataInfoProvider
-
setVeniceWriter
-
setSentPushJobDetailsTracker
-
run
public void run()
- Throws:
VeniceException
-
shouldBuildZstdCompressionDictionary
protected static boolean shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This function decides whether a Zstd compression dictionary needs to be trained, based on the type of push, the configs, whether there are any input records, and whether PushJobSetting.compressionMetricCollectionEnabled is enabled.
-
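The decision described for shouldBuildZstdCompressionDictionary can be pictured with the minimal sketch below. It is not the Venice implementation: DictionarySettingSketch and its three fields are hypothetical stand-ins for PushJobSetting, and the exact rules (skip empty inputs and incremental pushes, otherwise train when dictionary compression is configured or metric collection is on) are assumptions drawn only from the inputs the description names.

```java
/** Hypothetical stand-in for PushJobSetting; the field names are assumptions. */
final class DictionarySettingSketch {
    final boolean incrementalPush;                    // assumed representation of the "type of push"
    final boolean zstdWithDictConfigured;             // assumed flag for the compression config
    final boolean compressionMetricCollectionEnabled; // mirrors the setting named in the description

    DictionarySettingSketch(boolean incrementalPush,
                            boolean zstdWithDictConfigured,
                            boolean compressionMetricCollectionEnabled) {
        this.incrementalPush = incrementalPush;
        this.zstdWithDictConfigured = zstdWithDictConfigured;
        this.compressionMetricCollectionEnabled = compressionMetricCollectionEnabled;
    }
}

final class ZstdDictionaryDecisionSketch {
    /** Returns true when, under the assumed rules, a dictionary should be trained. */
    static boolean shouldBuildDictionary(DictionarySettingSketch setting, boolean inputFileHasRecords) {
        if (!inputFileHasRecords) {
            return false; // assumption: no records means nothing to sample for training
        }
        if (setting.incrementalPush) {
            return false; // assumption: incremental pushes do not train a new dictionary
        }
        // Train when dictionary compression is requested or when metrics need a dictionary.
        return setting.zstdWithDictConfigured || setting.compressionMetricCollectionEnabled;
    }
}
```

Keeping the decision in a static, side-effect-free method (as the documented signature suggests) makes it straightforward to unit test each combination of inputs.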
evaluateCompressionMetricCollectionEnabled
protected static boolean evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This function evaluates the config PushJobSetting.compressionMetricCollectionEnabled based on the input data and other configs, as an initial filter that disables this config in cases where the information cannot be collected or where collecting it does not make sense, e.g. when there is no data or for an incremental push.
-
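As a rough illustration of the filter described for evaluateCompressionMetricCollectionEnabled, the sketch below covers only the two cases the description names (no input data, incremental push). MetricCollectionSettingSketch and its fields are hypothetical stand-ins for PushJobSetting, and the real method may check additional configs.

```java
/** Hypothetical stand-in for PushJobSetting; the field names are assumptions. */
final class MetricCollectionSettingSketch {
    final boolean compressionMetricCollectionEnabled; // the user-requested config value
    final boolean incrementalPush;                    // assumed flag for the push type

    MetricCollectionSettingSketch(boolean compressionMetricCollectionEnabled, boolean incrementalPush) {
        this.compressionMetricCollectionEnabled = compressionMetricCollectionEnabled;
        this.incrementalPush = incrementalPush;
    }
}

final class CompressionMetricFilterSketch {
    /** Returns the effective value of the config after the initial filter. */
    static boolean evaluate(MetricCollectionSettingSketch setting, boolean inputFileHasRecords) {
        if (!setting.compressionMetricCollectionEnabled) {
            return false; // never enable something the config did not request
        }
        if (!inputFileHasRecords) {
            return false; // no data: there is nothing to measure
        }
        if (setting.incrementalPush) {
            return false; // incremental push: named in the description as a disabled case
        }
        return true;
    }
}
```

The filter can only turn the config off, never on, which matches the description of it as a way "to disable this config".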
constructInputDataInfoProvider
-
getInputDataInfoProvider
-
initKIFRepushDetails
protected void initKIFRepushDetails()
-
getInputURI
Get the input path from the properties and check whether the input directory contains a sub-directory.
- Parameters:
props -
- Returns:
input URI
- Throws:
Exception
-
getPerColoPushJobDetailsStatusFromExecutionStatus
protected static PushJobDetailsStatus getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus)
Transform a per-colo ExecutionStatus to a per-colo PushJobDetailsStatus.
-
validateRemoteHybridSettings
protected void validateRemoteHybridSettings()
-
validateRemoteHybridSettings
-
cancel
public void cancel()
A cancel method for graceful cancellation of the running job, invoked as a result of user actions or because the job exceeded bootstrapToOnlineTimeoutInHours.
-
getKafkaUrl
-
getIncrementalPushVersion
-
getTopicToMonitor
-
close
public void close()
- Specified by:
close in interface AutoCloseable
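Because the class implements AutoCloseable, a job instance can be managed with try-with-resources so that close() runs even when run() throws. The sketch below shows that lifecycle with a hypothetical stand-in class (PushJobLifecycleSketch, not part of Venice) that only records which methods were called; it does not construct a real VenicePushJob.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the run()/close() surface of an AutoCloseable job. */
final class PushJobLifecycleSketch implements AutoCloseable {
    final List<String> events = new ArrayList<>();

    void run() {
        events.add("run");
    }

    @Override
    public void close() {
        events.add("close");
    }

    /** Runs one job inside try-with-resources and returns the recorded call order. */
    static List<String> runOnce(boolean failBeforeRun) {
        PushJobLifecycleSketch observed = new PushJobLifecycleSketch();
        try (PushJobLifecycleSketch job = observed) {
            if (failBeforeRun) {
                throw new IllegalStateException("simulated failure");
            }
            job.run();
        } catch (IllegalStateException e) {
            // The resource is already closed by the time the catch block executes.
            observed.events.add("failed");
        }
        return observed.events;
    }
}
```

With the real class, the same shape would apply: construct the job with a jobId and property bag inside the try header, call run() in the body, and let the language call close().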
-
main
-