Package com.linkedin.venice.hadoop
Class VenicePushJob
- java.lang.Object
-
- com.linkedin.venice.hadoop.VenicePushJob
-
- All Implemented Interfaces:
java.lang.AutoCloseable
public class VenicePushJob extends java.lang.Object implements java.lang.AutoCloseable
This class sets up the Hadoop job used to push data to Venice. The job reads the input data off HDFS. It supports 2 kinds of input -- Avro / Binary Json (Vson).
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
VenicePushJob.PushJobCheckpoints
Different successful checkpoints and known error scenarios of the VPJ flow.
-
Field Summary
Fields Modifier and Type Field Description protected org.apache.hadoop.mapred.JobConf
validateSchemaAndBuildDictJobConf
-
Constructor Summary
Constructors Constructor Description VenicePushJob(java.lang.String jobId, java.util.Properties vanillaProps)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
cancel()
A cancel method for graceful cancellation of the running Job to be invoked as a result of user actions.void
close()
protected InputDataInfoProvider
constructInputDataInfoProvider()
protected static boolean
evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This functions evaluates the configPushJobSetting.compressionMetricCollectionEnabled
based on the input data and other configs as an initial filter to disable this config for cases where we won't be able to collect this information or where it doesn't make sense to collect this information.java.lang.String
getIncrementalPushVersion()
protected InputDataInfoProvider
getInputDataInfoProvider()
protected java.lang.String
getInputURI(VeniceProperties props)
Get input path from the properties; Check whether there is sub-directory in the input directoryVeniceProperties
getJobProperties()
java.lang.String
getKafkaUrl()
protected static PushJobDetailsStatus
getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus)
Transform per coloExecutionStatus
to per coloPushJobDetailsStatus
PushJobSetting
getPushJobSetting()
java.lang.String
getTopicToMonitor()
protected static java.lang.String
getValidateSchemaAndBuildDictionaryOutputFileName(java.lang.String mrJobId)
protected static java.lang.String
getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension(java.lang.String mrJobId)
protected ValidateSchemaAndBuildDictMapperOutputReader
getValidateSchemaAndBuildDictMapperOutputReader(org.apache.hadoop.fs.Path outputDir, java.lang.String fileName)
protected void
initKIFRepushDetails()
static void
main(java.lang.String[] args)
void
run()
protected void
setControllerClient(ControllerClient controllerClient)
protected void
setInputDataInfoProvider(InputDataInfoProvider inputDataInfoProvider)
void
setJobClientWrapper(JobClientWrapper jobClientWrapper)
void
setSentPushJobDetailsTracker(SentPushJobDetailsTracker sentPushJobDetailsTracker)
protected void
setupInputFormatConfToValidateSchemaAndBuildDict(org.apache.hadoop.mapred.JobConf conf, PushJobSetting pushJobSetting, java.lang.String inputDirectory)
protected void
setValidateSchemaAndBuildDictMapperOutputReader(ValidateSchemaAndBuildDictMapperOutputReader validateSchemaAndBuildDictMapperOutputReader)
protected void
setVeniceWriter(VeniceWriter<KafkaKey,byte[],byte[]> veniceWriter)
protected static boolean
shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This functions decides whether Zstd compression dictionary needs to be trained or not, based on the type of push, configs and whether there are any input records or not, or whetherPushJobSetting.compressionMetricCollectionEnabled
is enabled or not.protected void
validateRemoteHybridSettings()
protected void
validateRemoteHybridSettings(PushJobSetting setting)
-
-
-
Method Detail
-
getPushJobSetting
public PushJobSetting getPushJobSetting()
-
getJobProperties
public VeniceProperties getJobProperties()
-
setControllerClient
protected void setControllerClient(ControllerClient controllerClient)
-
setJobClientWrapper
public void setJobClientWrapper(JobClientWrapper jobClientWrapper)
-
setInputDataInfoProvider
protected void setInputDataInfoProvider(InputDataInfoProvider inputDataInfoProvider)
-
setVeniceWriter
protected void setVeniceWriter(VeniceWriter<KafkaKey,byte[],byte[]> veniceWriter)
-
setSentPushJobDetailsTracker
public void setSentPushJobDetailsTracker(SentPushJobDetailsTracker sentPushJobDetailsTracker)
-
setValidateSchemaAndBuildDictMapperOutputReader
protected void setValidateSchemaAndBuildDictMapperOutputReader(ValidateSchemaAndBuildDictMapperOutputReader validateSchemaAndBuildDictMapperOutputReader)
-
run
public void run()
- Throws:
VeniceException
-
getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension
protected static java.lang.String getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension(java.lang.String mrJobId)
-
getValidateSchemaAndBuildDictionaryOutputFileName
protected static java.lang.String getValidateSchemaAndBuildDictionaryOutputFileName(java.lang.String mrJobId)
-
shouldBuildZstdCompressionDictionary
protected static boolean shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This functions decides whether Zstd compression dictionary needs to be trained or not, based on the type of push, configs and whether there are any input records or not, or whetherPushJobSetting.compressionMetricCollectionEnabled
is enabled or not.
-
evaluateCompressionMetricCollectionEnabled
protected static boolean evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
This functions evaluates the configPushJobSetting.compressionMetricCollectionEnabled
based on the input data and other configs as an initial filter to disable this config for cases where we won't be able to collect this information or where it doesn't make sense to collect this information. eg: When there are no data or for Incremental push.
-
constructInputDataInfoProvider
protected InputDataInfoProvider constructInputDataInfoProvider()
-
getInputDataInfoProvider
protected InputDataInfoProvider getInputDataInfoProvider()
-
getValidateSchemaAndBuildDictMapperOutputReader
protected ValidateSchemaAndBuildDictMapperOutputReader getValidateSchemaAndBuildDictMapperOutputReader(org.apache.hadoop.fs.Path outputDir, java.lang.String fileName) throws java.lang.Exception
- Throws:
java.lang.Exception
-
initKIFRepushDetails
protected void initKIFRepushDetails()
-
getInputURI
protected java.lang.String getInputURI(VeniceProperties props)
Get input path from the properties; Check whether there is sub-directory in the input directory- Parameters:
props
-- Returns:
- input URI
- Throws:
java.lang.Exception
-
getPerColoPushJobDetailsStatusFromExecutionStatus
protected static PushJobDetailsStatus getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus)
Transform per coloExecutionStatus
to per coloPushJobDetailsStatus
-
validateRemoteHybridSettings
protected void validateRemoteHybridSettings()
-
validateRemoteHybridSettings
protected void validateRemoteHybridSettings(PushJobSetting setting)
-
setupInputFormatConfToValidateSchemaAndBuildDict
protected void setupInputFormatConfToValidateSchemaAndBuildDict(org.apache.hadoop.mapred.JobConf conf, PushJobSetting pushJobSetting, java.lang.String inputDirectory)
-
cancel
public void cancel()
A cancel method for graceful cancellation of the running Job to be invoked as a result of user actions.- Throws:
java.lang.Exception
-
getKafkaUrl
public java.lang.String getKafkaUrl()
-
getIncrementalPushVersion
public java.lang.String getIncrementalPushVersion()
-
getTopicToMonitor
public java.lang.String getTopicToMonitor()
-
close
public void close()
- Specified by:
close
in interfacejava.lang.AutoCloseable
-
main
public static void main(java.lang.String[] args)
-
-