Class VenicePushJob

java.lang.Object
com.linkedin.venice.hadoop.VenicePushJob
All Implemented Interfaces:
AutoCloseable

public class VenicePushJob extends Object implements AutoCloseable
This class sets up the Hadoop job used to push data to Venice. The job reads the input data off HDFS. It supports 2 kinds of input -- Avro / Binary Json (Vson).
  • Field Details

    • validateSchemaAndBuildDictJobConf

      protected final org.apache.hadoop.mapred.JobConf validateSchemaAndBuildDictJobConf
  • Constructor Details

    • VenicePushJob

      public VenicePushJob(String jobId, Properties vanillaProps)
      Parameters:
      jobId - id of the job
      vanillaProps - Property bag for the job
  • Method Details

    • getPushJobSetting

      public PushJobSetting getPushJobSetting()
    • getJobProperties

      public VeniceProperties getJobProperties()
    • setControllerClient

      protected void setControllerClient(ControllerClient controllerClient)
    • setJobClientWrapper

      public void setJobClientWrapper(JobClientWrapper jobClientWrapper)
    • setInputDataInfoProvider

      protected void setInputDataInfoProvider(InputDataInfoProvider inputDataInfoProvider)
    • setVeniceWriter

      protected void setVeniceWriter(VeniceWriter<KafkaKey,byte[],byte[]> veniceWriter)
    • setSentPushJobDetailsTracker

      public void setSentPushJobDetailsTracker(SentPushJobDetailsTracker sentPushJobDetailsTracker)
    • setValidateSchemaAndBuildDictMapperOutputReader

      protected void setValidateSchemaAndBuildDictMapperOutputReader(ValidateSchemaAndBuildDictMapperOutputReader validateSchemaAndBuildDictMapperOutputReader)
    • run

      public void run()
      Throws:
      VeniceException
    • getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension

      protected static String getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension(String mrJobId)
    • getValidateSchemaAndBuildDictionaryOutputFileName

      protected static String getValidateSchemaAndBuildDictionaryOutputFileName(String mrJobId)
    • shouldBuildZstdCompressionDictionary

      protected static boolean shouldBuildZstdCompressionDictionary(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
      This functions decides whether Zstd compression dictionary needs to be trained or not, based on the type of push, configs and whether there are any input records or not, or whether PushJobSetting.compressionMetricCollectionEnabled is enabled or not.
    • evaluateCompressionMetricCollectionEnabled

      protected static boolean evaluateCompressionMetricCollectionEnabled(PushJobSetting pushJobSetting, boolean inputFileHasRecords)
      This functions evaluates the config PushJobSetting.compressionMetricCollectionEnabled based on the input data and other configs as an initial filter to disable this config for cases where we won't be able to collect this information or where it doesn't make sense to collect this information. eg: When there are no data or for Incremental push.
    • constructInputDataInfoProvider

      protected InputDataInfoProvider constructInputDataInfoProvider()
    • getInputDataInfoProvider

      protected InputDataInfoProvider getInputDataInfoProvider()
    • getValidateSchemaAndBuildDictMapperOutputReader

      protected ValidateSchemaAndBuildDictMapperOutputReader getValidateSchemaAndBuildDictMapperOutputReader(org.apache.hadoop.fs.Path outputDir, String fileName) throws Exception
      Throws:
      Exception
    • initKIFRepushDetails

      protected void initKIFRepushDetails()
    • getInputURI

      protected String getInputURI(VeniceProperties props)
      Get input path from the properties; Check whether there is sub-directory in the input directory
      Parameters:
      props -
      Returns:
      input URI
      Throws:
      Exception
    • getPerColoPushJobDetailsStatusFromExecutionStatus

      protected static PushJobDetailsStatus getPerColoPushJobDetailsStatusFromExecutionStatus(ExecutionStatus executionStatus)
      Transform per colo ExecutionStatus to per colo PushJobDetailsStatus
    • validateRemoteHybridSettings

      protected void validateRemoteHybridSettings()
    • validateRemoteHybridSettings

      protected void validateRemoteHybridSettings(PushJobSetting setting)
    • setupInputFormatConfToValidateSchemaAndBuildDict

      protected void setupInputFormatConfToValidateSchemaAndBuildDict(org.apache.hadoop.mapred.JobConf conf, PushJobSetting pushJobSetting, String inputDirectory)
    • cancel

      public void cancel()
      A cancel method for graceful cancellation of the running Job to be invoked as a result of user actions.
      Throws:
      Exception
    • getKafkaUrl

      public String getKafkaUrl()
    • getIncrementalPushVersion

      public String getIncrementalPushVersion()
    • getTopicToMonitor

      public String getTopicToMonitor()
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
    • main

      public static void main(String[] args)