Class VenicePushJob

  • All Implemented Interfaces:
    java.lang.AutoCloseable

    public class VenicePushJob
    extends java.lang.Object
    implements java.lang.AutoCloseable
    This class sets up the Hadoop job used to push data to Venice. The job reads its input data from HDFS and supports two input formats: Avro and binary JSON (Vson).
    • Field Detail

      • validateSchemaAndBuildDictJobConf

        protected final org.apache.hadoop.mapred.JobConf validateSchemaAndBuildDictJobConf
    • Constructor Detail

      • VenicePushJob

        public VenicePushJob​(java.lang.String jobId,
                             java.util.Properties vanillaProps)
        Parameters:
        jobId - id of the job
        vanillaProps - Property bag for the job
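The constructor above takes a job id and a plain property bag. A minimal sketch of driving the job follows; the property keys shown are illustrative (check the Venice push job configuration reference for the exact keys your version expects), and the Venice calls are left in comments since they require the Venice libraries on the classpath:

```java
import java.util.Properties;

public class PushJobSetup {
    public static void main(String[] args) {
        // Illustrative configuration keys -- not guaranteed to match the
        // exact constants used by your Venice version.
        Properties props = new Properties();
        props.setProperty("venice.store.name", "my-store");
        props.setProperty("input.path", "/user/me/input");

        // With Venice on the classpath, the job would be driven roughly like
        // this; VenicePushJob implements AutoCloseable, so try-with-resources
        // applies:
        //
        // try (VenicePushJob pushJob = new VenicePushJob("my-job-id", props)) {
        //     pushJob.run();
        // }
        System.out.println(props.getProperty("venice.store.name"));
    }
}
```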
    • Method Detail

      • setControllerClient

        protected void setControllerClient​(ControllerClient controllerClient)
      • setJobClientWrapper

        public void setJobClientWrapper​(JobClientWrapper jobClientWrapper)
      • setInputDataInfoProvider

        protected void setInputDataInfoProvider​(InputDataInfoProvider inputDataInfoProvider)
      • setVeniceWriter

        protected void setVeniceWriter​(VeniceWriter<KafkaKey,​byte[],​byte[]> veniceWriter)
      • setSentPushJobDetailsTracker

        public void setSentPushJobDetailsTracker​(SentPushJobDetailsTracker sentPushJobDetailsTracker)
      • getValidateSchemaAndBuildDictionaryOutputDir

        protected static java.lang.String getValidateSchemaAndBuildDictionaryOutputDir​(java.lang.String parentOutputDir,
                                                                                       java.lang.String storeName,
                                                                                       java.lang.String jobExecId)
        Creates the output file for ValidateSchemaAndBuildDictMapper to persist data to be read from the VPJ driver.
        Output directory: {$hadoopTmpDir}/{$storeName}-{$JOB_EXEC_ID}-{$randomUniqueString}
        File name: mapper-output-{$MRJobID}.avro
        Why JOB_EXEC_ID and randomUniqueString: they make the directory name unique, so its permissions can be restricted to the user who started this VPJ run only. This helps with two issues:
        1. Multiple headless accounts may write to a single Venice store. This shouldn't happen in regular cases, but is very likely during migrations (technical or organizational). Unless each job gets its own directory, the accounts will run into directory permission issues.
        2. Multiple push jobs can be started in parallel, but only one will continue beyond ControllerClient.requestTopicForWrites(java.lang.String, long, com.linkedin.venice.meta.Version.PushType, java.lang.String, boolean, boolean, boolean, java.util.Optional<java.lang.String>, java.util.Optional<java.lang.String>, java.util.Optional<java.lang.String>, boolean, long), as that method throws a CONCURRENT_BATCH_PUSH error if another push job is in progress. Since ValidateSchemaAndBuildDictMapper runs before this method, it is exposed to concurrent push jobs and thus to race conditions. Unique directories per execution avoid this.
        Why MRJobID can't be used for randomness: MR's jobID is populated only after FileOutputFormat.checkOutputSpecs(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.mapred.JobConf), which requires FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path) to have been called already, so the ID is not available at this point.
        TODO: explore using conf.get("hadoop.tmp.dir") or similar configs to obtain the default tmp directory in different HDFS environments, rather than hardcoding it to VALIDATE_SCHEMA_AND_BUILD_DICTIONARY_MAPPER_OUTPUT_PARENT_DIR_DEFAULT.
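The naming scheme described above can be sketched in plain Java; this is an illustrative re-creation of the documented pattern, not Venice's actual implementation:

```java
import java.util.UUID;

public class DictOutputDirSketch {
    // Sketch of: {$parentOutputDir}/{$storeName}-{$jobExecId}-{$randomUniqueString}
    static String outputDir(String parentOutputDir, String storeName,
                            String jobExecId, String uniqueSuffix) {
        return parentOutputDir + "/" + storeName + "-" + jobExecId + "-" + uniqueSuffix;
    }

    // Sketch of: mapper-output-{$MRJobID}.avro
    static String outputFileName(String mrJobId) {
        return "mapper-output-" + mrJobId + ".avro";
    }

    public static void main(String[] args) {
        // The random suffix keeps concurrent executions (and different
        // headless accounts) in distinct, user-restricted directories.
        String dir = outputDir("/tmp/hadoop", "myStore", "exec-42",
                UUID.randomUUID().toString());
        System.out.println(dir);
        System.out.println(outputFileName("job_202401_0001"));
    }
}
```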
      • getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension

        protected static java.lang.String getValidateSchemaAndBuildDictionaryOutputFileNameNoExtension​(java.lang.String mrJobId)
      • getValidateSchemaAndBuildDictionaryOutputFileName

        protected static java.lang.String getValidateSchemaAndBuildDictionaryOutputFileName​(java.lang.String mrJobId)
      • shouldBuildZstdCompressionDictionary

        protected static boolean shouldBuildZstdCompressionDictionary​(PushJobSetting pushJobSetting,
                                                                      boolean inputFileHasRecords)
        This function decides whether a Zstd compression dictionary needs to be trained, based on the type of push, the configs, whether there are any input records, and whether PushJobSetting.compressionMetricCollectionEnabled is enabled.
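A hedged sketch of that decision follows. The enum and parameter names are illustrative stand-ins for Venice's actual types, and the exact rule may differ in the real implementation:

```java
public class ZstdDictDecisionSketch {
    enum CompressionStrategy { NO_OP, GZIP, ZSTD_WITH_DICT }

    // Illustrative decision: train a dictionary only when there is input to
    // sample from and the push is not incremental (an incremental push reuses
    // the existing version's dictionary), and only if the store compresses
    // with ZSTD_WITH_DICT or compression metrics are being collected.
    static boolean shouldBuildDictionary(CompressionStrategy strategy,
                                         boolean compressionMetricCollectionEnabled,
                                         boolean isIncrementalPush,
                                         boolean inputFileHasRecords) {
        if (isIncrementalPush || !inputFileHasRecords) {
            return false;
        }
        return strategy == CompressionStrategy.ZSTD_WITH_DICT
                || compressionMetricCollectionEnabled;
    }

    public static void main(String[] args) {
        System.out.println(shouldBuildDictionary(
                CompressionStrategy.ZSTD_WITH_DICT, false, false, true)); // true
        System.out.println(shouldBuildDictionary(
                CompressionStrategy.NO_OP, false, false, true)); // false
    }
}
```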
      • evaluateCompressionMetricCollectionEnabled

        protected static boolean evaluateCompressionMetricCollectionEnabled​(PushJobSetting pushJobSetting,
                                                                            boolean inputFileHasRecords)
        This function evaluates the config PushJobSetting.compressionMetricCollectionEnabled against the input data and other configs, as an initial filter to disable it in cases where the information can't be collected or wouldn't be meaningful, e.g. when there is no input data, or for an incremental push.
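The initial filter described above can be sketched as a single predicate; the parameter names are illustrative and the real method consults PushJobSetting directly:

```java
public class CompressionMetricFilterSketch {
    // Illustrative filter: keep metric collection enabled only when it can
    // produce something useful -- the config is on, there are input records
    // to measure, and the push is not incremental.
    static boolean evaluateCompressionMetricCollectionEnabled(boolean configEnabled,
                                                              boolean isIncrementalPush,
                                                              boolean inputFileHasRecords) {
        return configEnabled && inputFileHasRecords && !isIncrementalPush;
    }

    public static void main(String[] args) {
        System.out.println(evaluateCompressionMetricCollectionEnabled(true, false, true));  // true
        System.out.println(evaluateCompressionMetricCollectionEnabled(true, false, false)); // false: no data
        System.out.println(evaluateCompressionMetricCollectionEnabled(true, true, true));   // false: incremental
    }
}
```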
      • getValidateSchemaAndBuildDictMapperOutputReader

        protected ValidateSchemaAndBuildDictMapperOutputReader getValidateSchemaAndBuildDictMapperOutputReader​(java.lang.String outputDir,
                                                                                                               java.lang.String fileName)
                                                                                                        throws java.lang.Exception
        Throws:
        java.lang.Exception
      • initKIFRepushDetails

        protected void initKIFRepushDetails()
      • getInputURI

        protected java.lang.String getInputURI​(VeniceProperties props)
        Gets the input path from the properties and checks whether the input directory contains a sub-directory.
        Parameters:
        props -
        Returns:
        input URI
        Throws:
        java.lang.Exception
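The sub-directory check that getInputURI performs can be illustrated with a local-filesystem analogue; the real method uses Hadoop's FileSystem API against HDFS, and this sketch only mirrors the idea with java.nio:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class InputDirCheckSketch {
    // Local analogue of the HDFS check: does the input directory contain
    // at least one sub-directory?
    static boolean hasSubDirectory(Path inputDir) throws IOException {
        try (Stream<Path> entries = Files.list(inputDir)) {
            return entries.anyMatch(Files::isDirectory);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("vpj-input");
        System.out.println(hasSubDirectory(tmp)); // false: empty directory
        Files.createDirectory(tmp.resolve("sub"));
        System.out.println(hasSubDirectory(tmp)); // true
    }
}
```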
      • validateRemoteHybridSettings

        protected void validateRemoteHybridSettings()
      • validateRemoteHybridSettings

        protected void validateRemoteHybridSettings​(PushJobSetting setting)
      • setupInputFormatConfToValidateSchemaAndBuildDict

        protected void setupInputFormatConfToValidateSchemaAndBuildDict​(org.apache.hadoop.mapred.JobConf conf,
                                                                        PushJobSetting pushJobSetting,
                                                                        java.lang.String inputDirectory)
      • cancel

        public void cancel()
        A cancel method for gracefully cancelling the running job, to be invoked as a result of user actions.
        Throws:
        java.lang.Exception
      • getKafkaUrl

        public java.lang.String getKafkaUrl()
      • getIncrementalPushVersion

        public java.lang.String getIncrementalPushVersion()
      • getTopicToMonitor

        public java.lang.String getTopicToMonitor()
      • close

        public void close()
        Specified by:
        close in interface java.lang.AutoCloseable
      • main

        public static void main​(java.lang.String[] args)