Class VenicePushJobConstants


  • public final class VenicePushJobConstants
    extends java.lang.Object
    • Field Detail

      • LEGACY_AVRO_KEY_FIELD_PROP

        public static final java.lang.String LEGACY_AVRO_KEY_FIELD_PROP
        See Also:
        Constant Field Values
      • LEGACY_AVRO_VALUE_FIELD_PROP

        public static final java.lang.String LEGACY_AVRO_VALUE_FIELD_PROP
        See Also:
        Constant Field Values
      • VALUE_FIELD_PROP

        public static final java.lang.String VALUE_FIELD_PROP
        See Also:
        Constant Field Values
      • DEFAULT_KEY_FIELD_PROP

        public static final java.lang.String DEFAULT_KEY_FIELD_PROP
        See Also:
        Constant Field Values
      • DEFAULT_VALUE_FIELD_PROP

        public static final java.lang.String DEFAULT_VALUE_FIELD_PROP
        See Also:
        Constant Field Values
      • SCHEMA_STRING_PROP

        public static final java.lang.String SCHEMA_STRING_PROP
        See Also:
        Constant Field Values
      • KAFKA_SOURCE_KEY_SCHEMA_STRING_PROP

        public static final java.lang.String KAFKA_SOURCE_KEY_SCHEMA_STRING_PROP
        See Also:
        Constant Field Values
      • EXTENDED_SCHEMA_VALIDITY_CHECK_ENABLED

        public static final java.lang.String EXTENDED_SCHEMA_VALIDITY_CHECK_ENABLED
        See Also:
        Constant Field Values
      • DEFAULT_EXTENDED_SCHEMA_VALIDITY_CHECK_ENABLED

        public static final boolean DEFAULT_EXTENDED_SCHEMA_VALIDITY_CHECK_ENABLED
        See Also:
        Constant Field Values
      • UPDATE_SCHEMA_STRING_PROP

        public static final java.lang.String UPDATE_SCHEMA_STRING_PROP
        See Also:
        Constant Field Values
      • SPARK_NATIVE_INPUT_FORMAT_ENABLED

        public static final java.lang.String SPARK_NATIVE_INPUT_FORMAT_ENABLED
        See Also:
        Constant Field Values
      • FILE_VALUE_SCHEMA

        public static final java.lang.String FILE_VALUE_SCHEMA
        See Also:
        Constant Field Values
      • INCREMENTAL_PUSH

        public static final java.lang.String INCREMENTAL_PUSH
        See Also:
        Constant Field Values
      • GENERATE_PARTIAL_UPDATE_RECORD_FROM_INPUT

        public static final java.lang.String GENERATE_PARTIAL_UPDATE_RECORD_FROM_INPUT
        See Also:
        Constant Field Values
      • ALLOW_DUPLICATE_KEY

        public static final java.lang.String ALLOW_DUPLICATE_KEY
        See Also:
        Constant Field Values
      • POLL_STATUS_RETRY_ATTEMPTS

        public static final java.lang.String POLL_STATUS_RETRY_ATTEMPTS
        See Also:
        Constant Field Values
      • CONTROLLER_REQUEST_RETRY_ATTEMPTS

        public static final java.lang.String CONTROLLER_REQUEST_RETRY_ATTEMPTS
        See Also:
        Constant Field Values
      • POLL_JOB_STATUS_INTERVAL_MS

        public static final java.lang.String POLL_JOB_STATUS_INTERVAL_MS
        See Also:
        Constant Field Values
      • JOB_STATUS_IN_UNKNOWN_STATE_TIMEOUT_MS

        public static final java.lang.String JOB_STATUS_IN_UNKNOWN_STATE_TIMEOUT_MS
        See Also:
        Constant Field Values
      • SEND_CONTROL_MESSAGES_DIRECTLY

        public static final java.lang.String SEND_CONTROL_MESSAGES_DIRECTLY
        See Also:
        Constant Field Values
      • ETL_VALUE_SCHEMA_TRANSFORMATION

        public static final java.lang.String ETL_VALUE_SCHEMA_TRANSFORMATION
        See Also:
        Constant Field Values
      • SYSTEM_SCHEMA_READER_ENABLED

        public static final java.lang.String SYSTEM_SCHEMA_READER_ENABLED
        See Also:
        Constant Field Values
      • SYSTEM_SCHEMA_CLUSTER_D2_SERVICE_NAME

        public static final java.lang.String SYSTEM_SCHEMA_CLUSTER_D2_SERVICE_NAME
        See Also:
        Constant Field Values
      • SYSTEM_SCHEMA_CLUSTER_D2_ZK_HOST

        public static final java.lang.String SYSTEM_SCHEMA_CLUSTER_D2_ZK_HOST
        See Also:
        Constant Field Values
      • DEFAULT_COMPRESSION_METRIC_COLLECTION_ENABLED

        public static final boolean DEFAULT_COMPRESSION_METRIC_COLLECTION_ENABLED
        See Also:
        Constant Field Values
      • USE_MAPPER_TO_BUILD_DICTIONARY

        public static final java.lang.String USE_MAPPER_TO_BUILD_DICTIONARY
        Config to enable or disable a dedicated mapper for the following steps, which are currently done in the VPJ driver:
        1. validate the schema,
        2. collect the input data size,
        3. build the dictionary (if needed; refer to VenicePushJob.shouldBuildZstdCompressionDictionary(com.linkedin.venice.hadoop.PushJobSetting, boolean)).

        This mapper was added because sample collection for the Zstd dictionary is currently done in memory. It also makes it easier to experiment with the sample size and to support future enhancements if needed.

        Currently, the plan is to make it available only for the MapReduce compute engine and to remove it eventually, along with the rest of the MapReduce-related code.
        See Also:
        Constant Field Values
      • DEFAULT_USE_MAPPER_TO_BUILD_DICTIONARY

        public static final boolean DEFAULT_USE_MAPPER_TO_BUILD_DICTIONARY
        See Also:
        Constant Field Values
      • MINIMUM_NUMBER_OF_SAMPLES_REQUIRED_TO_BUILD_ZSTD_DICTIONARY

        public static final int MINIMUM_NUMBER_OF_SAMPLES_REQUIRED_TO_BUILD_ZSTD_DICTIONARY
        A known issue in the zstd library causes a crash if the input sample is too small. As a preventive check, dictionary training is skipped in such cases, using a minimum limit of 20 samples. The limit is deliberately kept simple and hard-coded: if this check fails to prevent some edge cases, the feature itself can be disabled.
        See Also:
        Constant Field Values
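The preventive check described for MINIMUM_NUMBER_OF_SAMPLES_REQUIRED_TO_BUILD_ZSTD_DICTIONARY could look roughly like the sketch below. The class and method names are hypothetical and not part of Venice, and the sketch assumes the documented limit of 20 is inclusive.

```java
// Hypothetical illustration; not the actual Venice implementation.
final class ZstdSampleCheckSketch {
  // Mirrors the documented minimum of 20 samples.
  static final int MIN_SAMPLES = 20;

  /**
   * Returns true only when enough samples were collected to attempt
   * dictionary training. Assumes the limit is inclusive (20 is enough).
   */
  static boolean shouldTrainDictionary(int collectedSamples) {
    return collectedSamples >= MIN_SAMPLES;
  }
}
```

A caller would skip training, and fall back to whatever the job does without a dictionary, whenever this returns false.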
      • ZSTD_DICTIONARY_CREATION_REQUIRED

        public static final java.lang.String ZSTD_DICTIONARY_CREATION_REQUIRED
        Configs to pass to AbstractVeniceMapper based on the input configs and Dictionary training status
        See Also:
        Constant Field Values
      • ZSTD_DICTIONARY_CREATION_SUCCESS

        public static final java.lang.String ZSTD_DICTIONARY_CREATION_SUCCESS
        See Also:
        Constant Field Values
      • VALIDATE_SCHEMA_AND_BUILD_DICT_MAPPER_OUTPUT_DIRECTORY

        public static final java.lang.String VALIDATE_SCHEMA_AND_BUILD_DICT_MAPPER_OUTPUT_DIRECTORY
        Location and key to store the output of ValidateSchemaAndBuildDictMapper and retrieve it back when USE_MAPPER_TO_BUILD_DICTIONARY is enabled
        See Also:
        Constant Field Values
      • VALIDATE_SCHEMA_AND_BUILD_DICTIONARY_MAPPER_OUTPUT_FILE_PREFIX

        public static final java.lang.String VALIDATE_SCHEMA_AND_BUILD_DICTIONARY_MAPPER_OUTPUT_FILE_PREFIX
        See Also:
        Constant Field Values
      • VALIDATE_SCHEMA_AND_BUILD_DICTIONARY_MAPPER_OUTPUT_FILE_EXTENSION

        public static final java.lang.String VALIDATE_SCHEMA_AND_BUILD_DICTIONARY_MAPPER_OUTPUT_FILE_EXTENSION
        See Also:
        Constant Field Values
      • KEY_ZSTD_COMPRESSION_DICTIONARY

        public static final java.lang.String KEY_ZSTD_COMPRESSION_DICTIONARY
        See Also:
        Constant Field Values
      • KEY_INPUT_FILE_DATA_SIZE

        public static final java.lang.String KEY_INPUT_FILE_DATA_SIZE
        See Also:
        Constant Field Values
      • SOURCE_KAFKA

        public static final java.lang.String SOURCE_KAFKA
        Configs used to enable Kafka Input.
        See Also:
        Constant Field Values
      • KAFKA_INPUT_TOPIC

        public static final java.lang.String KAFKA_INPUT_TOPIC
        TODO: consider automatically discovering the source topic for the specified store. We need to be careful in the following scenarios:
        1. Not all the prod colos are using the same current version if the previous push experienced a partial failure.
        2. We might want to re-push from a backup version, which should be unlikely.
        See Also:
        Constant Field Values
      • KAFKA_INPUT_FABRIC

        public static final java.lang.String KAFKA_INPUT_FABRIC
        See Also:
        Constant Field Values
      • KAFKA_INPUT_BROKER_URL

        public static final java.lang.String KAFKA_INPUT_BROKER_URL
        See Also:
        Constant Field Values
      • KAFKA_INPUT_MAX_RECORDS_PER_MAPPER

        public static final java.lang.String KAFKA_INPUT_MAX_RECORDS_PER_MAPPER
        See Also:
        Constant Field Values
      • KAFKA_INPUT_COMBINER_ENABLED

        public static final java.lang.String KAFKA_INPUT_COMBINER_ENABLED
        See Also:
        Constant Field Values
      • KAFKA_INPUT_COMPRESSION_BUILD_NEW_DICT_ENABLED

        public static final java.lang.String KAFKA_INPUT_COMPRESSION_BUILD_NEW_DICT_ENABLED
        See Also:
        Constant Field Values
      • KAFKA_INPUT_SOURCE_TOPIC_CHUNKING_ENABLED

        public static final java.lang.String KAFKA_INPUT_SOURCE_TOPIC_CHUNKING_ENABLED
        See Also:
        Constant Field Values
      • REWIND_TIME_IN_SECONDS_OVERRIDE

        public static final java.lang.String REWIND_TIME_IN_SECONDS_OVERRIDE
        Optional. To use a rewind time different from the default store-level rewind time config for Kafka Input re-push, this property needs to be specified explicitly. It comes into play when the default rewind time configured at the store level is too short or too long:
        1. If the default rewind time config is too short (for example 0 or several minutes), it could cause a data gap with re-push, since the push job itself could take several hours and the re-pushed version should contain the same dataset as the source version.
        2. If the default rewind time config is too long (such as 28 days), rewinding that far is wasteful, since the time gap between the source version and the re-push version should be comparable to the re-push time if the whole ingestion pipeline is not lagging.

        Automatically detecting the right rewind time for re-push is challenging for the following reasons:
        1. For the Venice Aggregate use case, some colos could be lagging behind other prod colos, so if the re-push source is a fast colo, a rewind time that is too short could cause a data gap in the slower colos. Ideally, the slowest colo should be used as the re-push source.
        2. For the Venice non-Aggregate use case, the ingestion pipeline includes the following phases:
           2.1 The customer's Kafka aggregation and mirroring pipeline, which replicates the same data to all prod colos.
           2.2 The Venice ingestion pipeline, which consumes the local real-time topic.
           We have visibility into 2.2 but not 2.1, so we may need to work with the customer to understand how 2.1 can be measured, or use a long enough rewind time to mitigate all the potential issues.

        This property is made generally available since it should be useful for the ETL+VPJ use case as well.
        See Also:
        Constant Field Values
      • REWIND_EPOCH_TIME_IN_SECONDS_OVERRIDE

        public static final java.lang.String REWIND_EPOCH_TIME_IN_SECONDS_OVERRIDE
        A timestamp to rewind to before replaying data. This config is ignored if rewind.time.in.seconds.override is provided. At push time, this config is used to fill in rewind.time.in.seconds.override by computing System.currentTime - rewind.epoch.time.in.seconds.override and storing the result in rewind.time.in.seconds.override. With this in mind, a push policy of REWIND_FROM_SOP should be used so that the resulting behavior makes sense to a user. A timestamp in the future is not valid and will result in an exception.
        See Also:
        Constant Field Values
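The conversion described for REWIND_EPOCH_TIME_IN_SECONDS_OVERRIDE can be sketched as follows. This is an illustration under assumptions, not Venice's code: the class and method names are hypothetical, the current time is passed in explicitly, and a null argument stands for "config not set".

```java
// Hypothetical illustration of the documented precedence and arithmetic.
final class RewindEpochSketch {
  /**
   * Converts an absolute rewind timestamp (epoch seconds) into a relative
   * rewind time in seconds, i.e. now - epoch. Timestamps in the future are
   * rejected, as the description requires.
   */
  static long toRewindSeconds(long nowEpochSeconds, long rewindEpochSeconds) {
    if (rewindEpochSeconds > nowEpochSeconds) {
      throw new IllegalArgumentException(
          "Rewind epoch timestamp is in the future: " + rewindEpochSeconds);
    }
    return nowEpochSeconds - rewindEpochSeconds;
  }

  /**
   * Resolves the effective rewind time. An explicit rewind time override
   * wins; otherwise the epoch override is converted; null means neither
   * config was provided.
   */
  static Long resolve(Long rewindTimeOverride, Long rewindEpochOverride, long nowEpochSeconds) {
    if (rewindTimeOverride != null) {
      return rewindTimeOverride; // the epoch config is ignored in this case
    }
    if (rewindEpochOverride != null) {
      return toRewindSeconds(nowEpochSeconds, rewindEpochOverride);
    }
    return null;
  }
}
```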
      • SUPPRESS_END_OF_PUSH_MESSAGE

        public static final java.lang.String SUPPRESS_END_OF_PUSH_MESSAGE
        Boolean config. When enabled, the push job does not submit the end-of-push message after the data has been sent and does not poll for the job status to complete. With this flag set, a user must manually mark the job as succeeded or failed.
        See Also:
        Constant Field Values
      • DEFER_VERSION_SWAP

        public static final java.lang.String DEFER_VERSION_SWAP
        Boolean config. When enabled, the version swap waits for an external signal after buffer replay is complete.
        See Also:
        Constant Field Values
      • D2_ZK_HOSTS_PREFIX

        public static final java.lang.String D2_ZK_HOSTS_PREFIX
        This config specifies the prefix for d2 zk hosts config. Configs of type <prefix>.<regionName> are expected to be defined.
        See Also:
        Constant Field Values
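The `<prefix>.<regionName>` convention described for D2_ZK_HOSTS_PREFIX can be illustrated with a small lookup helper. This is only a sketch: the class name, the method name and the prefix value used in the usage note are hypothetical, and the real prefix is whatever the constant resolves to.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration; not part of Venice.
final class D2ZkHostsSketch {
  /** Collects every "<prefix>.<regionName>" entry into a regionName -> ZK hosts map. */
  static Map<String, String> zkHostsByRegion(Properties props, String prefix) {
    String keyPrefix = prefix + ".";
    Map<String, String> result = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      // Keep only keys that have a non-empty region name after the prefix.
      if (key.startsWith(keyPrefix) && key.length() > keyPrefix.length()) {
        result.put(key.substring(keyPrefix.length()), props.getProperty(key));
      }
    }
    return result;
  }
}
```

For example, with a made-up prefix `example.d2.zk.hosts`, the property `example.d2.zk.hosts.dc-0` would be returned under the region name `dc-0`.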
      • PARENT_CONTROLLER_REGION_NAME

        public static final java.lang.String PARENT_CONTROLLER_REGION_NAME
        This config specifies the region identifier where parent controller is running
        See Also:
        Constant Field Values
      • REWIND_EPOCH_TIME_BUFFER_IN_SECONDS_OVERRIDE

        public static final java.lang.String REWIND_EPOCH_TIME_BUFFER_IN_SECONDS_OVERRIDE
        Relates to the rewind epoch override (REWIND_EPOCH_TIME_IN_SECONDS_OVERRIDE). An overridable amount of buffer to be applied to the epoch (as the rewind isn't perfectly instantaneous). Defaults to 1 minute.
        See Also:
        Constant Field Values
      • VENICE_DISCOVER_URL_PROP

        public static final java.lang.String VENICE_DISCOVER_URL_PROP
        In single-region mode, this must be a comma-separated list of child controller URLs or d2://<d2ServiceNameForChildController>.
        In multi-region mode, it must be a comma-separated list of parent controller URLs or d2://<d2ServiceNameForParentController>.
        See Also:
        Constant Field Values
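The two accepted shapes of the VENICE_DISCOVER_URL_PROP value (a comma-separated URL list or a `d2://` service reference) can be told apart as in the sketch below. All names here are hypothetical, and Venice's own parsing may differ, for example in how it treats whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Venice.
final class DiscoverUrlSketch {
  private static final String D2_SCHEME = "d2://";

  /** True when the value refers to a D2 service rather than plain URLs. */
  static boolean isD2(String value) {
    return value.trim().startsWith(D2_SCHEME);
  }

  /** Extracts the service name from a "d2://<serviceName>" value. */
  static String d2ServiceName(String value) {
    String trimmed = value.trim();
    if (!trimmed.startsWith(D2_SCHEME)) {
      throw new IllegalArgumentException("Not a d2:// value: " + value);
    }
    return trimmed.substring(D2_SCHEME.length());
  }

  /** Splits a comma-separated list of controller URLs, dropping blanks. */
  static List<String> controllerUrls(String value) {
    List<String> urls = new ArrayList<>();
    for (String part : value.split(",")) {
      String url = part.trim();
      if (!url.isEmpty()) {
        urls.add(url);
      }
    }
    return urls;
  }
}
```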
      • SOURCE_GRID_FABRIC

        public static final java.lang.String SOURCE_GRID_FABRIC
        An identifier of the data center which is used to determine the Kafka URL and child controllers that push jobs communicate with
        See Also:
        Constant Field Values
      • ENABLE_WRITE_COMPUTE

        public static final java.lang.String ENABLE_WRITE_COMPUTE
        See Also:
        Constant Field Values
      • VENICE_STORE_NAME_PROP

        public static final java.lang.String VENICE_STORE_NAME_PROP
        See Also:
        Constant Field Values
      • INPUT_PATH_LAST_MODIFIED_TIME

        public static final java.lang.String INPUT_PATH_LAST_MODIFIED_TIME
        See Also:
        Constant Field Values
      • BATCH_NUM_BYTES_PROP

        public static final java.lang.String BATCH_NUM_BYTES_PROP
        See Also:
        Constant Field Values
      • PATH_FILTER

        public static final org.apache.hadoop.fs.PathFilter PATH_FILTER
        Ignores HDFS files whose names start with "_" or ".".
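The rule behind PATH_FILTER can be expressed as a plain predicate on the file name. The sketch below deliberately avoids the Hadoop dependency, so it uses java.util.function.Predicate instead of org.apache.hadoop.fs.PathFilter; the class and field names are hypothetical.

```java
import java.util.function.Predicate;

// Hypothetical, Hadoop-free illustration of the documented rule.
final class HiddenFileFilterSketch {
  /** Accepts a file name unless it starts with "_" or ".". */
  static final Predicate<String> ACCEPT_FILE_NAME =
      name -> !name.startsWith("_") && !name.startsWith(".");
}
```

Marker files such as `_SUCCESS` and dot-prefixed hidden files are rejected under this rule, while ordinary data files pass.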
      • GLOB_FILTER_PATTERN

        public static final java.lang.String GLOB_FILTER_PATTERN
        See Also:
        Constant Field Values
      • PERMISSION_777

        public static final org.apache.hadoop.fs.permission.FsPermission PERMISSION_777
      • PERMISSION_700

        public static final org.apache.hadoop.fs.permission.FsPermission PERMISSION_700
      • VALUE_SCHEMA_ID_PROP

        public static final java.lang.String VALUE_SCHEMA_ID_PROP
        See Also:
        Constant Field Values
      • DERIVED_SCHEMA_ID_PROP

        public static final java.lang.String DERIVED_SCHEMA_ID_PROP
        See Also:
        Constant Field Values
      • HADOOP_VALIDATE_SCHEMA_AND_BUILD_DICT_PREFIX

        public static final java.lang.String HADOOP_VALIDATE_SCHEMA_AND_BUILD_DICT_PREFIX
        See Also:
        Constant Field Values
      • STORAGE_QUOTA_PROP

        public static final java.lang.String STORAGE_QUOTA_PROP
        See Also:
        Constant Field Values
      • STORAGE_ENGINE_OVERHEAD_RATIO

        public static final java.lang.String STORAGE_ENGINE_OVERHEAD_RATIO
        See Also:
        Constant Field Values
      • VSON_PUSH

        @Deprecated
        public static final java.lang.String VSON_PUSH
        Deprecated.
        See Also:
        Constant Field Values
      • KAFKA_SECURITY_PROTOCOL

        public static final java.lang.String KAFKA_SECURITY_PROTOCOL
        See Also:
        Constant Field Values
      • COMPRESSION_STRATEGY

        public static final java.lang.String COMPRESSION_STRATEGY
        See Also:
        Constant Field Values
      • KAFKA_INPUT_SOURCE_COMPRESSION_STRATEGY

        public static final java.lang.String KAFKA_INPUT_SOURCE_COMPRESSION_STRATEGY
        See Also:
        Constant Field Values
      • SSL_CONFIGURATOR_CLASS_CONFIG

        public static final java.lang.String SSL_CONFIGURATOR_CLASS_CONFIG
        See Also:
        Constant Field Values
      • SSL_KEY_STORE_PROPERTY_NAME

        public static final java.lang.String SSL_KEY_STORE_PROPERTY_NAME
        See Also:
        Constant Field Values
      • SSL_TRUST_STORE_PROPERTY_NAME

        public static final java.lang.String SSL_TRUST_STORE_PROPERTY_NAME
        See Also:
        Constant Field Values
      • SSL_KEY_STORE_PASSWORD_PROPERTY_NAME

        public static final java.lang.String SSL_KEY_STORE_PASSWORD_PROPERTY_NAME
        See Also:
        Constant Field Values
      • SSL_KEY_PASSWORD_PROPERTY_NAME

        public static final java.lang.String SSL_KEY_PASSWORD_PROPERTY_NAME
        See Also:
        Constant Field Values
      • JOB_EXEC_URL

        public static final java.lang.String JOB_EXEC_URL
        Defines the execution server's URL for easy access to the job execution during debugging. This can be any string that is meaningful to the execution environment.
        See Also:
        Constant Field Values
      • JOB_SERVER_NAME

        public static final java.lang.String JOB_SERVER_NAME
        The short name of the server where the job runs
        See Also:
        Constant Field Values
      • JOB_EXEC_ID

        public static final java.lang.String JOB_EXEC_ID
        The execution ID of the execution if this job is a part of a multi-step flow. Each step in the flow can have the same execution id
        See Also:
        Constant Field Values
      • PUSH_JOB_STATUS_UPLOAD_ENABLE

        public static final java.lang.String PUSH_JOB_STATUS_UPLOAD_ENABLE
        Config to enable the service that uploads push job statuses to the controller using ControllerClient.uploadPushJobStatus(). The job status is then packaged and sent to a dedicated Kafka channel.
        See Also:
        Constant Field Values
      • REDUCER_SPECULATIVE_EXECUTION_ENABLE

        public static final java.lang.String REDUCER_SPECULATIVE_EXECUTION_ENABLE
        See Also:
        Constant Field Values
      • TELEMETRY_MESSAGE_INTERVAL

        public static final java.lang.String TELEMETRY_MESSAGE_INTERVAL
        The interval, measured in number of messages, at which certain info is printed in the reducer logs.
        See Also:
        Constant Field Values
      • ZSTD_COMPRESSION_LEVEL

        public static final java.lang.String ZSTD_COMPRESSION_LEVEL
        Config to control the Compression Level for ZSTD Dictionary Compression.
        See Also:
        Constant Field Values
      • DEFAULT_BATCH_BYTES_SIZE

        public static final int DEFAULT_BATCH_BYTES_SIZE
        See Also:
        Constant Field Values
      • DEFAULT_RE_PUSH_REWIND_IN_SECONDS_OVERRIDE

        public static final long DEFAULT_RE_PUSH_REWIND_IN_SECONDS_OVERRIDE
        The rewind override applied when performing a re-push, to prevent data loss. If the store has a rewind config setting higher than 1 day, the store config is adopted instead; otherwise, the rewind config is overridden to 1 day, provided the push job config does not try to override it.
        See Also:
        Constant Field Values
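One literal reading of the rule for DEFAULT_RE_PUSH_REWIND_IN_SECONDS_OVERRIDE is sketched below. Several things are assumptions: the names are hypothetical, 1 day is taken to be 86,400 seconds, a null argument stands for "the push job config did not set an override", and the precedence between the store config and the push job override follows the order of the sentence above rather than Venice's source code.

```java
// Hypothetical illustration of one reading of the documented rule.
final class RePushRewindSketch {
  // Assumed value of the 1-day default, in seconds.
  static final long ONE_DAY_SECONDS = 24L * 60 * 60;

  static long effectiveRewindSeconds(long storeRewindSeconds, Long pushJobOverrideSeconds) {
    if (storeRewindSeconds > ONE_DAY_SECONDS) {
      return storeRewindSeconds; // store config is already longer than 1 day
    }
    if (pushJobOverrideSeconds != null) {
      return pushJobOverrideSeconds; // the push job asked for its own value
    }
    return ONE_DAY_SECONDS; // fall back to the 1-day default
  }
}
```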
      • REPUSH_TTL_ENABLE

        public static final java.lang.String REPUSH_TTL_ENABLE
        Config to control the TTL behaviors in repush.
        See Also:
        Constant Field Values
      • REPUSH_TTL_POLICY

        public static final java.lang.String REPUSH_TTL_POLICY
        See Also:
        Constant Field Values
      • REPUSH_TTL_SECONDS

        public static final java.lang.String REPUSH_TTL_SECONDS
        See Also:
        Constant Field Values
      • REPUSH_TTL_START_TIMESTAMP

        public static final java.lang.String REPUSH_TTL_START_TIMESTAMP
        See Also:
        Constant Field Values
      • VALUE_SCHEMA_DIR

        public static final java.lang.String VALUE_SCHEMA_DIR
        See Also:
        Constant Field Values
      • TARGETED_REGION_PUSH_LIST

        public static final java.lang.String TARGETED_REGION_PUSH_LIST
        This is an experimental config that specifies a list of regions used for targeted region push in VPJ. TARGETED_REGION_PUSH_ENABLED has to be enabled to use this config. In this mode, the VPJ pushes data only to the provided regions. The input is a comma-separated list of region names, e.g. "dc-0,dc-1,dc-2". For single targeted region push, see TARGETED_REGION_PUSH_ENABLED.
        See Also:
        Constant Field Values
      • TARGETED_REGION_PUSH_WITH_DEFERRED_SWAP

        public static final java.lang.String TARGETED_REGION_PUSH_WITH_DEFERRED_SWAP
        Config to enable targeted region push with deferred version swap. In this mode, the VPJ pushes data to all regions and switches to the new version in only a single region. The single region is determined by the store config in StoreInfo.getNativeReplicationSourceFabric(), unless a list of regions is passed in via targeted.region.push.list. After a specified wait time (default 1h), the remaining regions switch to the new version.
        See Also:
        Constant Field Values
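The region selection described for targeted region push (use the comma-separated list when one is given, otherwise the store's native replication source fabric) could be sketched as follows. The class and method names are hypothetical, and the fabric is passed in as a plain string instead of being read from StoreInfo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Venice.
final class TargetedRegionSketch {
  /**
   * Returns the regions from the comma-separated list config, or a single
   * fallback region when the list is missing or empty.
   */
  static List<String> targetRegions(String regionListConfig, String nativeReplicationSourceFabric) {
    List<String> regions = new ArrayList<>();
    if (regionListConfig != null) {
      for (String part : regionListConfig.split(",")) {
        String region = part.trim();
        if (!region.isEmpty()) {
          regions.add(region);
        }
      }
    }
    if (regions.isEmpty()) {
      regions.add(nativeReplicationSourceFabric);
    }
    return regions;
  }
}
```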
      • DEFAULT_IS_DUPLICATED_KEY_ALLOWED

        public static final boolean DEFAULT_IS_DUPLICATED_KEY_ALLOWED
        See Also:
        Constant Field Values
      • MAP_REDUCE_PARTITIONER_CLASS_CONFIG

        public static final java.lang.String MAP_REDUCE_PARTITIONER_CLASS_CONFIG
        Config used only for tests. Should not be used at regular runtime.
        See Also:
        Constant Field Values
      • UNCREATED_VERSION_NUMBER

        public static final int UNCREATED_VERSION_NUMBER
        Placeholder for version number that is yet to be created.
        See Also:
        Constant Field Values
      • DEFAULT_POLL_STATUS_INTERVAL_MS

        public static final long DEFAULT_POLL_STATUS_INTERVAL_MS
        See Also:
        Constant Field Values
      • DEFAULT_JOB_STATUS_IN_UNKNOWN_STATE_TIMEOUT_MS

        public static final long DEFAULT_JOB_STATUS_IN_UNKNOWN_STATE_TIMEOUT_MS
        The default total time we wait before failing a job if the job status stays in UNKNOWN state.
        See Also:
        Constant Field Values
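The timeout described for DEFAULT_JOB_STATUS_IN_UNKNOWN_STATE_TIMEOUT_MS can be pictured as a small tracker that a polling loop consults on every status check. This is a sketch under assumptions: the names are hypothetical, and it treats the timeout as applying to one uninterrupted stretch in the UNKNOWN state, which is one possible reading of "stays in UNKNOWN state".

```java
// Hypothetical illustration; not part of Venice.
final class UnknownStateTimeoutSketch {
  private final long timeoutMs;
  // Time at which the current UNKNOWN stretch began; -1 when not in UNKNOWN.
  private long unknownSinceMs = -1;

  UnknownStateTimeoutSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Call once per poll; returns true when the job should be failed. */
  boolean shouldFail(boolean statusUnknown, long nowMs) {
    if (!statusUnknown) {
      unknownSinceMs = -1; // a known status resets the clock
      return false;
    }
    if (unknownSinceMs < 0) {
      unknownSinceMs = nowMs; // first UNKNOWN poll of this stretch
    }
    return nowMs - unknownSinceMs >= timeoutMs;
  }
}
```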
      • NON_CRITICAL_EXCEPTION

        public static final java.lang.String NON_CRITICAL_EXCEPTION
        See Also:
        Constant Field Values
      • COMPRESSION_DICTIONARY_SAMPLE_SIZE

        public static final java.lang.String COMPRESSION_DICTIONARY_SAMPLE_SIZE
        Sample size to collect for building the dictionary. It can be assigned a maximum of 2GB, because ZstdDictTrainer in the ZSTD library takes the sample size as an int.
        See Also:
        Constant Field Values
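The 2GB ceiling on COMPRESSION_DICTIONARY_SAMPLE_SIZE follows from the int parameter: the largest int is 2^31 - 1 bytes, just under 2 GiB. A guard of that kind could look like the sketch below, where the class and method names are hypothetical.

```java
// Hypothetical illustration; not part of Venice.
final class DictionarySampleSizeSketch {
  /**
   * Narrows a requested sample size in bytes to an int, rejecting values
   * that are non-positive or do not fit (above Integer.MAX_VALUE).
   */
  static int toTrainerSampleSize(long requestedBytes) {
    if (requestedBytes <= 0 || requestedBytes > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "Sample size must be in (0, " + Integer.MAX_VALUE + "] bytes: " + requestedBytes);
    }
    return (int) requestedBytes;
  }
}
```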
      • DEFAULT_COMPRESSION_DICTIONARY_SAMPLE_SIZE

        public static final int DEFAULT_COMPRESSION_DICTIONARY_SAMPLE_SIZE
        See Also:
        Constant Field Values
      • COMPRESSION_DICTIONARY_SIZE_LIMIT

        public static final java.lang.String COMPRESSION_DICTIONARY_SIZE_LIMIT
        Maximum final dictionary size. TODO: add more details about the current limits.
        See Also:
        Constant Field Values
      • DATA_WRITER_COMPUTE_JOB_CLASS

        public static final java.lang.String DATA_WRITER_COMPUTE_JOB_CLASS
        Config to set the class for the DataWriter job. When using KIF, we currently will continue to fall back to MR mode. The class must extend DataWriterComputeJob and have a zero-arg constructor.
        See Also:
        Constant Field Values
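The two requirements on the DATA_WRITER_COMPUTE_JOB_CLASS value (it extends the expected base type and has a zero-arg constructor) are the ones a reflective loader needs. The sketch below shows such a loader with stand-in types: ComputeJobBase and ExampleJob are hypothetical placeholders for DataWriterComputeJob and a user-supplied class, and Venice's actual loading code may differ.

```java
// Hypothetical illustration; not part of Venice.
final class ComputeJobLoaderSketch {
  /** Stand-in for the real DataWriterComputeJob base type. */
  static abstract class ComputeJobBase {}

  /** Example implementation with an implicit zero-arg constructor. */
  static final class ExampleJob extends ComputeJobBase {}

  /** Loads the named class, checks its type, and calls its zero-arg constructor. */
  static ComputeJobBase instantiate(String className) {
    try {
      Class<?> raw = Class.forName(className);
      if (!ComputeJobBase.class.isAssignableFrom(raw)) {
        throw new IllegalArgumentException(className + " does not extend ComputeJobBase");
      }
      return raw.asSubclass(ComputeJobBase.class).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // Covers a missing class, a missing zero-arg constructor, and constructor failures.
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

A class that lacks a zero-arg constructor fails in getDeclaredConstructor(), which is why the description calls that constructor out as a requirement.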
      • PUSH_TO_SEPARATE_REALTIME_TOPIC

        public static final java.lang.String PUSH_TO_SEPARATE_REALTIME_TOPIC
        See Also:
        Constant Field Values