Package com.linkedin.venice.hadoop
package com.linkedin.venice.hadoop
Class    Description

AbstractVeniceFilter<INPUT_VALUE>    An abstraction to filter a given data type.

FilterChain<INPUT_VALUE>    The FilterChain class takes a list of AbstractVeniceFilter to assemble a filter chain that manages the life cycles of the filters and performs filtering based on the order of the filters.

This interface lets users get input data information.

A POJO that contains input data information (schema information and input data file size).

This class is used to keep track of the store storage quota and storage overhead ratio, and to check whether the total input data size exceeds the quota.

This class carries the state for the duration of the VenicePushJob.

Interface of a class that is used to keep track of push job details sent to the Venice controller.

Mapper-only MR job to validate the schema, build a compression dictionary if needed, and persist some data (total file size and compression dictionary) in HDFS to be used by the VPJ driver. Note: all the files in this split are processed sequentially; if this significantly increases the mapper time or results in timeouts, it needs to be revisited and done via a thread pool.

This class reads the data (total input size in bytes and zstd dictionary) persisted in HDFS by ValidateSchemaAndBuildDictMapper, based on the schema ValidateSchemaAndBuildDictMapperOutput.

This class provides a way to: 1.

Custom Input Format with the following specs, to be used for the feature PushJobSetting.useMapperToBuildDict with ValidateSchemaAndBuildDictMapper: 1.

This class sets up the Hadoop job used to push data to Venice.

This class was originally from Voldemort.
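
The FilterChain description above (an ordered list of filters whose life cycles are managed together) can be illustrated with a minimal sketch. It does not reproduce the actual Venice signatures: SketchFilter, SketchFilterChain, shouldFilter, apply and demoChain are hypothetical names chosen for illustration, and the short-circuit behaviour is an assumption about what "filtering based on the order of filters" means.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-type filter; NOT the Venice AbstractVeniceFilter API. */
abstract class SketchFilter<T> implements Closeable {
  /** Returns true when the value should be dropped from the input. */
  abstract boolean shouldFilter(T value);

  @Override
  public void close() {
    // Default: no resources to release. Subclasses may override.
  }
}

/** Hypothetical chain: applies filters in list order and closes them together. */
class SketchFilterChain<T> implements Closeable {
  private final List<SketchFilter<T>> filters;

  SketchFilterChain(List<SketchFilter<T>> filters) {
    this.filters = new ArrayList<>(filters);
  }

  /** Returns true as soon as any filter, tried in order, rejects the value (assumed behaviour). */
  boolean apply(T value) {
    for (SketchFilter<T> filter : filters) {
      if (filter.shouldFilter(value)) {
        return true;
      }
    }
    return false;
  }

  /** Life-cycle management: closing the chain closes every filter it owns. */
  @Override
  public void close() {
    for (SketchFilter<T> filter : filters) {
      filter.close();
    }
  }
}

class FilterChainSketch {
  /** Builds a two-filter chain: drop negative numbers first, then drop odd numbers. */
  static SketchFilterChain<Integer> demoChain() {
    List<SketchFilter<Integer>> filters = new ArrayList<>();
    filters.add(new SketchFilter<Integer>() {
      @Override
      boolean shouldFilter(Integer value) {
        return value < 0;
      }
    });
    filters.add(new SketchFilter<Integer>() {
      @Override
      boolean shouldFilter(Integer value) {
        return value % 2 != 0;
      }
    });
    return new SketchFilterChain<>(filters);
  }

  public static void main(String[] args) {
    SketchFilterChain<Integer> chain = demoChain();
    for (int value : new int[] {-2, 3, 4}) {
      System.out.println(value + " filtered: " + chain.apply(value));
    }
    chain.close();
  }
}
```

Keeping the filters behind a single Closeable chain means the caller has one object to close, regardless of how many filters were configured.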