Class DataWriterSparkJob
java.lang.Object
com.linkedin.venice.jobs.DataWriterComputeJob
com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob
All Implemented Interfaces:
ComputeJob, Closeable, AutoCloseable
The default implementation of AbstractDataWriterSparkJob for Avro and Vson file input formats.
Nested Class Summary
Nested classes/interfaces inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
DataWriterComputeJob.ConfigSetter
Nested classes/interfaces inherited from interface com.linkedin.venice.jobs.ComputeJob
ComputeJob.Status
Field Summary
Fields inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
PASS_THROUGH_CONFIG_PREFIXES
Constructor Summary
Constructors
DataWriterSparkJob()
Method Summary
Modifier and Type: protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
Method: getUserInputDataFrame()
Description: Get the data frame based on the user's input data.
Methods inherited from class com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
close, configure, getJobProperties, getPushJobSetting, getSparkSession, getTaskTracker, kill, runComputeJob, setInputConf, validateRmdSchema
Methods inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
configure, getFailureReason, getStatus, populateWithPassThroughConfigs, populateWithPassThroughConfigs, runJob, validateJob
Constructor Details
DataWriterSparkJob
public DataWriterSparkJob()
Method Details
getUserInputDataFrame
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
Description copied from class: AbstractDataWriterSparkJob
Get the data frame based on the user's input data. The schema of the Row has the following constraints:
- Must contain a field "key" with the schema: DataTypes.BinaryType. This is the key of the record represented in serialized Avro.
- Must contain a field "value" with the schema: DataTypes.BinaryType. This is the value of the record represented in serialized Avro.
- Must not contain fields with names beginning with "_". These are reserved for internal use.
- Can contain fields that do not violate the above constraints.
Specified by:
getUserInputDataFrame in class AbstractDataWriterSparkJob
Returns:
The data frame based on the user's input data
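The schema constraints listed for getUserInputDataFrame can be illustrated with a small standalone check. This is only a sketch, not Venice's own validation: UserInputSchemaCheck and its violations method are hypothetical names. To stay runnable without Spark on the classpath, the schema is modeled as a plain map from field name to type name, where the string "binary" is assumed to stand in for DataTypes.BinaryType.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of Venice: checks a simplified schema description
 * (field name -> type name) against the constraints documented for getUserInputDataFrame.
 */
final class UserInputSchemaCheck {
  // Assumption of this sketch: "binary" stands in for Spark's DataTypes.BinaryType.
  static final String BINARY = "binary";

  private UserInputSchemaCheck() {
  }

  /** Returns one message per violated constraint; an empty list means the schema passes. */
  static List<String> violations(Map<String, String> fieldTypes) {
    List<String> problems = new ArrayList<>();

    // "key" and "value" must both be present and binary (serialized Avro bytes).
    for (String required: List.of("key", "value")) {
      String type = fieldTypes.get(required);
      if (type == null) {
        problems.add("missing required field \"" + required + "\"");
      } else if (!BINARY.equals(type)) {
        problems.add("field \"" + required + "\" must be binary but is " + type);
      }
    }

    // Names beginning with "_" are reserved for internal use.
    for (String name: fieldTypes.keySet()) {
      if (name.startsWith("_")) {
        problems.add("field \"" + name + "\" uses the reserved \"_\" prefix");
      }
    }

    // Any other field is allowed, so nothing else is checked.
    return problems;
  }
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every violated constraint at once. A real implementation would presumably inspect the Dataset's StructType rather than a map of strings.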