Class AbstractDataWriterSparkJob
java.lang.Object
com.linkedin.venice.jobs.DataWriterComputeJob
com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
- All Implemented Interfaces:
ComputeJob, Closeable, AutoCloseable
- Direct Known Subclasses:
DataWriterSparkJob
The implementation of DataWriterComputeJob for the Spark engine.
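A hedged usage sketch (not part of the Venice Javadoc): it drives the concrete DataWriterSparkJob subclass through the lifecycle inherited from DataWriterComputeJob (configure, runJob, getStatus, close). The helper method name, the no-arg DataWriterSparkJob constructor, the no-arg runJob() signature, the ComputeJob.Status return type of getStatus(), and the package locations flagged below are assumptions for illustration.

import com.linkedin.venice.hadoop.PushJobSetting;                  // assumed package location
import com.linkedin.venice.jobs.ComputeJob;
import com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob;
import com.linkedin.venice.utils.VeniceProperties;                 // assumed package location

public final class SparkDataWriterDriverSketch {
  // Hypothetical helper: the caller supplies already-built job configuration objects.
  static void runDataWriter(VeniceProperties props, PushJobSetting pushJobSetting) throws Exception {
    // DataWriterSparkJob is the direct known subclass of AbstractDataWriterSparkJob;
    // try-with-resources works because the job hierarchy implements Closeable.
    try (DataWriterSparkJob job = new DataWriterSparkJob()) {
      job.configure(props, pushJobSetting); // specified by DataWriterComputeJob
      job.runJob();                         // inherited from DataWriterComputeJob
      ComputeJob.Status status = job.getStatus();
      System.out.println("Spark data writer job finished with status: " + status);
    }
  }
}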
Nested Class Summary
Nested classes/interfaces inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
DataWriterComputeJob.ConfigSetter
Nested classes/interfaces inherited from interface com.linkedin.venice.jobs.ComputeJob
ComputeJob.Status
Field Summary
Fields inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
PASS_THROUGH_CONFIG_PREFIXES
Constructor Summary
Constructors
AbstractDataWriterSparkJob()
Method Summary
Modifier and Type | Method | Description
- void close()
- void configure(VeniceProperties props, PushJobSetting pushJobSetting)
- protected VeniceProperties getJobProperties()
- protected org.apache.spark.sql.SparkSession getSparkSession()
- protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
  Get the data frame based on the user's input data.
- void kill()
- void runComputeJob()
- protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, String key, String value)
- protected void validateRmdSchema(PushJobSetting pushJobSetting)
Methods inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
configure, getFailureReason, getStatus, populateWithPassThroughConfigs, populateWithPassThroughConfigs, runJob, validateJob
-
Constructor Details
-
AbstractDataWriterSparkJob
public AbstractDataWriterSparkJob()
-
-
Method Details
-
configure
public void configure(VeniceProperties props, PushJobSetting pushJobSetting)
- Specified by:
configure in class DataWriterComputeJob
-
getSparkSession
protected org.apache.spark.sql.SparkSession getSparkSession()
getUserInputDataFrame
protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
Get the data frame based on the user's input data. The schema of the Row has the following constraints:
- Must contain a field "key" with the schema: DataTypes.BinaryType. This is the key of the record represented in serialized Avro.
- Must contain a field "value" with the schema: DataTypes.BinaryType. This is the value of the record represented in serialized Avro.
- Must not contain fields with names beginning with "_". These are reserved for internal use.
- Can contain fields that do not violate the above constraints.
- Returns:
The data frame based on the user's input data
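As a hedged illustration of this contract (not part of the Venice Javadoc), a hypothetical subclass could build a conforming Dataset<Row> as sketched below. The class name, the toy in-memory rows, and the byte contents are illustrative assumptions; a real job would read and Avro-serialize the user's actual input.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob;

public class InMemoryDataWriterSparkJob extends AbstractDataWriterSparkJob {
  @Override
  protected Dataset<Row> getUserInputDataFrame() {
    SparkSession spark = getSparkSession(); // session managed by the abstract base class

    // Schema honoring the documented contract: binary "key" and "value" columns,
    // no column names beginning with "_".
    StructType schema = new StructType(new StructField[] {
        new StructField("key", DataTypes.BinaryType, false, Metadata.empty()),
        new StructField("value", DataTypes.BinaryType, false, Metadata.empty())
    });

    // Toy rows standing in for Avro-serialized key/value bytes.
    byte[] serializedAvroKey = new byte[] { 0x01 };
    byte[] serializedAvroValue = new byte[] { 0x02, 0x03 };
    List<Row> rows = Arrays.asList(RowFactory.create(serializedAvroKey, serializedAvroValue));

    return spark.createDataFrame(rows, schema);
  }
}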
-
validateRmdSchema
protected void validateRmdSchema(PushJobSetting pushJobSetting)
-
setInputConf
protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, String key, String value)
-
getTaskTracker
- Specified by:
getTaskTracker in class DataWriterComputeJob
-
getJobProperties
protected VeniceProperties getJobProperties()
-
getPushJobSetting
- Specified by:
getPushJobSetting in class DataWriterComputeJob
-
runComputeJob
public void runComputeJob()
- Specified by:
runComputeJob in class DataWriterComputeJob
-
kill
public void kill()
- Specified by:
kill in interface ComputeJob
- Overrides:
kill in class DataWriterComputeJob
-
close
public void close() throws IOException
- Throws:
IOException
-