Class AbstractDataWriterSparkJob

java.lang.Object
  com.linkedin.venice.jobs.DataWriterComputeJob
    com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob

All Implemented Interfaces:
  ComputeJob, Closeable, AutoCloseable

Direct Known Subclasses:
  DataWriterSparkJob

The implementation of DataWriterComputeJob for the Spark engine.
Nested Class Summary

Nested classes/interfaces inherited from interface com.linkedin.venice.jobs.ComputeJob:
  ComputeJob.Status

Field Summary

Fields inherited from class com.linkedin.venice.jobs.DataWriterComputeJob:
  PASS_THROUGH_CONFIG_PREFIXES
Constructor Summary

  AbstractDataWriterSparkJob()
Method Summary

  void close()
  void configure(VeniceProperties props, PushJobSetting pushJobSetting)
  protected VeniceProperties getJobProperties()
  protected org.apache.spark.sql.SparkSession getSparkSession()
  protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
      Get the data frame based on the user's input data.
  void kill()
  protected void runComputeJob()
  protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, String key, String value)

Methods inherited from class com.linkedin.venice.jobs.DataWriterComputeJob:
  configure, getFailureReason, getStatus, runJob, validateJob
Constructor Details

AbstractDataWriterSparkJob
public AbstractDataWriterSparkJob()
Method Details

configure
public void configure(VeniceProperties props, PushJobSetting pushJobSetting)
Specified by:
  configure in class DataWriterComputeJob

getSparkSession
protected org.apache.spark.sql.SparkSession getSparkSession()
getUserInputDataFrame
protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
Get the data frame based on the user's input data. The schema of the Row has the following constraints:
- Must contain a field "key" with the schema DataTypes.BinaryType. This is the key of the record, represented in serialized Avro.
- Must contain a field "value" with the schema DataTypes.BinaryType. This is the value of the record, represented in serialized Avro.
- Must not contain fields with names beginning with "_". These are reserved for internal use.
- Can contain any other fields that do not violate the above constraints.
Returns:
  The data frame based on the user's input data
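A concrete subclass only needs to produce a Dataset<Row> that satisfies the contract above. The following is a minimal sketch of such a subclass; the class name, input path, and source column names ("k", "v") are hypothetical, and it assumes the input already stores Avro-serialized key/value bytes (compiling it requires the Venice and Spark SQL jars on the classpath):

```java
package com.example;

import com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

// Hypothetical subclass illustrating the getUserInputDataFrame() contract.
public class ParquetDataWriterSparkJob extends AbstractDataWriterSparkJob {
  @Override
  protected Dataset<Row> getUserInputDataFrame() {
    SparkSession session = getSparkSession();
    return session.read()
        .parquet("/path/to/input")               // hypothetical input location
        .withColumn("key", functions.col("k"))   // required binary "key" field
        .withColumn("value", functions.col("v")) // required binary "value" field
        .select("key", "value");                 // drop all other fields
  }
}
```

Renaming to exactly "key" and "value" and selecting only those columns keeps the frame free of reserved "_"-prefixed fields, per the constraints above.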
setInputConf
protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, String key, String value)
getTaskTracker
Specified by:
  getTaskTracker in class DataWriterComputeJob

getJobProperties
protected VeniceProperties getJobProperties()

getPushJobSetting
Specified by:
  getPushJobSetting in class DataWriterComputeJob
runComputeJob
protected void runComputeJob()- Specified by:
runComputeJob
in classDataWriterComputeJob
-
kill
public void kill()- Specified by:
kill
in interfaceComputeJob
- Overrides:
kill
in classDataWriterComputeJob
-
close
- Throws:
IOException
-