Class AbstractDataWriterSparkJob

- java.lang.Object
  - com.linkedin.venice.jobs.DataWriterComputeJob
    - com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob

- All Implemented Interfaces:
  ComputeJob, java.io.Closeable, java.lang.AutoCloseable
- Direct Known Subclasses:
  DataWriterSparkJob
public abstract class AbstractDataWriterSparkJob
extends DataWriterComputeJob

The implementation of DataWriterComputeJob for the Spark engine.
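A concrete subclass only needs to supply the input Dataset<Row>. Below is a minimal sketch, assuming the input source already exposes Avro-serialized binary "key" and "value" columns; the class name, input format, and path are hypothetical:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class MyInputSparkJob extends AbstractDataWriterSparkJob {
  @Override
  protected Dataset<Row> getUserInputDataFrame() {
    // Read a source that already exposes binary "key" and "value" columns.
    // The "avro" format and input path are placeholders for illustration.
    return getSparkSession()
        .read()
        .format("avro")
        .load("hdfs://path/to/input")
        .select("key", "value");
  }
}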
Nested Class Summary

- Nested classes/interfaces inherited from interface com.linkedin.venice.jobs.ComputeJob:
  ComputeJob.Status

Field Summary

- Fields inherited from class com.linkedin.venice.jobs.DataWriterComputeJob:
  PASS_THROUGH_CONFIG_PREFIXES

Constructor Summary

- AbstractDataWriterSparkJob()
Method Summary

- void close()
- void configure(VeniceProperties props, PushJobSetting pushJobSetting)
- protected VeniceProperties getJobProperties()
- PushJobSetting getPushJobSetting()
- protected org.apache.spark.sql.SparkSession getSparkSession()
- DataWriterTaskTracker getTaskTracker()
- protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
  Get the data frame based on the user's input data.
- void kill()
- protected void runComputeJob()
- protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, java.lang.String key, java.lang.String value)
- Methods inherited from class com.linkedin.venice.jobs.DataWriterComputeJob:
  configure, getFailureReason, getStatus, runJob, validateJob
Method Detail
configure

public void configure(VeniceProperties props, PushJobSetting pushJobSetting)

- Specified by:
  configure in class DataWriterComputeJob
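A hedged sketch of the configure-then-run lifecycle, assuming the hypothetical MyInputSparkJob subclass sketched above and pre-built VeniceProperties and PushJobSetting instances:

// Hedged lifecycle sketch: MyInputSparkJob is the hypothetical subclass
// above; props and pushJobSetting are assumed to be built elsewhere.
static void runPush(VeniceProperties props, PushJobSetting pushJobSetting) throws java.io.IOException {
  try (AbstractDataWriterSparkJob job = new MyInputSparkJob()) {
    job.configure(props, pushJobSetting);
    job.runJob(); // inherited from DataWriterComputeJob
    System.out.println("Job finished with status: " + job.getStatus());
  } // try-with-resources invokes close()
}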
getSparkSession

protected org.apache.spark.sql.SparkSession getSparkSession()
getUserInputDataFrame

protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()

Get the data frame based on the user's input data. The schema of the Row has the following constraints:

- Must contain a field "key" with the schema DataTypes.BinaryType. This is the key of the record, represented in serialized Avro.
- Must contain a field "value" with the schema DataTypes.BinaryType. This is the value of the record, represented in serialized Avro.
- Must not contain fields with names beginning with "_". These are reserved for internal use.
- Can contain additional fields that do not violate the above constraints.

- Returns:
  The data frame based on the user's input data
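As a concrete illustration of these constraints, here is a minimal sketch that builds a conforming Dataset<Row>; the fixed byte arrays stand in for Avro-serialized keys and values, and the class name is hypothetical:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public final class KeyValueFrameExample {
  // Builds a Dataset<Row> with the required binary "key" and "value" fields.
  public static Dataset<Row> build(SparkSession spark) {
    StructType schema = new StructType()
        .add("key", DataTypes.BinaryType, false)
        .add("value", DataTypes.BinaryType, false);
    // In a real job these bytes would be Avro-serialized records; fixed
    // bytes are used here purely for illustration.
    List<Row> rows = Arrays.asList(
        RowFactory.create(new byte[] { 1 }, new byte[] { 10 }),
        RowFactory.create(new byte[] { 2 }, new byte[] { 20 }));
    return spark.createDataFrame(rows, schema);
  }
}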
setInputConf

protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, java.lang.String key, java.lang.String value)
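No description is provided for this helper; from its signature it appears to apply a key/value config while setting up the input reader. A hedged sketch of how a subclass might use it; the config key, value, format, and path are all hypothetical:

import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class TunedInputSparkJob extends AbstractDataWriterSparkJob {
  @Override
  protected Dataset<Row> getUserInputDataFrame() {
    DataFrameReader reader = getSparkSession().read().format("avro");
    // Hypothetical tuning knob; the key and value are placeholders.
    setInputConf(getSparkSession(), reader, "spark.sql.files.maxPartitionBytes", "134217728");
    return reader.load("hdfs://path/to/input");
  }
}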
getTaskTracker

public DataWriterTaskTracker getTaskTracker()

- Specified by:
  getTaskTracker in class DataWriterComputeJob
getJobProperties

protected VeniceProperties getJobProperties()
getPushJobSetting

public PushJobSetting getPushJobSetting()

- Specified by:
  getPushJobSetting in class DataWriterComputeJob
runComputeJob

protected void runComputeJob()

- Specified by:
  runComputeJob in class DataWriterComputeJob
kill

public void kill()

- Specified by:
  kill in interface ComputeJob
- Overrides:
  kill in class DataWriterComputeJob
close

public void close() throws java.io.IOException

- Throws:
  java.io.IOException