Class DataWriterSparkJob
- java.lang.Object
  - com.linkedin.venice.jobs.DataWriterComputeJob
    - com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
      - com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob
- All Implemented Interfaces: ComputeJob, java.io.Closeable, java.lang.AutoCloseable
public class DataWriterSparkJob extends AbstractDataWriterSparkJob
The default implementation of AbstractDataWriterSparkJob for Avro and Vson file input formats.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface com.linkedin.venice.jobs.ComputeJob
ComputeJob.Status
-
-
Field Summary
-
Fields inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
PASS_THROUGH_CONFIG_PREFIXES
-
-
Constructor Summary
Constructors
Constructor             Description
DataWriterSparkJob()
-
Method Summary
Modifier and Type: protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
Method: getUserInputDataFrame()
Description: Get the data frame based on the user's input data.
-
Methods inherited from class com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
close, configure, getJobProperties, getPushJobSetting, getSparkSession, getTaskTracker, kill, runComputeJob, setInputConf
-
Methods inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
configure, getFailureReason, getStatus, runJob, validateJob
-
-
-
-
Method Detail
-
getUserInputDataFrame
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
Description copied from class: AbstractDataWriterSparkJob
Get the data frame based on the user's input data. The schema of the Row has the following constraints:
- Must contain a field "key" with the schema: DataTypes.BinaryType. This is the key of the record represented in serialized Avro.
- Must contain a field "value" with the schema: DataTypes.BinaryType. This is the value of the record represented in serialized Avro.
- Must not contain fields with names beginning with "_". These are reserved for internal use.
- Can contain fields that do not violate the above constraints.
- Specified by: getUserInputDataFrame in class AbstractDataWriterSparkJob
- Returns: The data frame based on the user's input data
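Example (illustrative only): the schema constraints listed for getUserInputDataFrame can be sketched as a small standalone check. This is not part of Venice or Spark. The class InputSchemaCheckSketch, the method findViolations, and the use of a plain map from field name to type name are all assumptions made for illustration; the string "binary" is only a stand-in here for DataTypes.BinaryType so that the sketch runs without a Spark dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Venice class): checks a flat field-name to type-name view of a schema. */
class InputSchemaCheckSketch {
  // Assumption: a plain string stands in for Spark's DataTypes.BinaryType in this sketch.
  static final String BINARY = "binary";

  /** Returns one message per violated constraint; an empty list means the schema is acceptable. */
  static List<String> findViolations(Map<String, String> fields) {
    List<String> violations = new ArrayList<>();

    // "key" and "value" must both be present and binary (serialized Avro bytes).
    for (String required : new String[] {"key", "value"}) {
      String type = fields.get(required);
      if (type == null) {
        violations.add("missing required field: " + required);
      } else if (!BINARY.equals(type)) {
        violations.add("field " + required + " must be binary but is " + type);
      }
    }

    // Names beginning with "_" are reserved for internal use.
    for (String name : fields.keySet()) {
      if (name.startsWith("_")) {
        violations.add("reserved field name: " + name);
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    Map<String, String> schema = new LinkedHashMap<>();
    schema.put("key", BINARY);
    schema.put("value", BINARY);
    schema.put("region", "string"); // extra fields are allowed
    System.out.println(findViolations(schema)); // prints []

    schema.put("_offset", "long"); // violates the reserved-prefix rule
    System.out.println(findViolations(schema)); // prints [reserved field name: _offset]
  }
}
```

A subclass that overrides getUserInputDataFrame could apply the same three rules to the fields of the returned Dataset's schema before handing the data frame back; how the base class itself enforces them is not stated on this page.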
-
-