Class DataWriterSparkJob
java.lang.Object
com.linkedin.venice.jobs.DataWriterComputeJob
com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob
All Implemented Interfaces:
ComputeJob, Closeable, AutoCloseable
The default implementation of AbstractDataWriterSparkJob for Avro and Vson file input formats.
Nested Class Summary
Nested classes/interfaces inherited from interface com.linkedin.venice.jobs.ComputeJob
ComputeJob.Status
-
Field Summary
Fields inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
PASS_THROUGH_CONFIG_PREFIXES
-
Constructor Summary
-
Method Summary
Modifier and Type: protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
Method: getUserInputDataFrame()
Description: Get the data frame based on the user's input data.
Methods inherited from class com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
close, configure, getJobProperties, getPushJobSetting, getSparkSession, getTaskTracker, kill, runComputeJob, setInputConf
Methods inherited from class com.linkedin.venice.jobs.DataWriterComputeJob
configure, getFailureReason, getStatus, runJob, validateJob
-
Constructor Details
-
DataWriterSparkJob
public DataWriterSparkJob()
-
-
Method Details
-
getUserInputDataFrame
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
Description copied from class: AbstractDataWriterSparkJob
Get the data frame based on the user's input data. The schema of the Row has the following constraints:
- Must contain a field "key" with the schema: DataTypes.BinaryType. This is the key of the record represented in serialized Avro.
- Must contain a field "value" with the schema: DataTypes.BinaryType. This is the value of the record represented in serialized Avro.
- Must not contain fields with names beginning with "_". These are reserved for internal use.
- Can contain fields that do not violate the above constraints.
Specified by:
getUserInputDataFrame in class AbstractDataWriterSparkJob
Returns:
The data frame based on the user's input data
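The schema constraints listed for getUserInputDataFrame can be illustrated with a small, dependency-free sketch. It is not part of Venice or Spark: the class InputSchemaConstraintSketch, its violations method, and the plain string "binary" (standing in for DataTypes.BinaryType) are hypothetical names introduced only for illustration. The schema is modeled as a map from field name to type name so the check runs without a Spark session.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Venice: checks a schema, modeled as
// field name -> type name, against the documented Row constraints.
final class InputSchemaConstraintSketch {
  // Assumption: "binary" stands in for DataTypes.BinaryType.
  static final String BINARY = "binary";

  static List<String> violations(Map<String, String> fields) {
    List<String> problems = new ArrayList<>();

    // "key" and "value" must both exist and hold serialized Avro bytes.
    for (String required : new String[] {"key", "value"}) {
      String type = fields.get(required);
      if (type == null) {
        problems.add("missing field \"" + required + "\"");
      } else if (!BINARY.equals(type)) {
        problems.add("field \"" + required + "\" must be binary, found " + type);
      }
    }

    // Names beginning with "_" are reserved for internal use.
    for (String name : fields.keySet()) {
      if (name.startsWith("_")) {
        problems.add("field \"" + name + "\" uses the reserved \"_\" prefix");
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    Map<String, String> schema = new LinkedHashMap<>();
    schema.put("key", BINARY);
    schema.put("value", BINARY);
    schema.put("region", "string"); // extra fields are allowed
    System.out.println(violations(schema)); // prints []

    schema.put("_internal", "long"); // reserved prefix, reported as a violation
    System.out.println(violations(schema));
  }
}
```

A subclass that overrides getUserInputDataFrame could apply the same rules to the field names and data types of the Dataset it returns. How Venice itself enforces these constraints is not described in this page.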