Class AbstractDataWriterSparkJob

java.lang.Object
com.linkedin.venice.jobs.DataWriterComputeJob
com.linkedin.venice.spark.datawriter.jobs.AbstractDataWriterSparkJob
All Implemented Interfaces:
ComputeJob, Closeable, AutoCloseable
Direct Known Subclasses:
DataWriterSparkJob

public abstract class AbstractDataWriterSparkJob extends DataWriterComputeJob
The implementation of DataWriterComputeJob for Spark engine.
  • Constructor Details

    • AbstractDataWriterSparkJob

      public AbstractDataWriterSparkJob()
  • Method Details

    • configure

      public void configure(VeniceProperties props, PushJobSetting pushJobSetting)
      Specified by:
      configure in class DataWriterComputeJob
    • getSparkSession

      protected org.apache.spark.sql.SparkSession getSparkSession()
    • getUserInputDataFrame

      protected abstract org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
      Get the data frame based on the user's input data. The schema of the Row has the following constraints:
      • Must contain a field "key" with the schema: DataTypes.BinaryType. This is the key of the record represented in serialized Avro.
      • Must contain a field "value" with the schema: DataTypes.BinaryType. This is the value of the record represented in serialized Avro.
      • Must not contain fields with names beginning with "_". These are reserved for internal use.
      • Can contain fields that do not violate the above constraints
      Returns:
      The data frame based on the user's input data
    • setInputConf

      protected void setInputConf(org.apache.spark.sql.SparkSession session, org.apache.spark.sql.DataFrameReader dataFrameReader, String key, String value)
    • getTaskTracker

      public DataWriterTaskTracker getTaskTracker()
      Specified by:
      getTaskTracker in class DataWriterComputeJob
    • getJobProperties

      protected VeniceProperties getJobProperties()
    • getPushJobSetting

      public PushJobSetting getPushJobSetting()
      Specified by:
      getPushJobSetting in class DataWriterComputeJob
    • runComputeJob

      protected void runComputeJob()
      Specified by:
      runComputeJob in class DataWriterComputeJob
    • kill

      public void kill()
      Specified by:
      kill in interface ComputeJob
      Overrides:
      kill in class DataWriterComputeJob
    • close

      public void close() throws IOException
      Throws:
      IOException