Class DataWriterSparkJob

All Implemented Interfaces:
ComputeJob, Closeable, AutoCloseable

public class DataWriterSparkJob extends AbstractDataWriterSparkJob
The default implementation of AbstractDataWriterSparkJob for Avro and Vson file input formats.
  • Constructor Details

    • DataWriterSparkJob

      public DataWriterSparkJob()
  • Method Details

    • getUserInputDataFrame

      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getUserInputDataFrame()
      Description copied from class: AbstractDataWriterSparkJob
      Get the data frame based on the user's input data. The schema of the Row has the following constraints:
      • Must contain a field "key" of type DataTypes.BinaryType. It holds the record's key, serialized as Avro.
      • Must contain a field "value" of type DataTypes.BinaryType. It holds the record's value, serialized as Avro.
      • Must not contain fields whose names begin with "_". These are reserved for internal use.
      • May contain additional fields that do not violate the above constraints.
      Specified by:
      getUserInputDataFrame in class AbstractDataWriterSparkJob
      Returns:
      The data frame based on the user's input data
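      The schema constraints above can be checked before the data frame is returned. The sketch below is a hypothetical helper, not part of Venice or Spark. To stay runnable without Spark on the classpath, it assumes the schema has already been flattened into a column-name to type-name map; in real Spark code such a map could be built from df.schema().fields() using StructField.name() and dataType().typeName(), which reports DataTypes.BinaryType as "binary".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Venice): checks a flattened
 * column-name -> Spark type-name mapping against the documented
 * constraints of getUserInputDataFrame().
 */
class UserInputSchemaCheck {
  // Spark's DataType.typeName() for DataTypes.BinaryType.
  static final String BINARY = "binary";

  /** Returns the list of violations; an empty list means the schema is acceptable. */
  static List<String> violations(Map<String, String> columns) {
    List<String> problems = new ArrayList<>();
    // "key" and "value" must both exist and both be binary (serialized Avro bytes).
    for (String required : new String[] {"key", "value"}) {
      String type = columns.get(required);
      if (type == null) {
        problems.add("missing required field \"" + required + "\"");
      } else if (!BINARY.equals(type)) {
        problems.add("field \"" + required + "\" must be binary but is " + type);
      }
    }
    // Names starting with "_" are reserved for internal use.
    for (String name : columns.keySet()) {
      if (name.startsWith("_")) {
        problems.add("field \"" + name + "\" uses the reserved \"_\" prefix");
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    Map<String, String> ok = new LinkedHashMap<>();
    ok.put("key", "binary");
    ok.put("value", "binary");
    ok.put("source", "string"); // additional fields are allowed
    System.out.println(violations(ok)); // prints []

    Map<String, String> bad = new LinkedHashMap<>();
    bad.put("key", "string");
    bad.put("_offset", "long");
    System.out.println(violations(bad)); // three violations
  }
}
```

      Returning a list of violations rather than failing on the first one lets a caller report every schema problem in a single error message.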