Class SparkPartitionWriter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class SparkPartitionWriter
    extends AbstractPartitionWriter
    • Constructor Detail

      • SparkPartitionWriter

        public SparkPartitionWriter​(java.util.Properties jobProperties,
                                    DataWriterAccumulators accumulators)
    • Method Detail

      • getTotalIncomingDataSizeInBytes

        protected long getTotalIncomingDataSizeInBytes()
        Description copied from class: AbstractPartitionWriter
        Returns the size of the serialized keys and values, in bytes, across the entire dataset. This is an optimization that allows skipping the write to Kafka when the quota is exceeded, reducing the load on Kafka and the Venice storage nodes. Not all engines support fetching this information during job execution (e.g., Spark), but we can live with that for now: the quota is checked again in the driver after the DataWriter job completes, and the driver kills the VenicePushJob soon after if the quota is exceeded.
        Overrides:
        getTotalIncomingDataSizeInBytes in class AbstractPartitionWriter
        Returns:
        the size of the serialized keys and values, in bytes, across the entire dataset
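
        The early quota check described above can be sketched as follows. This is a minimal, hypothetical illustration: the names QuotaAwareWriter, exceedsQuota, and storageQuotaBytes are invented for this sketch and are not part of the Venice API, and the -1 sentinel for "size unavailable" is an assumption, not Venice's actual convention.

        ```java
        // Hypothetical sketch of the quota-check optimization: if the total
        // incoming data size is known up front and already exceeds the store's
        // quota, the writer can skip producing to Kafka entirely.
        abstract class QuotaAwareWriter {
            // Mirrors getTotalIncomingDataSizeInBytes(): total serialized
            // key + value bytes across the dataset, or -1 if the engine
            // (e.g. Spark) cannot compute it during job execution.
            protected abstract long getTotalIncomingDataSizeInBytes();

            // Returns true only when the size is known AND over quota.
            // When the size is unknown (-1), the check is deferred to the
            // driver after the DataWriter job completes.
            boolean exceedsQuota(long storageQuotaBytes) {
                long total = getTotalIncomingDataSizeInBytes();
                return total >= 0 && total > storageQuotaBytes;
            }
        }

        public class QuotaCheckExample {
            public static void main(String[] args) {
                QuotaAwareWriter writer = new QuotaAwareWriter() {
                    @Override
                    protected long getTotalIncomingDataSizeInBytes() {
                        return 1_500L; // pretend 1500 bytes of serialized data
                    }
                };
                System.out.println(writer.exceedsQuota(1_000L)); // true: over quota
                System.out.println(writer.exceedsQuota(2_000L)); // false: within quota
            }
        }
        ```

        Note that with this shape, an engine that cannot compute the size simply returns the sentinel, and the write proceeds; correctness is preserved because the driver repeats the quota check after the job finishes.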