Class StageMetrics

java.lang.Object
com.linkedin.venice.spark.datawriter.task.StageMetrics
All Implemented Interfaces:
Serializable

public class StageMetrics extends Object implements Serializable
Holds per-stage diagnostic accumulators for tracking record counts, byte sizes, and timing through the VPJ Spark pipeline. One instance per pipeline stage (e.g., ttl_filter, compaction, compression_reencode, kafka_write).

Accumulators are registered with the SparkContext and aggregated on the driver after dataFrame.count() triggers execution.
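A minimal driver-side sketch of that lifecycle. The job setup (session, input data, and stage name) is illustrative, not taken from VPJ:

    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class StageMetricsExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("stage-metrics-demo")  // illustrative
            .master("local[*]")             // local run for the sketch
            .getOrCreate();

        // The constructor registers the accumulators with the SparkContext.
        StageMetrics metrics = new StageMetrics(spark.sparkContext(), "ttl_filter");

        Dataset<Row> input = spark.range(1000).toDF("key");

        // StageMetrics is Serializable, so it can be captured by executor-side
        // lambdas and updated inside a transformation.
        Dataset<Long> out = input.map((MapFunction<Row, Long>) row -> {
          metrics.recordsIn.add(1);
          metrics.bytesIn.add(Long.BYTES);
          long value = row.getLong(0);
          metrics.recordsOut.add(1);
          metrics.bytesOut.add(Long.BYTES);
          return value;
        }, Encoders.LONG());

        // Values are only complete on the driver once an action has executed the stage.
        long rows = out.count();
        System.out.println(metrics.getStageName() + ": recordsIn=" + metrics.recordsIn.value()
            + ", recordsOut=" + metrics.recordsOut.value() + ", rows=" + rows);
        spark.close();
      }
    }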

Accuracy under speculative execution

Spark guarantees exactly-once accumulator semantics only for updates performed inside actions; updates made inside transformations, as these are, may be applied more than once when tasks are re-executed. Speculative execution (spark.speculation=true) is one such case: two task attempts for the same partition may both complete successfully, and Spark discards the output of the slower attempt but does not roll back its accumulator deltas, so both attempts' contributions are applied to the driver-side value. All counters (recordsIn, recordsOut, bytesIn, bytesOut, timeNs) may therefore be over-counted for any partition that was speculated, with the degree of inflation bounded by the number of speculated partitions.
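For reference, a sketch of how speculation is typically switched on; both properties are standard Spark settings, not anything specific to this class:

    import org.apache.spark.sql.SparkSession;

    public final class SpeculationConfigExample {
      public static void main(String[] args) {
        // With speculation enabled, a straggling task gets a duplicate attempt;
        // if both attempts succeed, accumulator deltas from both can reach the driver.
        SparkSession spark = SparkSession.builder()
            .appName("speculation-demo")                    // illustrative
            .master("local[*]")
            .config("spark.speculation", "true")            // enable speculative execution
            .config("spark.speculation.multiplier", "1.5")  // speculate tasks 1.5x slower than the median
            .getOrCreate();
        spark.close();
      }
    }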

These accumulators are intended for diagnostic purposes only (pipeline stage I/O ratios, throughput estimates, timing). They should not be used for correctness checks.
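As an illustration of that diagnostic use, a hypothetical helper (not part of this class) that derives a stage's I/O ratio and rough throughput from the accumulators:

    public final class StageDiagnostics {
      // Hypothetical helper: call on the driver after an action has run, since
      // the accumulator values are only complete once the stage has executed.
      public static void log(StageMetrics metrics) {
        long in = metrics.recordsIn.value();
        long out = metrics.recordsOut.value();
        double taskSeconds = metrics.timeNs.value() / 1e9;  // summed across all tasks
        double ioRatio = in == 0 ? 0.0 : (double) out / in;
        double bytesPerTaskSecond = taskSeconds == 0 ? 0.0 : metrics.bytesOut.value() / taskSeconds;
        System.out.printf("%s: out/in=%.3f, ~%.0f bytes per task-second%n",
            metrics.getStageName(), ioRatio, bytesPerTaskSecond);
      }
    }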

See Also:
  • Serialized Form

Field Details

  • recordsIn

    public final org.apache.spark.util.LongAccumulator recordsIn
    Count of records entering this stage.

  • recordsOut

    public final org.apache.spark.util.LongAccumulator recordsOut
    Count of records emitted by this stage.

  • bytesIn

    public final org.apache.spark.util.LongAccumulator bytesIn
    Total bytes of the records entering this stage.

  • bytesOut

    public final org.apache.spark.util.LongAccumulator bytesOut
    Total bytes of the records emitted by this stage.

  • timeNs

    public final org.apache.spark.util.LongAccumulator timeNs
    Cumulative processing time in nanoseconds, summed across all tasks of this stage.
Constructor Details

  • StageMetrics

    public StageMetrics(org.apache.spark.SparkContext sparkContext, String stageName)
    Creates this stage's accumulators and registers them with the given SparkContext.
    Parameters:
      sparkContext - the context with which the accumulators are registered
      stageName - the pipeline stage this instance tracks (e.g. ttl_filter, kafka_write)
Method Details

  • getStageName

    public String getStageName()
    Returns:
      the name of the pipeline stage this instance tracks