Class StageMetricsRegistry

java.lang.Object
com.linkedin.venice.spark.datawriter.task.StageMetricsRegistry

public class StageMetricsRegistry extends Object
Central registry for per-stage diagnostic metrics in the VPJ Spark pipeline. Maintains insertion order so the report reflects the pipeline execution sequence.

Usage:

   StageMetricsRegistry registry = new StageMetricsRegistry(sparkContext);
   StageMetrics ttlMetrics = registry.register("ttl_filter");
   // ... instrument stage with ttlMetrics.recordsIn, ttlMetrics.recordsOut, etc.
   // After dataFrame.count():
   StageMetricsSnapshot snapshot = registry.snapshot();
   LOGGER.info("VPJ Pipeline Diagnostics:\n{}", snapshot.getFormattedReport());
 
  • Constructor Details

    • StageMetricsRegistry

      public StageMetricsRegistry(org.apache.spark.SparkContext sparkContext)
  • Method Details

    • register

      public StageMetrics register(String stageName)
      Register a new pipeline stage for tracking. Stages are reported in registration order.
      Parameters:
      stageName - unique identifier for the stage (e.g., "ttl_filter", "compaction")
      Returns:
      the StageMetrics instance for this stage (returns existing instance if already registered)
    • getStage

      public StageMetrics getStage(String stageName)
      Get a previously registered stage's metrics.
      Returns:
      the StageMetrics, or null if not registered
    • snapshot

      public StageMetricsSnapshot snapshot()
      Capture an immutable snapshot of all stage metrics. Call this after dataFrame.count() triggers execution so accumulator values are finalized.
      Returns:
      an engine-agnostic snapshot with per-stage summaries and a formatted report