Class VeniceOpenTelemetryMetricsRepository

java.lang.Object
com.linkedin.venice.stats.VeniceOpenTelemetryMetricsRepository

public class VeniceOpenTelemetryMetricsRepository extends Object
  • Field Details

    • REDUNDANT_LOG_FILTER

      public static final io.tehuti.utils.RedundantLogFilter REDUNDANT_LOG_FILTER
    • DEFAULT_METRIC_PREFIX

      public static final String DEFAULT_METRIC_PREFIX
  • Constructor Details

    • VeniceOpenTelemetryMetricsRepository

      public VeniceOpenTelemetryMetricsRepository(VeniceMetricsConfig metricsConfig)
  • Method Details

    • cloneWithNewMetricPrefix

      public VeniceOpenTelemetryMetricsRepository cloneWithNewMetricPrefix(String newMetricPrefix)
      Creates a new repository that shares the same OpenTelemetry SDK instance but uses a different metric prefix. This is useful for emitting metrics with a different prefix (e.g., "participant_store_client") without reinitializing OpenTelemetry.
      Parameters:
      newMetricPrefix - The metric prefix to use for the child repository
      Returns:
      A new VeniceOpenTelemetryMetricsRepository instance with the specified prefix
    • createDoubleHistogram

      public io.opentelemetry.api.metrics.DoubleHistogram createDoubleHistogram(MetricEntity metricEntity)
    • createLongCounter

      public io.opentelemetry.api.metrics.LongCounter createLongCounter(MetricEntity metricEntity)
    • createLongUpDownCounter

      public io.opentelemetry.api.metrics.LongUpDownCounter createLongUpDownCounter(MetricEntity metricEntity)
    • createLongGuage

      public io.opentelemetry.api.metrics.LongGauge createLongGuage(MetricEntity metricEntity)
    • createInstrument

      public Object createInstrument(MetricEntity metricEntity)
    • registerObservableLongCounter

      public io.opentelemetry.api.metrics.ObservableLongCounter registerObservableLongCounter(MetricEntity metricEntity, @Nonnull Consumer<io.opentelemetry.api.metrics.ObservableLongMeasurement> reportCallback)
      Registers an Observable Long Counter that reads accumulated values from a callback. This method should be called after the MetricEntityState is fully constructed, as the callback needs access to the metricAttributesData map.

      For MetricType.ASYNC_COUNTER_FOR_HIGH_PERF_CASES metrics, the callback is invoked during OpenTelemetry's metric collection cycle. The callback should iterate over all accumulated values and report them via the provided ObservableLongMeasurement.

      Each call creates a new SDK instrument handle via buildWithCallback — there is no deduplication. The OTel SDK natively aggregates data points from multiple instruments sharing the same name during the export pipeline's collection cycle.

      Parameters:
      metricEntity - the metric entity definition
      reportCallback - callback that reports all accumulated values to the measurement
      Returns:
      the created ObservableLongCounter, or null if OTel metrics are disabled
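
      The callback contract described above can be sketched with JDK types only. This is a minimal illustration, not the repository's implementation: LongMeasurement is a hypothetical stand-in for ObservableLongMeasurement, ACCUMULATED is a hypothetical stand-in for the metricAttributesData map, and attributes are simplified to plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

class AsyncCounterCallbackSketch {
  /** Hypothetical stand-in for io.opentelemetry.api.metrics.ObservableLongMeasurement. */
  interface LongMeasurement {
    void record(long value, String attributes);
  }

  // Hypothetical stand-in for the per-attribute accumulators that the hot path updates.
  static final Map<String, LongAdder> ACCUMULATED = new ConcurrentHashMap<>();

  /** Hot-path recording: cheap accumulation, no SDK call. */
  static void increment(String attributes, long delta) {
    ACCUMULATED.computeIfAbsent(attributes, k -> new LongAdder()).add(delta);
  }

  // Callback in the shape a register method would accept: report every accumulated value.
  static final Consumer<LongMeasurement> REPORT_CALLBACK =
      measurement -> ACCUMULATED.forEach((attrs, adder) -> measurement.record(adder.sum(), attrs));

  /** Simulates one collection cycle and returns the emitted points as "attributes=value". */
  static List<String> collectOnce() {
    List<String> points = new ArrayList<>();
    REPORT_CALLBACK.accept((value, attrs) -> points.add(attrs + "=" + value));
    return points;
  }

  public static void main(String[] args) {
    increment("store=a", 2);
    increment("store=a", 3);
    increment("store=b", 1);
    System.out.println(collectOnce());
  }
}
```

      In real code the Consumer would take an ObservableLongMeasurement and be passed as reportCallback; the accumulators must already exist when the first collection cycle runs, which is why registration happens after the owning state is constructed.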
    • registerObservableLongUpDownCounter

      public io.opentelemetry.api.metrics.ObservableLongUpDownCounter registerObservableLongUpDownCounter(MetricEntity metricEntity, @Nonnull Consumer<io.opentelemetry.api.metrics.ObservableLongMeasurement> reportCallback)
      Registers an Observable Long UpDownCounter that reads accumulated values from a callback. This method should be called after the MetricEntityState is fully constructed, as the callback needs access to the metricAttributesData map.

      For MetricType.ASYNC_UP_DOWN_COUNTER_FOR_HIGH_PERF_CASES metrics, the callback is invoked during OpenTelemetry's metric collection cycle. The callback should iterate over all accumulated values and report them via the provided ObservableLongMeasurement. Unlike ASYNC_COUNTER_FOR_HIGH_PERF_CASES, this supports both positive and negative values.

      Each call creates a new SDK instrument handle via buildWithCallback — there is no deduplication. The OTel SDK natively aggregates data points from multiple instruments sharing the same name during the export pipeline's collection cycle.

      Parameters:
      metricEntity - the metric entity definition
      reportCallback - callback that reports all accumulated values to the measurement
      Returns:
      the created ObservableLongUpDownCounter, or null if OTel metrics are disabled
    • registerObservableLongGauge

      public io.opentelemetry.api.metrics.ObservableLongGauge registerObservableLongGauge(MetricEntity metricEntity, @Nonnull Consumer<io.opentelemetry.api.metrics.ObservableLongMeasurement> reportCallback)
      Registers an ObservableLongGauge backed by a single multi-emit callback. Use this for MetricType.ASYNC_GAUGE metrics with dynamic dimensions (e.g., per-enum, per-entity) so the caller can iterate and emit only the attribute combinations that currently have data, avoiding the cardinality explosion of registering one instrument per combination.

      The callback is invoked by the OTel SDK on every collection cycle, on the SDK's collection thread. It may call measurement.record(value, attrs) zero or more times to emit data points. Combinations not emitted during a given collection are absent from that cycle's output. Backing state read inside the callback must be safely published (volatile, concurrent collections, or immutable).

      Each call creates a new SDK instrument handle via buildWithCallback — there is no deduplication. Multiple callers (e.g., different stores) can register callbacks for the same metric name; the OTel SDK natively aggregates all their data points during collection.

      Callers should ensure their callback does not throw. Exceptions that escape the callback are caught by the OTel SDK and logged, but which data points were already emitted before the failure depends on where the throw happens. Implementations in this repo wrap each per-combination body in try/catch to isolate failures across combinations.
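
      The per-combination isolation described above might look like the following sketch. It uses only JDK types: LongMeasurement is a hypothetical stand-in for ObservableLongMeasurement, SOURCES is a hypothetical map from an attribute combination to the supplier of its current value, and attributes are simplified to strings.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

class MultiEmitGaugeSketch {
  /** Hypothetical stand-in for io.opentelemetry.api.metrics.ObservableLongMeasurement. */
  interface LongMeasurement {
    void record(long value, String attributes);
  }

  // Safely published backing state: one value supplier per attribute combination.
  static final Map<String, LongSupplier> SOURCES = new ConcurrentHashMap<>();

  // Single multi-emit callback: emits zero or more points per collection cycle.
  static final Consumer<LongMeasurement> REPORT_CALLBACK = measurement -> {
    for (Map.Entry<String, LongSupplier> entry : SOURCES.entrySet()) {
      try {
        measurement.record(entry.getValue().getAsLong(), entry.getKey());
      } catch (Exception e) {
        // One failing combination must not suppress the others.
        System.err.println("Skipping " + entry.getKey() + ": " + e);
      }
    }
  };

  /** Simulates one collection cycle and returns the emitted points, sorted by attributes. */
  static Map<String, Long> collectOnce() {
    Map<String, Long> points = new TreeMap<>();
    REPORT_CALLBACK.accept((value, attrs) -> points.put(attrs, value));
    return points;
  }

  public static void main(String[] args) {
    SOURCES.put("store=a", () -> 7L);
    SOURCES.put("store=b", () -> { throw new IllegalStateException("no data"); });
    SOURCES.put("store=c", () -> 9L);
    System.out.println(collectOnce()); // store=b is skipped; a and c are still emitted
  }
}
```

      Catching Exception rather than Throwable inside the loop is a choice made for this sketch; it lets JVM-level errors propagate to the caller.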

    • registerObservableDoubleGauge

      public io.opentelemetry.api.metrics.ObservableDoubleGauge registerObservableDoubleGauge(MetricEntity metricEntity, @Nonnull Consumer<io.opentelemetry.api.metrics.ObservableDoubleMeasurement> reportCallback)
      Registers an ObservableDoubleGauge backed by a single multi-emit callback. Same contract as registerObservableLongGauge(com.linkedin.venice.stats.metrics.MetricEntity, java.util.function.Consumer<io.opentelemetry.api.metrics.ObservableLongMeasurement>) but for MetricType.ASYNC_DOUBLE_GAUGE. See that method's Javadoc for callback threading, aggregation behaviour, and exception-safety expectations.
    • closeObservableInstrument

      public void closeObservableInstrument(MetricEntity metricEntity, Object instrument)
      Unregisters an observable instrument previously returned by one of the registerObservable* methods, so the OTel SDK stops invoking its callback. The SDK retains every callback until the returned handle is closed, so callers that re-register an observable must close the previous handle to avoid leaking callbacks and emitting duplicate data points under stale attributes. No-op if the handle is null (OTel disabled).
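
      The close-before-re-register discipline can be illustrated with a small JDK-only model. Everything here is hypothetical: Handle stands in for the observable instrument handle, ACTIVE_CALLBACKS models the SDK retaining callbacks, and register models a registerObservable* call. In real code the close step would go through closeObservableInstrument.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ReRegisterSketch {
  /** Hypothetical stand-in for an observable instrument handle. */
  interface Handle {
    void close();
  }

  // Models the SDK side: every callback stays here until its handle is closed.
  static final List<Runnable> ACTIVE_CALLBACKS = new CopyOnWriteArrayList<>();

  /** Models a registerObservable* call: no deduplication, one new handle per call. */
  static Handle register(Runnable callback) {
    ACTIVE_CALLBACKS.add(callback);
    return () -> ACTIVE_CALLBACKS.remove(callback);
  }

  private static Handle currentHandle;

  /** Re-registers safely: closes the previous handle before installing the new callback. */
  static synchronized void reRegister(Runnable callback) {
    if (currentHandle != null) {
      currentHandle.close();
    }
    currentHandle = register(callback);
  }

  public static void main(String[] args) {
    reRegister(() -> System.out.println("callback v1"));
    reRegister(() -> System.out.println("callback v2"));
    // Only the latest callback is still active, so only it runs on "collection".
    ACTIVE_CALLBACKS.forEach(Runnable::run);
  }
}
```

      Calling register twice without closing the first handle would leave both callbacks active in this model, which corresponds to the leak and duplicate data points described above.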
    • getDimensionName

      public String getDimensionName(VeniceMetricsDimensions dimension)
    • createAttributes

      public io.opentelemetry.api.common.Attributes createAttributes(MetricEntity metricEntity, Map<VeniceMetricsDimensions,String> baseDimensionsMap, VeniceDimensionInterface... additionalDimensionEnums)
    • createAttributes

      public io.opentelemetry.api.common.Attributes createAttributes(MetricEntity metricEntity, Map<VeniceMetricsDimensions,String> baseDimensionsMap, Map<VeniceMetricsDimensions,String> additionalDimensionsMap)
    • close

      public void close()
    • recordFailureMetric

      public void recordFailureMetric(MetricEntity metricEntity, Exception e)
      Records a metric-recording failure. Best-effort: any Exception from the failure counter or logger itself is caught and logged at error level, so callers iterating multiple combos in a single collection cycle do not need their own try/catch around this call. Error (e.g. OutOfMemoryError) is intentionally not caught — it should propagate so JVM-level failures still surface.
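
      The best-effort semantics above amount to catching Exception but not Error. A minimal JDK-only sketch of that pattern follows; recordFailureBestEffort and loggedErrors are hypothetical names used for illustration, not members of this class.

```java
class BestEffortFailureSketch {
  // Counts how many times the failure path itself failed (stands in for an error log).
  static int loggedErrors = 0;

  /**
   * Runs the failure-recording action. Any Exception it throws is swallowed and logged,
   * while any Error (e.g. OutOfMemoryError) propagates to the caller.
   */
  static void recordFailureBestEffort(Runnable failureRecorder) {
    try {
      failureRecorder.run();
    } catch (Exception e) {
      loggedErrors++;
      System.err.println("Failed to record failure metric: " + e);
    }
  }

  public static void main(String[] args) {
    // The caller needs no try/catch of its own for ordinary exceptions.
    recordFailureBestEffort(() -> { throw new IllegalStateException("counter unavailable"); });
    System.out.println("logged errors: " + loggedErrors);
  }
}
```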
    • recordFailureMetric

      public void recordFailureMetric(MetricEntity metricEntity, String error)
      See recordFailureMetric(MetricEntity, Exception) — same best-effort semantics.
    • emitOpenTelemetryMetrics

      public boolean emitOpenTelemetryMetrics()
    • emitTehutiMetrics

      public boolean emitTehutiMetrics()
    • getMetricsConfig

      public VeniceMetricsConfig getMetricsConfig()
    • getMetricFormat

      public VeniceOpenTelemetryMetricNamingFormat getMetricFormat()
    • getRecordFailureMetric

      public MetricEntityStateGeneric getRecordFailureMetric()