Class HeartbeatMonitoringService

java.lang.Object
com.linkedin.venice.service.AbstractVeniceService
com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService
All Implemented Interfaces:
Closeable, AutoCloseable

public class HeartbeatMonitoringService extends AbstractVeniceService
This service monitors heartbeats. Heartbeats are only monitored if lagMonitors are added for leader or follower partitions. Once a lagMonitor is added, the service will being emitting a metric which grows linearly with time, only resetting to the timestamp of the last reported heartbeat for a given partition. Heartbeats are only monitored for stores which have a hybrid config. All other registrations for lag monitoring are ignored. Max and Average are reported per version of resource across partitions. If a heartbeat is invoked for a partition that we're NOT monitoring lag for, it is ignored. This class will monitor lag for a partition as a leader or follower, but never both. Whether we're reporting leader or follower depends on which monitor was set last. Lag will stop being reported for partitions which have the monitor removed. Each region gets a different lag monitor
  • Field Details

    • DEFAULT_REPORTER_THREAD_SLEEP_INTERVAL_SECONDS

      public static final int DEFAULT_REPORTER_THREAD_SLEEP_INTERVAL_SECONDS
      See Also:
    • DEFAULT_LAG_LOGGING_THREAD_SLEEP_INTERVAL_SECONDS

      public static final int DEFAULT_LAG_LOGGING_THREAD_SLEEP_INTERVAL_SECONDS
      See Also:
    • DEFAULT_STALE_HEARTBEAT_LOG_THRESHOLD_MILLIS

      public static final long DEFAULT_STALE_HEARTBEAT_LOG_THRESHOLD_MILLIS
  • Constructor Details

    • HeartbeatMonitoringService

      public HeartbeatMonitoringService(io.tehuti.metrics.MetricsRepository metricsRepository, ReadOnlyStoreRepository metadataRepository, Set<String> regionNames, String localRegionName)
  • Method Details

    • addFollowerLagMonitor

      public void addFollowerLagMonitor(Version version, int partition)
      Adds monitoring for a follower partition of a given version. This request is ignored if the version isn't hybrid.
      Parameters:
      version - the version to monitor lag for
      partition - the partition to monitor lag for
    • addLeaderLagMonitor

      public void addLeaderLagMonitor(Version version, int partition)
      Adds monitoring for a leader partition of a given version. This request is ignored if the version isn't hybrid.
      Parameters:
      version - the version to monitor lag for
      partition - the partition to monitor lag for
    • removeLagMonitor

      public void removeLagMonitor(Version version, int partition)
      Removes monitoring for a partition of a given version.
      Parameters:
      version - the version to remove monitoring for
      partition - the partition to remove monitoring for
    • getHeartbeatInfo

      public Map<String,ReplicaHeartbeatInfo> getHeartbeatInfo(String versionTopicName, int partitionFilter, boolean filterLagReplica)
    • startInner

      public boolean startInner() throws Exception
      Specified by:
      startInner in class AbstractVeniceService
      Returns:
      true if the service is completely started, false if it is still starting asynchronously (in this case, it is the implementer's responsibility to set AbstractVeniceService.serviceState to AbstractVeniceService.ServiceState.STARTED upon completion of the async work).
      Throws:
      Exception
    • stopInner

      public void stopInner() throws Exception
      Specified by:
      stopInner in class AbstractVeniceService
      Throws:
      Exception
    • recordLeaderHeartbeat

      public void recordLeaderHeartbeat(String store, int version, int partition, String region, Long timestamp, boolean isReadyToServe)
      Record a leader heartbeat timestamp for a given partition of a store version from a specific region.
      Parameters:
      store - the store this heartbeat is for
      version - the version this heartbeat is for
      partition - the partition this heartbeat is for
      region - the region this heartbeat is from
      timestamp - the time of this heartbeat
      isReadyToServe - has this partition been marked ready to serve? This determines how the metric is reported
    • recordFollowerHeartbeat

      public void recordFollowerHeartbeat(String store, int version, int partition, String region, Long timestamp, boolean isReadyToServe)
      Record a follower heartbeat timestamp for a given partition of a store version from a specific region.
      Parameters:
      store - the store this heartbeat is for
      version - the version this heartbeat is for
      partition - the partition this heartbeat is for
      region - the region this heartbeat is from
      timestamp - the time of this heartbeat
      isReadyToServe - has this partition been marked ready to serve? This determines how the metric is reported
    • getLeaderHeartbeatTimeStamps

      protected Map<String,Map<Integer,Map<Integer,Map<String,HeartbeatTimeStampEntry>>>> getLeaderHeartbeatTimeStamps()
    • getFollowerHeartbeatTimeStamps

      protected Map<String,Map<Integer,Map<Integer,Map<String,HeartbeatTimeStampEntry>>>> getFollowerHeartbeatTimeStamps()
    • recordLags

      protected void recordLags(Map<String,Map<Integer,Map<Integer,Map<String,HeartbeatTimeStampEntry>>>> heartbeatTimestamps, com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService.ReportLagFunction lagFunction)
    • record

      protected void record()
    • checkAndMaybeLogHeartbeatDelayMap

      protected void checkAndMaybeLogHeartbeatDelayMap(Map<String,Map<Integer,Map<Integer,Map<String,HeartbeatTimeStampEntry>>>> heartbeatTimestamps)
    • checkAndMaybeLogHeartbeatDelay

      protected void checkAndMaybeLogHeartbeatDelay()
    • getMaxLeaderHeartbeatLag

      public AggregatedHeartbeatLagEntry getMaxLeaderHeartbeatLag()
    • getMaxFollowerHeartbeatLag

      public AggregatedHeartbeatLagEntry getMaxFollowerHeartbeatLag()