Class HeartbeatMonitoringService
java.lang.Object
com.linkedin.venice.service.AbstractVeniceService
com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService
- All Implemented Interfaces:
Closeable,AutoCloseable
This service monitors heartbeats. Heartbeats are only monitored if lagMonitors are added for leader or follower
partitions. Once a lagMonitor is added, the service will being emitting a metric which grows linearly with time,
only resetting to the timestamp of the last reported heartbeat for a given partition.
Heartbeats are only monitored for stores which have a hybrid config. All other registrations for lag monitoring
are ignored.
Max and Average are reported per version of resource across partitions.
If a heartbeat is invoked for a partition that we're NOT monitoring lag for, it is ignored.
This class will monitor lag for a partition as a leader or follower, but never both. Whether we're reporting
leader or follower depends on which monitor was set last.
Lag will stop being reported for partitions which have the monitor removed.
Each region gets a different lag monitor
-
Nested Class Summary
Nested classes/interfaces inherited from class com.linkedin.venice.service.AbstractVeniceService
AbstractVeniceService.ServiceState -
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final intstatic final intstatic final intstatic final longstatic final longstatic final longFields inherited from class com.linkedin.venice.service.AbstractVeniceService
logger, serviceState -
Constructor Summary
ConstructorsConstructorDescriptionHeartbeatMonitoringService(io.tehuti.metrics.MetricsRepository metricsRepository, ReadOnlyStoreRepository metadataRepository, VeniceServerConfig serverConfig, HeartbeatMonitoringServiceStats heartbeatMonitoringServiceStats, CompletableFuture<HelixCustomizedViewOfflinePushRepository> customizedViewRepositoryFuture) -
Method Summary
Modifier and TypeMethodDescriptionvoidaddFollowerLagMonitor(Version version, int partition) Adds monitoring for a follower partition of a given version.voidaddLeaderLagMonitor(Version version, int partition) Adds monitoring for a leader partition of a given version.protected voidprotected Map<HeartbeatKey,IngestionTimestampEntry> getHeartbeatInfo(String versionTopicName, int partitionFilter, boolean filterLagReplica) protected Map<HeartbeatKey,IngestionTimestampEntry> longgetReplicaFollowerHeartbeatLag(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get heartbeat lag from local region for a given FOLLOWER replica.longgetReplicaFollowerHeartbeatTimestamp(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get heartbeat timestamp from local region for a given FOLLOWER replica.longgetReplicaLeaderMaxHeartbeatLag(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get maximum heartbeat lag from all regions (except separate RT regions) for a given LEADER replica.longgetReplicaLeaderMinHeartbeatTimestamp(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get minimum heartbeat timestamp from all regions (except separate RT regions) for a given LEADER replica.protected voidrecord()voidrecordFollowerHeartbeat(HeartbeatKey key, Long timestamp, boolean isReadyToServe) Record a follower heartbeat timestamp using a pre-built HeartbeatKey.voidrecordFollowerRecordTimestamp(HeartbeatKey key, Long timestamp, boolean isReadyToServe) Record a follower record-level timestamp using a pre-built HeartbeatKey.protected voidrecordLags(Map<HeartbeatKey, IngestionTimestampEntry> heartbeatTimestamps, com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService.ReportLagFunction lagFunction) voidrecordLeaderHeartbeat(HeartbeatKey key, Long timestamp, boolean isReadyToServe) Record a leader heartbeat timestamp using a pre-built HeartbeatKey.voidrecordLeaderRecordTimestamp(HeartbeatKey key, Long timestamp, boolean isReadyToServe) Record a leader record-level timestamp using a pre-built HeartbeatKey.protected voidrecordRecordLags(Map<HeartbeatKey, IngestionTimestampEntry> heartbeatTimestamps, com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService.ReportLagFunction lagFunction) voidremoveLagMonitor(Version version, int partition) Removes monitoring for a partition of a given version.voidsetKafkaStoreIngestionService(KafkaStoreIngestionService kafkaStoreIngestionService) booleanvoidvoidupdateLagMonitor(String resourceName, int partitionId, HeartbeatLagMonitorAction heartbeatLagMonitorAction) Update lag monitor for a given resource replica based on different heartbeat lag monitor action.
-
Field Details
-
DEFAULT_REPORTER_THREAD_SLEEP_INTERVAL_SECONDS
public static final int DEFAULT_REPORTER_THREAD_SLEEP_INTERVAL_SECONDS- See Also:
-
DEFAULT_LAG_LOGGING_THREAD_SLEEP_INTERVAL_SECONDS
public static final int DEFAULT_LAG_LOGGING_THREAD_SLEEP_INTERVAL_SECONDS- See Also:
-
DEFAULT_STALE_HEARTBEAT_LOG_THRESHOLD_MILLIS
public static final long DEFAULT_STALE_HEARTBEAT_LOG_THRESHOLD_MILLIS -
INVALID_MESSAGE_TIMESTAMP
public static final long INVALID_MESSAGE_TIMESTAMP- See Also:
-
INVALID_HEARTBEAT_LAG
public static final long INVALID_HEARTBEAT_LAG- See Also:
-
DEFAULT_LAG_MONITOR_CLEANUP_CYCLE
public static final int DEFAULT_LAG_MONITOR_CLEANUP_CYCLE- See Also:
-
-
Constructor Details
-
HeartbeatMonitoringService
public HeartbeatMonitoringService(io.tehuti.metrics.MetricsRepository metricsRepository, ReadOnlyStoreRepository metadataRepository, VeniceServerConfig serverConfig, HeartbeatMonitoringServiceStats heartbeatMonitoringServiceStats, CompletableFuture<HelixCustomizedViewOfflinePushRepository> customizedViewRepositoryFuture)
-
-
Method Details
-
addFollowerLagMonitor
Adds monitoring for a follower partition of a given version. This request is ignored if the version isn't hybrid.- Parameters:
version- the version to monitor lag forpartition- the partition to monitor lag for
-
addLeaderLagMonitor
Adds monitoring for a leader partition of a given version. This request is ignored if the version isn't hybrid.- Parameters:
version- the version to monitor lag forpartition- the partition to monitor lag for
-
removeLagMonitor
Removes monitoring for a partition of a given version.- Parameters:
version- the version to remove monitoring forpartition- the partition to remove monitoring for
-
getHeartbeatInfo
public Map<String,ReplicaHeartbeatInfo> getHeartbeatInfo(String versionTopicName, int partitionFilter, boolean filterLagReplica) -
startInner
- Specified by:
startInnerin classAbstractVeniceService- Returns:
- true if the service is completely started,
false if it is still starting asynchronously (in this case, it is the implementer's
responsibility to set
AbstractVeniceService.serviceStatetoAbstractVeniceService.ServiceState.STARTEDupon completion of the async work). - Throws:
Exception
-
stopInner
- Specified by:
stopInnerin classAbstractVeniceService- Throws:
Exception
-
updateLagMonitor
public void updateLagMonitor(String resourceName, int partitionId, HeartbeatLagMonitorAction heartbeatLagMonitorAction) Update lag monitor for a given resource replica based on different heartbeat lag monitor action. -
getReplicaLeaderMaxHeartbeatLag
public long getReplicaLeaderMaxHeartbeatLag(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get maximum heartbeat lag from all regions (except separate RT regions) for a given LEADER replica.- Returns:
- Max leader heartbeat lag, or
INVALID_HEARTBEAT_LAGif any region's heartbeat is unknown.
-
getReplicaLeaderMinHeartbeatTimestamp
public long getReplicaLeaderMinHeartbeatTimestamp(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get minimum heartbeat timestamp from all regions (except separate RT regions) for a given LEADER replica.- Returns:
- Min leader heartbeat timestamp, or
INVALID_MESSAGE_TIMESTAMPif any region's heartbeat is unknown.
-
getReplicaFollowerHeartbeatLag
public long getReplicaFollowerHeartbeatLag(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get heartbeat lag from local region for a given FOLLOWER replica.- Returns:
- Follower heartbeat lag, or
INVALID_HEARTBEAT_LAGif local region's heartbeat is unknown.
-
getReplicaFollowerHeartbeatTimestamp
public long getReplicaFollowerHeartbeatTimestamp(PartitionConsumptionState partitionConsumptionState, boolean shouldLogLag) Get heartbeat timestamp from local region for a given FOLLOWER replica.- Returns:
- Follower heartbeat timestamp, or
INVALID_MESSAGE_TIMESTAMPif local region's heartbeat is unknown.
-
recordLeaderRecordTimestamp
Record a leader record-level timestamp using a pre-built HeartbeatKey. Avoids HeartbeatKey allocation and hash computation on the per-record hot path. -
recordFollowerRecordTimestamp
Record a follower record-level timestamp using a pre-built HeartbeatKey. Avoids HeartbeatKey allocation and hash computation on the per-record hot path. -
recordLeaderHeartbeat
Record a leader heartbeat timestamp using a pre-built HeartbeatKey. Avoids HeartbeatKey allocation and hash computation on the per-record hot path. -
recordFollowerHeartbeat
Record a follower heartbeat timestamp using a pre-built HeartbeatKey. Avoids HeartbeatKey allocation and hash computation on the per-record hot path. -
getLeaderHeartbeatTimeStamps
-
getFollowerHeartbeatTimeStamps
-
recordLags
protected void recordLags(Map<HeartbeatKey, IngestionTimestampEntry> heartbeatTimestamps, com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService.ReportLagFunction lagFunction) -
recordRecordLags
protected void recordRecordLags(Map<HeartbeatKey, IngestionTimestampEntry> heartbeatTimestamps, com.linkedin.davinci.stats.ingestion.heartbeat.HeartbeatMonitoringService.ReportLagFunction lagFunction) -
record
protected void record() -
checkAndMaybeLogHeartbeatDelay
protected void checkAndMaybeLogHeartbeatDelay() -
getMaxLeaderHeartbeatLag
-
getMaxFollowerHeartbeatLag
-
setKafkaStoreIngestionService
-