Class HeartbeatBasedSystemStoreHealthChecker
java.lang.Object
com.linkedin.venice.controller.systemstore.HeartbeatBasedSystemStoreHealthChecker
All Implemented Interfaces:
SystemStoreHealthChecker, AutoCloseable
public class HeartbeatBasedSystemStoreHealthChecker
extends Object
implements SystemStoreHealthChecker
Default SystemStoreHealthChecker implementation that uses the heartbeat write+read cycle to determine
system store health.
For each store, it sends a heartbeat timestamp to all child regions, then polls periodically until the heartbeat
is read back or a timeout is reached. A store that reads back a fresh heartbeat is marked HEALTHY; a store that
still returns a stale or unreachable heartbeat once the timeout elapses is marked UNHEALTHY. Stores that are
never polled before the check aborts (e.g., leadership loss or shutdown) are omitted from the result and
deferred to the next round, per the SystemStoreHealthChecker contract.
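The write-then-poll cycle described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Venice implementation: HeartbeatCheckSketch, HeartbeatProbe, Health, check and the poll interval are hypothetical names, the probe stands in for however the parent admin actually writes and reads heartbeats in child regions, and a heartbeat is assumed to be fresh when the timestamp read back is at least the one written. Stores that are still undecided when isRunning turns false or the thread is interrupted are left out of the returned map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a heartbeat write+read health check; none of these names are Venice APIs.
final class HeartbeatCheckSketch {
  enum Health { HEALTHY, UNHEALTHY }

  // Hypothetical stand-in for writing a heartbeat to all child regions and reading it back.
  interface HeartbeatProbe {
    void writeHeartbeat(String store, long timestampMs);

    // Returns the heartbeat timestamp read back from the regions, or -1 if it could not be read.
    long readHeartbeat(String store);
  }

  static Map<String, Health> check(
      Set<String> stores, HeartbeatProbe probe, long timeoutMs, long pollIntervalMs, AtomicBoolean isRunning) {
    long heartbeatTs = System.currentTimeMillis();
    for (String store : stores) {
      probe.writeHeartbeat(store, heartbeatTs);
    }
    Map<String, Health> result = new HashMap<>();
    Set<String> pending = new HashSet<>(stores);
    long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!pending.isEmpty() && isRunning.get()) {
      // Poll every undecided store; reading back a fresh heartbeat settles it as HEALTHY.
      for (Iterator<String> it = pending.iterator(); it.hasNext();) {
        String store = it.next();
        if (probe.readHeartbeat(store) >= heartbeatTs) {
          result.put(store, Health.HEALTHY);
          it.remove();
        }
      }
      if (pending.isEmpty()) {
        break;
      }
      if (System.nanoTime() >= deadlineNanos) {
        // Timeout elapsed: a stale or unreachable heartbeat is an explicit UNHEALTHY decision.
        for (String store : pending) {
          result.put(store, Health.UNHEALTHY);
        }
        break;
      }
      try {
        Thread.sleep(pollIntervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        break; // undecided stores are omitted, i.e. deferred to the next round
      }
    }
    return result;
  }
}
```

Because UNHEALTHY is only assigned after the deadline, a store whose heartbeat is merely slow to propagate is polled again rather than failed early, while both abort paths simply return the partial map.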
Nested Class Summary
Nested classes/interfaces inherited from interface com.linkedin.venice.controller.systemstore.SystemStoreHealthChecker
SystemStoreHealthChecker.HealthCheckResult
Constructor Summary
Constructors
HeartbeatBasedSystemStoreHealthChecker(VeniceParentHelixAdmin parentAdmin, int heartbeatWaitTimeInSeconds, AtomicBoolean isRunning)
Method Summary
Methods
Map<String,SystemStoreHealthChecker.HealthCheckResult> checkHealth(String clusterName, Set<String> systemStoreNames)
Check the health of the given system stores in the specified cluster.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.linkedin.venice.controller.systemstore.SystemStoreHealthChecker
close
Constructor Details
HeartbeatBasedSystemStoreHealthChecker
public HeartbeatBasedSystemStoreHealthChecker(VeniceParentHelixAdmin parentAdmin, int heartbeatWaitTimeInSeconds, AtomicBoolean isRunning)
Method Details
checkHealth
public Map<String,SystemStoreHealthChecker.HealthCheckResult> checkHealth(String clusterName, Set<String> systemStoreNames)
Description copied from interface: SystemStoreHealthChecker
Check the health of the given system stores in the specified cluster.
Specified by:
checkHealth in interface SystemStoreHealthChecker
Parameters:
clusterName - the Venice cluster name
systemStoreNames - the set of system store names to check
Returns:
a map from system store name to its health check result. Implementations should return an entry for
every store they were able to check. Missing entries (e.g., when the checker aborts early due to
leadership change or shutdown) are treated by the caller as "deferred to next round": they are
neither marked HEALTHY nor UNHEALTHY for this round, so a partial result will not inflate unhealthy
counts. Implementations should therefore omit a store from the result map only when no decision was
reached for it; an explicit UNHEALTHY entry should be returned for stores that were checked and found
to be unhealthy.
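From the caller's side, this contract splits each round into three buckets: healthy, unhealthy, and deferred. The sketch below is hypothetical: HealthRoundSketch, RoundSummary and summarize are made-up names, and a two-value Health enum stands in for SystemStoreHealthChecker.HealthCheckResult, whose actual members are not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical caller-side sketch of the result contract; not the Venice repair service.
final class HealthRoundSketch {
  enum Health { HEALTHY, UNHEALTHY }

  static final class RoundSummary {
    final List<String> healthy = new ArrayList<>();
    final List<String> unhealthy = new ArrayList<>();
    final List<String> deferred = new ArrayList<>();
  }

  static RoundSummary summarize(Set<String> requested, Map<String, Health> results) {
    RoundSummary summary = new RoundSummary();
    // Iterate in sorted order so the output lists are deterministic.
    for (String store : new TreeSet<>(requested)) {
      Health health = results.get(store);
      if (health == null) {
        summary.deferred.add(store); // no decision this round: neither healthy nor unhealthy
      } else if (health == Health.HEALTHY) {
        summary.healthy.add(store);
      } else {
        summary.unhealthy.add(store); // only explicit UNHEALTHY entries count against a store
      }
    }
    return summary;
  }
}
```

With requested stores a, b and c and a result map holding only a (HEALTHY) and b (UNHEALTHY), c lands in the deferred list, so an early abort never adds to the unhealthy count.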
This method is invoked on the repair service's single-threaded scheduler, so a call that blocks indefinitely will stall every subsequent repair round. Implementations must bound their own execution time and honor thread interruption (the service calls
shutdownNow() on shutdown) rather than relying on the caller to time them out.
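The scheduling constraint can be illustrated with java.util.concurrent alone. This sketch is hypothetical: boundedCheck stands in for one checkHealth call (a sleep loop replaces real heartbeat polling), runThenShutdown is a made-up driver, and the delays are arbitrary. It shows the two properties asked for above: the check stops at its own deadline, and ScheduledExecutorService.shutdownNow() interrupts a running check so the single worker thread exits promptly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a self-bounded, interruptible check on a single-threaded scheduler.
final class BoundedCheckSketch {
  // Stands in for one health-check round. Returns false if interrupted,
  // true if it ran until its own deadline.
  static boolean boundedCheck(long budgetMillis) {
    long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
    while (System.nanoTime() < deadlineNanos) {
      try {
        Thread.sleep(10); // placeholder for one heartbeat poll
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the executor
        return false; // aborted: undecided stores would be deferred
      }
    }
    return true; // the round ended on its own deadline
  }

  // Schedules rounds on one thread, then shuts down; returns true if the worker stopped promptly.
  static boolean runThenShutdown(long checkBudgetMillis, long shutdownAfterMillis) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(() -> boundedCheck(checkBudgetMillis), 0, 1, TimeUnit.SECONDS);
    try {
      Thread.sleep(shutdownAfterMillis);
      scheduler.shutdownNow(); // cancels future rounds and interrupts the running one
      return scheduler.awaitTermination(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      scheduler.shutdownNow();
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

If boundedCheck instead swallowed the InterruptedException and kept looping toward its 10-second budget, awaitTermination would time out after one second and runThenShutdown(10_000, 50) would return false.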