Class SystemStoreRepairTask

  • All Implemented Interfaces:
    java.lang.Runnable

    public class SystemStoreRepairTask
    extends java.lang.Object
    implements java.lang.Runnable
    This class scans every cluster for which the current parent controller is the leader controller. For each system store in each cluster, it performs the following actions:
    1. Checks that the system store has been created and has a current version.
    2. Sends a heartbeat to the system store and checks whether the heartbeat is received.
    3. If the system store fails either check (1) or (2), runs an empty push to repair it, retrying until the maximum number of repair attempts is reached.
    The task emits metrics that report the number of bad system stores per cluster and the number of stores the task cannot fix.
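    The check-then-repair flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Venice implementation: SystemStoreProbe, FakeProbe, and RepairLoopSketch are hypothetical names introduced here, and the real task may spread retries across scheduled runs rather than looping inline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the per-store checks; not part of the Venice API. */
interface SystemStoreProbe {
  boolean hasCurrentVersion(String storeName);   // check (1)
  boolean heartbeatReceived(String storeName);   // check (2)
  void runEmptyPush(String storeName);           // repair action (3)
}

/** Hypothetical in-memory probe: a store becomes healthy after N empty pushes. */
class FakeProbe implements SystemStoreProbe {
  private final Map<String, Integer> pushesNeeded = new HashMap<>();

  FakeProbe(Map<String, Integer> pushesNeeded) {
    this.pushesNeeded.putAll(pushesNeeded);
  }

  public boolean hasCurrentVersion(String storeName) {
    return pushesNeeded.getOrDefault(storeName, 0) == 0;
  }

  public boolean heartbeatReceived(String storeName) {
    return pushesNeeded.getOrDefault(storeName, 0) == 0;
  }

  public void runEmptyPush(String storeName) {
    pushesNeeded.computeIfPresent(storeName, (k, v) -> Math.max(0, v - 1));
  }
}

class RepairLoopSketch {
  /** A store is healthy only if it passes both checks. */
  static boolean isHealthy(String store, SystemStoreProbe probe) {
    return probe.hasCurrentVersion(store) && probe.heartbeatReceived(store);
  }

  /** Returns the stores that still fail a check after maxRepairRetry empty pushes. */
  static List<String> findUnfixable(List<String> stores, SystemStoreProbe probe, int maxRepairRetry) {
    List<String> unfixable = new ArrayList<>();
    for (String store : stores) {
      boolean healthy = isHealthy(store, probe);
      for (int attempt = 0; !healthy && attempt < maxRepairRetry; attempt++) {
        probe.runEmptyPush(store);
        healthy = isHealthy(store, probe);
      }
      if (!healthy) {
        unfixable.add(store);
      }
    }
    return unfixable;
  }

  public static void main(String[] args) {
    Map<String, Integer> needed = new HashMap<>();
    needed.put("storeA", 0); // already healthy
    needed.put("storeB", 1); // fixed by one empty push
    needed.put("storeC", 5); // needs more pushes than the retry limit allows
    List<String> stores = Arrays.asList("storeA", "storeB", "storeC");
    System.out.println(findUnfixable(stores, new FakeProbe(needed), 2));
  }
}
```

    In this sketch the list returned by findUnfixable corresponds to the "not fixable" count that the task reports through its metrics.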
    • Field Detail

      • LOGGER

        public static final org.apache.logging.log4j.Logger LOGGER

      • SYSTEM_STORE_REPAIR_JOB_PREFIX

        public static final java.lang.String SYSTEM_STORE_REPAIR_JOB_PREFIX
        See Also:
        Constant Field Values
    • Constructor Detail

      • SystemStoreRepairTask

        public SystemStoreRepairTask(VeniceParentHelixAdmin parentAdmin,
                                     java.util.Map<java.lang.String,SystemStoreHealthCheckStats> clusterToSystemStoreHealthCheckStatsMap,
                                     int maxRepairRetry,
                                     int heartbeatWaitTimeSeconds,
                                     int maxRepairRetry,
                                     int heartbeatWaitTimeSeconds,
                                     java.util.concurrent.atomic.AtomicBoolean isRunning)
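        Because the class implements java.lang.Runnable and takes an AtomicBoolean flag, one plausible way to drive it is from a scheduled executor, with the flag acting as a shared on/off switch. The sketch below uses a hypothetical stand-in, GuardedTaskSketch, rather than the real class, since constructing a VeniceParentHelixAdmin is out of scope here. The flag semantics (skip the round when false) and the 100 ms delay are assumptions, not documented behavior.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a periodically scheduled task guarded by a flag. */
class GuardedTaskSketch implements Runnable {
  private final AtomicBoolean isRunning;
  final AtomicInteger rounds = new AtomicInteger();

  GuardedTaskSketch(AtomicBoolean isRunning) {
    this.isRunning = isRunning;
  }

  @Override
  public void run() {
    // Assumption: when the flag is false, the round is skipped entirely.
    if (!isRunning.get()) {
      return;
    }
    rounds.incrementAndGet(); // placeholder for one scan-and-repair round
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean isRunning = new AtomicBoolean(true);
    GuardedTaskSketch task = new GuardedTaskSketch(isRunning);
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    // Run a round, wait for it to finish, then wait 100 ms before the next one.
    executor.scheduleWithFixedDelay(task, 0, 100, TimeUnit.MILLISECONDS);
    Thread.sleep(350);
    isRunning.set(false); // later rounds become no-ops
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.SECONDS);
    System.out.println("rounds completed: " + task.rounds.get());
  }
}
```

        scheduleWithFixedDelay is used instead of scheduleAtFixedRate in this sketch so that a slow round cannot cause the next round to start immediately after it.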
    • Method Detail

      • run

        public void run()
        Specified by:
        run in interface java.lang.Runnable
      • getControllerClientMap

        public java.util.Map<java.lang.String,ControllerClient> getControllerClientMap(java.lang.String clusterName)
      • getClusterSystemStoreHealthCheckStats

        public SystemStoreHealthCheckStats getClusterSystemStoreHealthCheckStats(java.lang.String clusterName)