Class DiskHealthCheckService

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class DiskHealthCheckService
    extends AbstractVeniceService
    DiskHealthCheckService wakes up every 10 seconds by default and runs a disk health check: it writes 64KB of random data, reads it back, and verifies the content. If any error occurs during this process, the in-memory state variable "diskHealthy" is set to false; otherwise, "diskHealthy" remains true. If an SSD fails, a disk operation could hang forever. To report this kind of failure, the health status polling API has a timeout mechanism. A total timeout is decided up front: totalTimeout = Math.max(30 seconds, health check interval + disk operation timeout). The service tracks the last update time of the in-memory health status variable; if the status has not been updated for longer than totalTimeout, the disk operation is assumed to have hung due to a disk failure, and the server starts reporting itself as unhealthy.
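
The probe and the staleness timeout described above can be sketched as follows. This is a minimal illustration, not the actual Venice implementation: the class name DiskHealthSketch, its method names, the probe file handling, and the explicit nowMs clock parameter are all assumptions made for the example. Only the 64KB probe size, the 30-second floor, and the totalTimeout formula come from the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the behavior described above; not the real DiskHealthCheckService.
final class DiskHealthSketch {
    static final int PROBE_SIZE_BYTES = 64 * 1024;                          // 64KB of random data
    static final long MIN_TOTAL_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(30); // 30-second floor

    private final long totalTimeoutMs;
    private volatile boolean diskHealthy = true;
    private volatile long lastStatusUpdateTimeMs;

    DiskHealthSketch(long healthCheckIntervalMs, long diskOperationTimeoutMs, long nowMs) {
        this.totalTimeoutMs = totalTimeoutMs(healthCheckIntervalMs, diskOperationTimeoutMs);
        this.lastStatusUpdateTimeMs = nowMs;
    }

    // totalTimeout = Math.max(30 seconds, health check interval + disk operation timeout)
    static long totalTimeoutMs(long healthCheckIntervalMs, long diskOperationTimeoutMs) {
        return Math.max(MIN_TOTAL_TIMEOUT_MS, healthCheckIntervalMs + diskOperationTimeoutMs);
    }

    // One probe: write random bytes, read them back, compare, then record the result.
    // If a disk call hangs, this method never reaches the status update below.
    void runProbe(Path probeFile, long nowMs) {
        boolean ok;
        try {
            byte[] written = new byte[PROBE_SIZE_BYTES];
            new Random().nextBytes(written);
            Files.write(probeFile, written);
            byte[] readBack = Files.readAllBytes(probeFile);
            Files.deleteIfExists(probeFile);
            ok = Arrays.equals(written, readBack);
        } catch (IOException e) {
            ok = false; // any I/O error marks the disk unhealthy
        }
        diskHealthy = ok;
        lastStatusUpdateTimeMs = nowMs;
    }

    // Polling API: a status older than totalTimeout is treated as a hung disk operation.
    boolean isDiskHealthy(long nowMs) {
        if (nowMs - lastStatusUpdateTimeMs > totalTimeoutMs) {
            return false;
        }
        return diskHealthy;
    }
}
```

With a hypothetical 10-second interval and 10-second disk operation timeout, the sum is 20 seconds, so the 30-second floor applies; with a 60-second interval, the total timeout becomes 70 seconds. Passing the clock in as nowMs is a choice made here so the staleness check is easy to exercise; a real service would read the system clock.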
    • Constructor Detail

      • DiskHealthCheckService

        public DiskHealthCheckService(boolean serviceEnabled,
                                      long healthCheckIntervalMs,
                                      long diskOperationTimeout,
                                      java.lang.String databasePath,
                                      long diskFailServerShutdownTimeMs)