Class DiskHealthCheckService

java.lang.Object
com.linkedin.venice.service.AbstractVeniceService
com.linkedin.davinci.storage.DiskHealthCheckService
All Implemented Interfaces:
Closeable, AutoCloseable

public class DiskHealthCheckService extends AbstractVeniceService
DiskHealthCheckService wakes up every 10 seconds by default and runs a health check on the disk: it writes 64KB of random data, reads the data back, and verifies the content. If any error occurs during this process, the in-memory state variable "diskHealthy" is set to false; otherwise, "diskHealthy" stays true.

If an SSD fails, the disk operation could hang forever. To report this kind of disk failure, the health status polling API has a timeout mechanism. A total timeout is computed at the beginning: totalTimeout = Math.max(30 seconds, health check interval + disk operation timeout). The service keeps track of the last update time of the in-memory health status variable. If the status has not been updated for longer than totalTimeout, the disk operation is assumed to have hung because of a disk failure, and the service starts reporting this server as unhealthy.
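The write/read/verify probe and the stale-status timeout described above can be sketched in a few lines. This is a minimal illustration, not the actual Venice implementation: the class DiskHealthSketch, its method names, and the explicit nowMs clock parameter are all hypothetical and were chosen only to make the logic easy to follow and test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are assumptions, not the Venice source.
class DiskHealthSketch {
  static final int PROBE_SIZE_BYTES = 64 * 1024; // 64KB of random data per probe
  static final long MIN_TOTAL_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(30);

  private final long totalTimeoutMs;
  private volatile boolean diskHealthy = true;
  private volatile long lastStatusUpdateMs;

  DiskHealthSketch(long healthCheckIntervalMs, long diskOperationTimeoutMs, long nowMs) {
    // totalTimeout = Math.max(30 seconds, health check interval + disk operation timeout)
    this.totalTimeoutMs =
        Math.max(MIN_TOTAL_TIMEOUT_MS, healthCheckIntervalMs + diskOperationTimeoutMs);
    this.lastStatusUpdateMs = nowMs;
  }

  long totalTimeoutMs() {
    return totalTimeoutMs;
  }

  // One health check: write random bytes, read them back, compare the content.
  void runProbe(Path directory, long nowMs) {
    boolean ok;
    try {
      byte[] written = new byte[PROBE_SIZE_BYTES];
      ThreadLocalRandom.current().nextBytes(written);
      Path probeFile = Files.createTempFile(directory, "disk-health-", ".tmp");
      try {
        Files.write(probeFile, written);
        ok = Arrays.equals(written, Files.readAllBytes(probeFile));
      } finally {
        Files.deleteIfExists(probeFile);
      }
    } catch (IOException e) {
      ok = false; // any I/O error marks the disk as unhealthy
    }
    diskHealthy = ok;
    lastStatusUpdateMs = nowMs; // never reached if the disk operation hangs
  }

  // Polling API: a status older than totalTimeout is treated as a hung disk.
  boolean isDiskHealthy(long nowMs) {
    if (nowMs - lastStatusUpdateMs > totalTimeoutMs) {
      return false;
    }
    return diskHealthy;
  }
}
```

Because a hung write never reaches the line that refreshes lastStatusUpdateMs, the polling method is the only place that can detect the hang, which is why the staleness check lives there rather than in the probe. In a real service the probe would run on a background thread at the configured interval and the clock would come from System.currentTimeMillis().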
  • Constructor Details

    • DiskHealthCheckService

      public DiskHealthCheckService(boolean serviceEnabled, long healthCheckIntervalMs, long diskOperationTimeout, String databasePath, long diskFailServerShutdownTimeMs)
  • Method Details