Class VTConsistencyCheckerJob

java.lang.Object
com.linkedin.venice.spark.consistency.VTConsistencyCheckerJob

public class VTConsistencyCheckerJob extends Object
Spark job that checks VT consistency between two DCs using the Lily Pad algorithm.

Parallelizes across VT partitions: each Spark task handles one partition, consuming directly from both DCs' PubSub brokers and running the Lily Pad algorithm to find keys where the two leaders disagreed despite having full information.

Output is written as Parquet. Each row is one detected inconsistency with full forensic context (key hash, value hashes from both DCs, position vectors, high watermarks).

Example invocation:

   java -cp venice-push-job-all.jar \
     com.linkedin.venice.spark.consistency.VTConsistencyCheckerJob \
     /path/to/vt-consistency.properties
 

Required config keys:

Spark configs can be passed via the venice.spark.session.conf. prefix:

   venice.spark.cluster=yarn
   venice.spark.session.conf.spark.executor.memory=20g
   venice.spark.session.conf.spark.executor.instances=20
 
  • Field Details

  • Constructor Details

    • VTConsistencyCheckerJob

      public VTConsistencyCheckerJob()
  • Method Details

    • main

      public static void main(String[] args)
    • run

      public static void run(Properties jobProps)
      Entry point for both CLI and tests. Accepts a Properties object so tests can configure the job programmatically without going through arg parsing.