Class VTConsistencyCheckerJob
java.lang.Object
com.linkedin.venice.spark.consistency.VTConsistencyCheckerJob
Spark job that checks VT consistency between two DCs using the Lily Pad algorithm.
Parallelizes across VT partitions: each Spark task handles one partition, consuming directly from both DCs' PubSub brokers and running the Lily Pad algorithm to find keys where the two leaders disagreed despite having full information.
Output is written as Parquet. Each row is one detected inconsistency with full forensic context (key hash, value hashes from both DCs, position vectors, high watermarks).
Example invocation:
java -cp venice-push-job-all.jar \
com.linkedin.venice.spark.consistency.VTConsistencyCheckerJob \
/path/to/vt-consistency.properties
Required config keys:
- "dc0.broker.url": PubSub broker address for DC-0
- "dc1.broker.url": PubSub broker address for DC-1
- "version.topic": version topic name to scan (e.g. my-store_v3)
- "output.path": output path to write Parquet results
- "number.of.regions": total number of regions in the AA topology
Spark configs can be passed via the venice.spark.session.conf. prefix:
venice.spark.cluster=yarn
venice.spark.session.conf.spark.executor.memory=20g
venice.spark.session.conf.spark.executor.instances=20
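The required keys and the Spark passthrough prefix can also be assembled in code. The following is a minimal sketch, not the job's own implementation: the class name VtConsistencyProps, its helper methods, and the broker addresses and output path are hypothetical placeholders. Only the key names and the venice.spark.session.conf. prefix are taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class VtConsistencyProps {
  // Key names as listed under "Required config keys".
  public static final String[] REQUIRED_KEYS = {
      "dc0.broker.url", "dc1.broker.url", "version.topic", "output.path", "number.of.regions" };

  // Prefix documented for passing Spark session configs through the job properties.
  public static final String SPARK_CONF_PREFIX = "venice.spark.session.conf.";

  /** Builds job properties; every argument value is a caller-supplied placeholder. */
  public static Properties build(String dc0, String dc1, String versionTopic, String outputPath, int regions) {
    Properties props = new Properties();
    props.setProperty("dc0.broker.url", dc0);
    props.setProperty("dc1.broker.url", dc1);
    props.setProperty("version.topic", versionTopic);
    props.setProperty("output.path", outputPath);
    props.setProperty("number.of.regions", Integer.toString(regions));
    return props;
  }

  /** Adds a Spark config under the documented prefix, e.g. spark.executor.memory. */
  public static void addSparkConf(Properties props, String sparkKey, String value) {
    props.setProperty(SPARK_CONF_PREFIX + sparkKey, value);
  }

  /** Hypothetical pre-flight check: returns the required keys that are absent or blank. */
  public static List<String> missingKeys(Properties props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED_KEYS) {
      String value = props.getProperty(key);
      if (value == null || value.trim().isEmpty()) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Placeholder addresses and paths, for illustration only.
    Properties props = build("dc0-broker.example:9092", "dc1-broker.example:9092",
        "my-store_v3", "/tmp/vt-consistency-out", 2);
    addSparkConf(props, "spark.executor.memory", "20g");
    System.out.println("Missing keys: " + missingKeys(props));
    // With the Venice jar on the classpath, props could then be handed to
    // VTConsistencyCheckerJob.run(props).
  }
}
```

Checking for missing keys before submitting is a design choice of this sketch; the job itself may validate its configuration differently.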
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static void | main |
static void | run(Properties jobProps) | Entry point for both CLI and tests.
-
Field Details
-
DC0_BROKER_URL
-
DC1_BROKER_URL
-
VERSION_TOPIC
-
OUTPUT_PATH
-
NUMBER_OF_REGIONS
-
-
Constructor Details
-
VTConsistencyCheckerJob
public VTConsistencyCheckerJob()
-
-
Method Details
-
main
-
run
Entry point for both CLI and tests. Accepts a Properties object so tests can configure the job programmatically without going through argument parsing.
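The split between a file-based CLI path and a Properties-based entry point might look roughly like the sketch below. This is an assumption about the general pattern, not the actual body of main or run: the class EntryPointSketch, its methods, and the Consumer<Properties> stand-in for the job are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.function.Consumer;

public class EntryPointSketch {
  /** Reads a file in java.util.Properties format into a Properties object. */
  public static Properties loadProps(Path file) throws IOException {
    Properties props = new Properties();
    try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      props.load(reader);
    }
    return props;
  }

  /** CLI-style path: take the single file argument, then delegate to the Properties-based job. */
  public static void runFromArgs(String[] args, Consumer<Properties> job) throws IOException {
    if (args.length != 1) {
      throw new IllegalArgumentException("Expected exactly one argument: path to a properties file");
    }
    job.accept(loadProps(Paths.get(args[0])));
  }

  public static void main(String[] args) throws IOException {
    // Stand-in job that only prints the topic. With Venice on the classpath, the lambda
    // body could call VTConsistencyCheckerJob.run(props) instead.
    runFromArgs(args, props -> System.out.println("version.topic = " + props.getProperty("version.topic")));
  }
}
```

A test can skip runFromArgs entirely and pass a hand-built Properties object straight to the job, which is the convenience the description of run points to.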
-