Class DualWriteVeniceWriter

java.lang.Object
com.linkedin.venice.writer.AbstractVeniceWriter<byte[],byte[],byte[]>
com.linkedin.venice.hadoop.task.datawriter.DualWriteVeniceWriter
All Implemented Interfaces:
Closeable, AutoCloseable

public class DualWriteVeniceWriter extends AbstractVeniceWriter<byte[],byte[],byte[]>
Wraps a Kafka-backed AbstractVeniceWriter and one or more ExternalStorageWriters (one per DUAL_WRITE target region) so each batch-push record is written to every external sink first and then produced to Kafka. The Kafka future is what the caller waits on for at-least-once semantics; external-sink durability is synchronous from the caller's point of view, so a failed external write throws before the Kafka produce starts and the Spark task is retried.

When the push targets multiple regions, the same record is fanned out to every regional writer before the Kafka produce. The guarantee is that no Kafka produce happens for a batch until every regional external write for that batch has succeeded: if any regional write fails (after its bounded retry) the exception propagates before the produce and fails the Spark task. Note this is not an all-or-nothing write across the external sinks within a failed attempt — an earlier region in the fan-out may already hold the batch while a later one failed. Consistency is restored by the whole-partition retry: external writes are idempotent on key, so the next attempt overwrites the partially-written region rather than leaving it divergent.

Region fan-out is sequential. Each region's batchPut (including its bounded retries and backoff) completes before the next region's begins, on the single partition-writer task thread. This keeps the failure semantics and ordering simple, but the per-batch external-write latency is the sum across regions. TODO(future iteration): parallelize the per-region fan-out (e.g. a small per-task executor that issues the regional batchPuts concurrently and then joins, aggregating failures) when cross-region write latency dominates the push.

The writer buffers up to batchSize consecutive put records and flushes them as a single ExternalStorageWriter.batchPut(List) (per regional writer) before the corresponding Kafka produces fire. batchSize = 1 (the default) disables buffering: every record is forwarded immediately as a one-element batch, matching pre-batching semantics. flush() and close() drain any pending buffer before doing their own work, so the producer's record ordering is preserved.

update and delete are not supported — batch pushes from clean input never call either. Both throw UnsupportedOperationException so a stray invocation fails the Spark task loudly rather than silently leaving the external sink and Venice in divergent states.