Interface ExternalStorageWriter
- All Superinterfaces:
AutoCloseable,Closeable
Lifecycle (one instance per Spark/MR partition writer task per target region):
configure(VeniceProperties, String, int, String)once, right after no-arg construction.batchPut(List)per buffered batch of records. The wrapper guarantees records inside a single batch are in the same order as they were submitted; ordering across batches is preserved as well.flush()once, beforeCloseable.close(), to force durability of any buffered records.Closeable.close()once.
Implementations are loaded reflectively by class name from the VPJ config key
push.job.external.storage.writer.class. Must have a public no-arg constructor. Must be idempotent
on key — Spark task retries can replay a partition.
Per-region fan-out: when the push targets multiple DUAL_WRITE regions, the wrapper loads
one writer instance per region and passes that region's name to configure(com.linkedin.venice.utils.VeniceProperties, java.lang.String, int). A region-aware impl
routes its writes to that region's external-storage endpoint. The region list comes from
push.job.dual.write.target.regions; per-region endpoints are impl-specific config under the
push.job.external.storage.* prefix.
batchPut is synchronous from the caller's point of view: it must throw on durable-write
failure so the Spark task fails and is retried. Implementations may buffer internally as long as
flush() drains.
Deletes are not part of this SPI. Dual-write to external storage targets batch-only stores from
clean batch-push input, where every record carries a non-null value and no delete is ever produced
(Venice's batch-push path only emits deletes for KIF repush with TTL tombstones, which is out of scope for
dual-write). The DualWriteVeniceWriter that wraps this SPI throws
UnsupportedOperationException from its delete overrides so a stray delete fails the Spark
task loudly rather than silently leaving the external sink in a divergent state.
-
Method Summary
Modifier and TypeMethodDescriptionvoidbatchPut(List<ExternalStorageRecord> records) Durable bulk-write ofrecords.default voidconfigure(VeniceProperties jobProps, String topicName, int partitionId) Deprecated.default voidconfigure(VeniceProperties jobProps, String topicName, int partitionId, String region) Region-aware configure.voidflush()Force durability of any buffered records.
-
Method Details
-
configure
Deprecated.since the introduction of per-region dual-write fan-out; implementconfigure(VeniceProperties, String, int, String)instead so the impl can route to the correct regional endpoint. Retained as adefaultpurely for source compatibility with existing single-endpoint implementations — the framework no longer calls this overload directly. The default throws so an impl that overrides neither overload fails loudly rather than silently writing nothing.Called once on the executor after no-arg construction. Implementations should read whatever job-level configuration they need fromjobPropsand prepare any per-task resources (connection pools, output paths, etc.). -
configure
Region-aware configure. Called once on the executor after no-arg construction with the name of the region whose external-storage endpoint this instance should write to. Implementations should read whatever job-level configuration they need fromjobProps(per-region endpoints live under thepush.job.external.storage.*prefix) and prepare per-task resources.The default delegates to the deprecated
configure(VeniceProperties, String, int)overload (droppingregion) so existing single-endpoint impls keep working unchanged. -
batchPut
Durable bulk-write ofrecords. The list may contain one or more records; an empty list is a no-op. Implementations should treat this as a synchronous write — return only after every record is durable, or throw. -
flush
void flush()Force durability of any buffered records. Called by the partition writer beforeCloseable.close().
-
configure(VeniceProperties, String, int, String)instead so the impl can route to the correct regional endpoint.