Interface ExternalStorageWriter

All Superinterfaces:
AutoCloseable, Closeable

public interface ExternalStorageWriter extends Closeable
VPJ-side dual-write sink. The Venice Push Job invokes implementations of this interface alongside Kafka produce so that each record lands both in Venice and in a configured external storage system.

Lifecycle (one instance per Spark/MR partition writer task per target region):

  1. configure(VeniceProperties, String, int, String) once, right after no-arg construction.
  2. batchPut(List) per buffered batch of records. The wrapper guarantees records inside a single batch are in the same order as they were submitted; ordering across batches is preserved as well.
  3. flush() once, before Closeable.close(), to force durability of any buffered records.
  4. Closeable.close() once.

Implementations are loaded reflectively by class name from the VPJ config key push.job.external.storage.writer.class. Must have a public no-arg constructor. Must be idempotent on key — Spark task retries can replay a partition.

Per-region fan-out: when the push targets multiple DUAL_WRITE regions, the wrapper loads one writer instance per region and passes that region's name to configure(com.linkedin.venice.utils.VeniceProperties, java.lang.String, int). A region-aware impl routes its writes to that region's external-storage endpoint. The region list comes from push.job.dual.write.target.regions; per-region endpoints are impl-specific config under the push.job.external.storage.* prefix.

batchPut is synchronous from the caller's point of view: it must throw on durable-write failure so the Spark task fails and is retried. Implementations may buffer internally as long as flush() drains.

Deletes are not part of this SPI. Dual-write to external storage targets batch-only stores from clean batch-push input, where every record carries a non-null value and no delete is ever produced (Venice's batch-push path only emits deletes for KIF repush with TTL tombstones, which is out of scope for dual-write). The DualWriteVeniceWriter that wraps this SPI throws UnsupportedOperationException from its delete overrides so a stray delete fails the Spark task loudly rather than silently leaving the external sink in a divergent state.

  • Method Details

    • configure

      @Deprecated default void configure(VeniceProperties jobProps, String topicName, int partitionId)
      Deprecated.
      since the introduction of per-region dual-write fan-out; implement configure(VeniceProperties, String, int, String) instead so the impl can route to the correct regional endpoint. Retained as a default purely for source compatibility with existing single-endpoint implementations — the framework no longer calls this overload directly. The default throws so an impl that overrides neither overload fails loudly rather than silently writing nothing.
      Called once on the executor after no-arg construction. Implementations should read whatever job-level configuration they need from jobProps and prepare any per-task resources (connection pools, output paths, etc.).
    • configure

      default void configure(VeniceProperties jobProps, String topicName, int partitionId, String region)
      Region-aware configure. Called once on the executor after no-arg construction with the name of the region whose external-storage endpoint this instance should write to. Implementations should read whatever job-level configuration they need from jobProps (per-region endpoints live under the push.job.external.storage.* prefix) and prepare per-task resources.

      The default delegates to the deprecated configure(VeniceProperties, String, int) overload (dropping region) so existing single-endpoint impls keep working unchanged.

    • batchPut

      void batchPut(List<ExternalStorageRecord> records)
      Durable bulk-write of records. The list may contain one or more records; an empty list is a no-op. Implementations should treat this as a synchronous write — return only after every record is durable, or throw.
    • flush

      void flush()
      Force durability of any buffered records. Called by the partition writer before Closeable.close().