Venice P2P Transfer Bootstrapping Architecture

High-Performance Peer-to-Peer Data Bootstrap

Accelerating Venice Node Deployment Through Direct Peer-to-Peer RocksDB Snapshot Transfer


🚨 Problem & Solution

The Problem

Bootstrap Bottlenecks

  • Kafka-Only Recovery: All nodes bootstrap exclusively from Kafka brokers
  • Resource Intensive: Consuming every message back from the PubSub system during cluster recovery is slow and resource-heavy, especially under high-update workloads
  • Scalability Limits: Broker capacity becomes recovery bottleneck

Real-World Impact

  • Extended MTTR (Mean Time to Restore or Repair) during outages
  • Cascading broker overload
  • Slower cluster expansion
  • Increased operational overhead

πŸ’‘ The Solution

Direct P2P Transfer

  • Peer-to-Peer: Direct node-to-node data transfer
  • RocksDB Snapshots: Consistent point-in-time data copies
  • Intelligent Fallback: Automatic fallback to Kafka ingestion on failure
  • Low Risk: Low deployment risk for the DaVinci client

Key Benefits

  • Faster node recovery and scaling
  • Reduced Kafka broker load

πŸ—οΈ System Architecture Flow

Venice Blob Transfer Complete Flow


Step 1: Peer Discovery
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  Discovery Request   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  Query Helix/ZK     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client Node β”‚ ────────────────────>β”‚ Discovery   β”‚ ──────────────────> β”‚ Metadata    β”‚
β”‚ (Needs Data)β”‚                      β”‚ Service     β”‚                     β”‚ Repository  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       ^                                     β”‚                                   β”‚
       β”‚                                     β”‚ Return Host List                  β”‚
       β”‚                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”‚
       └──────────────────────────────│ Host List:  β”‚ <β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                      β”‚ [host1,     β”‚
                                      β”‚  host2,     β”‚
                                      β”‚  host3...]  β”‚
                                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Step 2: Sequential Host Attempts
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  T1:Try Host 1  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
β”‚ Client Node β”‚ ──────────────> β”‚ Server A    β”‚ 
β”‚             β”‚ <── FAIL ────── β”‚ (No Data/   β”‚                 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚  Busy)      β”‚                
                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 
                 T2:Try Host 2  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  
                ──────────────> β”‚ Server B    β”‚ 
                <── FAIL ────── │(Table Format│
                                β”‚  Mismatch)  β”‚
                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       

                 T3:Try Host 3  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  
                ──────────────> β”‚ Server C    β”‚ 
                                β”‚ (Not busy,  β”‚               
                                β”‚Format Match)β”‚              
                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     
                                         
Step 3: Start Transfer                   
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                       
β”‚ Client Node    β”‚ 
β”‚                β”‚
β”‚  Receives:     β”‚ <── 1.1 File: file1.sst ────────── β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  1. Files      β”‚ <── 1.2 File: file2.sst ────────── β”‚ Server C    β”‚
β”‚                β”‚ <── 1.3 File: MANIFEST_00 ──────── β”‚             β”‚
β”‚  2. Metadata   β”‚ <── 2. Metadata ────────────────── β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚  3. COMPLETE   β”‚ <── 3. COMPLETE Flag ───────────── 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Step 4: Client Processing After Receiving the COMPLETE Flag

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  Validate Files     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    Atomic Rename     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client Node β”‚ ──────────────────> β”‚ Temp Folder β”‚   ─────────────────> β”‚ Partition   β”‚
│             │  (MD5)              │/store/temp_ │    (temp→partition)  │ Folder      │
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚ p1/         β”‚                      β”‚ /store/p1/  β”‚
                                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
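The validate-then-rename step above can be sketched with standard java.nio APIs. This is a minimal illustration under stated assumptions, not Venice's actual code; the class, method names, and the expected-checksum map are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.Map;

public class SnapshotFinalizer {
    /** Compute the MD5 checksum of a file as a lowercase hex string. */
    static String md5Hex(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    /**
     * Validate every received file against its expected checksum, then
     * atomically promote the temp directory to the partition directory,
     * so readers never observe a half-written partition.
     */
    static void finalizePartition(Path tempDir, Path partitionDir,
                                  Map<String, String> expectedMd5) throws Exception {
        for (Map.Entry<String, String> e : expectedMd5.entrySet()) {
            String actual = md5Hex(tempDir.resolve(e.getKey()));
            if (!actual.equals(e.getValue())) {
                throw new IOException("Checksum mismatch for " + e.getKey());
            }
        }
        // Atomic rename: temp_p1 -> p1 in a single filesystem operation.
        Files.move(tempDir, partitionDir, StandardCopyOption.ATOMIC_MOVE);
    }
}
```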

Step 5: Kafka Ingestion Fallback (Always Happens)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  Resume/Fill Gap    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client Node β”‚ ──────────────────> β”‚ Kafka       β”‚
β”‚             β”‚  From snapshot      β”‚ Ingestion   β”‚
β”‚             β”‚  offset to latest   β”‚             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Step 1: Peer Discovery

  • Venice Server: Query Helix CustomizedViewRepository for COMPLETED nodes
  • DaVinci Application: Query DaVinci push job report for ready-to-serve nodes

Step 2: P2P Transfer in Client Side

Connectivity Check ────> Peer Selection ────> Sequential Request
      ↓                         ↓                      ↓
Parallel connection       Filter & shuffle      Try peers until
check with caching      connectable hosts      success or exhaust

Step 3: Data Transfer

Snapshot Creation ────> File Streaming ────>  Metadata Sync
       ↓                       ↓                  ↓
Server creates         Chunked transfer      Offset + Version State
RocksDB snapshot      with MD5 validation     after files complete

Step 4: Completion

  • Success Path: After validating all files, atomically rename the temp directory to the partition directory, then initiate Kafka ingestion to synchronize any remaining offset gap.
  • Fallback Path: If any error occurs, clean up the temp directory and retry with the next peer; if all peers fail, fall back to a full Kafka bootstrap from the beginning.
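The success/fallback decision above amounts to a simple sequential loop. A minimal sketch, where the class, enum, and the per-host attempt callback are hypothetical stand-ins for Venice's real transfer manager:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class PeerBootstrap {
    /** Outcome of one peer attempt; a stand-in for Venice's real result/exception types. */
    public enum Result { SUCCESS, NO_SNAPSHOT, FORMAT_MISMATCH, BUSY, TRANSFER_ERROR }

    /**
     * Try each discovered peer in order and stop at the first success.
     * Returns the winning host, or empty when every peer failed and the
     * caller must bootstrap from Kafka from the beginning.
     */
    public static Optional<String> tryPeers(List<String> hosts,
                                            Function<String, Result> attempt) {
        for (String host : hosts) {
            if (attempt.apply(host) == Result.SUCCESS) {
                return Optional.of(host);  // transfer + atomic rename completed
            }
            // Any failure: cleanup happens inside attempt(); move to the next peer.
        }
        return Optional.empty();  // exhausted: full Kafka bootstrap
    }
}
```

Either way, Kafka ingestion still runs afterward to fill the offset gap, which is why a failed transfer only costs time, never correctness.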

Key Components

  • DefaultIngestionBackend - Bootstrap orchestration entry point
  • NettyP2PBlobTransferManager - P2P transfer coordinator
  • BlobSnapshotManager - Server-side snapshot lifecycle
  • NettyFileTransferClient - High-performance client

πŸ“₯ Client-Side Process

Process Flow

1. Discover Peers β†’ 2. Check Connectivity β†’ 3. Request Data β†’ 4. Fallback to Kafka

πŸ” Step 1: Peer Discovery

Venice Server:

  • Query Helix CustomizedViewRepository
  • Find COMPLETED nodes
  • Filter by store/version/partition

DaVinci Application:

  • Query DaVinci push job report
  • Find ready-to-serve nodes
  • Extract available peer list

πŸ”— Step 2: Connectivity Checking

Smart caching strategy, needed because peer sets can be large:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Purge Stale β”‚ β†’ β”‚ Filter Bad  β”‚ β†’ β”‚ Check Hosts in      β”‚ β†’ β”‚ Update Cacheβ”‚
β”‚ Records     β”‚   β”‚ Hosts       β”‚   β”‚Parallel Connectivityβ”‚   β”‚ Results     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          
  • Parallel Testing: Multiple host connections simultaneously
  • Smart Caching: Remember good/bad hosts with timestamps
  • Timeout Management: 1-minute connectivity check limit
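In sketch form, the cache keeps timestamped good/bad records and purges them past a TTL. This hypothetical class condenses Venice's parallel, one-minute-bounded probing into a sequential predicate for brevity:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class ConnectivityCache {
    private final Map<String, Long> goodHosts = new ConcurrentHashMap<>();
    private final Map<String, Long> badHosts = new ConcurrentHashMap<>();
    private final long ttlMs;

    public ConnectivityCache(long ttlMs) { this.ttlMs = ttlMs; }

    /** Drop cache entries older than the TTL so those hosts get re-checked. */
    void purgeStale(long nowMs) {
        goodHosts.values().removeIf(t -> nowMs - t > ttlMs);
        badHosts.values().removeIf(t -> nowMs - t > ttlMs);
    }

    /** Return the connectable subset of candidates, consulting and updating the cache. */
    public List<String> filterConnectable(List<String> candidates,
                                          Predicate<String> probe, long nowMs) {
        purgeStale(nowMs);
        List<String> result = new ArrayList<>();
        for (String host : candidates) {
            if (badHosts.containsKey(host)) continue;        // skip known-bad hosts
            if (goodHosts.containsKey(host) || probe.test(host)) {
                goodHosts.put(host, nowMs);                  // remember good host
                result.add(host);
            } else {
                badHosts.put(host, nowMs);                   // remember bad host
            }
        }
        return result;
    }
}
```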

πŸ“¦ Step 3: Sequential Data Request

For Each Peer (Shuffled List):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Send Request  β”‚ β†’ β”‚  Receive Data   β”‚ β†’ β”‚  Receive        β”‚ β†’ β”‚  Receive        β”‚ β†’ β”‚   Validate      β”‚ β†’ β”‚  Rename Temp    β”‚
β”‚    HTTP GET     β”‚   β”‚  File Chunks    β”‚   β”‚  Metadata       β”‚   β”‚ COMPLETE_FLAG   β”‚   β”‚      MD5        β”‚   β”‚ to Partition Dirβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🚨 Error Handling Details Across Failure Scenarios

Connection Failures (VenicePeersConnectionException)

  • Network timeout
  • SSL handshake failure
  • Host unreachable
  • Action: Try next peer

File Not Found (VeniceBlobTransferFileNotFoundException)

  • Snapshot missing
  • Stale snapshot
  • Server too busy (rejects with HTTP 429)
  • Action: Try next peer

Transfer Errors (Data Integrity Issues)

  • Checksum mismatch
  • File size mismatch
  • Network interruption: the serving peer starts a deployment or shuts down unexpectedly
  • Action: Cleanup + next peer

🧹 Comprehensive Cleanup Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Force Flush β”‚ β†’ β”‚ Close File  β”‚ β†’ β”‚ Delete      β”‚ β†’ β”‚ Reset       β”‚
β”‚ File Channelβ”‚   β”‚ Handles     β”‚   β”‚ Partial Dir β”‚   β”‚ Handler     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • Zero Partial State: Complete cleanup ensures no corruption
  • Handler Reset: Ready for next peer attempt
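The flush → close → delete sequence maps directly onto JDK primitives. A best-effort sketch (hypothetical class, not Venice's handler-reset code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TransferCleanup {
    /**
     * Cleanup after a failed transfer: flush and close the open channel,
     * then delete the partial temp directory recursively, leaving zero
     * partial state before the next peer attempt.
     */
    public static void cleanup(FileChannel channel, Path tempDir) throws IOException {
        if (channel != null && channel.isOpen()) {
            channel.force(true);   // flush pending writes so close is deterministic
            channel.close();
        }
        if (Files.exists(tempDir)) {
            try (Stream<Path> walk = Files.walk(tempDir)) {
                // Reverse order deletes children before their parent directories.
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
    }
}
```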

πŸ”„ Step 4: Kafka Fallback Strategy

Two-Phase Strategy:

  • Phase 1 - Blob Transfer: Rapid bulk data transfer via P2P
  • Phase 2 - Kafka Fill-Gap: Even if blob transfer fails, deployment continues with Kafka ingestion

Zero-Risk Design: After a successful blob transfer, Kafka ingestion always follows to synchronize any data between the snapshot offset and the latest offset, guaranteeing full deployment.


πŸ“€ Server-Side Process

Process Flow

1. Request Reception & Validation β†’ 2. Prepare Snapshot & Metadata β†’ 3. File Transfer & Chunking β†’ 4. Metadata Transfer & Completion

β˜‘οΈ Step 1: Request Reception & Validation

Incoming Request Pipeline:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ HTTP GET    β”‚    β”‚ Parse URI   β”‚    β”‚ Validate    β”‚    β”‚ Check       β”‚
β”‚ Request     β”‚ β†’  β”‚ Extract     β”‚ β†’  β”‚ Table       β”‚ β†’  β”‚ Concurrency β”‚
β”‚             β”‚    β”‚ Parameters  β”‚    β”‚ Format      β”‚    β”‚ Limits      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • URI Parsing: /storeName/version/partition/tableFormat
  • Format Check: PLAIN_TABLE vs BLOCK_BASED_TABLE match verification
  • Concurrency Control: Global limit enforcement; reject with 429 if over limit
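The parsing and admission steps above can be sketched as follows. The 429 for over-limit matches the description; the 404 for a table-format mismatch, the record shape, and all names are illustrative assumptions:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BlobRequestValidator {
    /** Parsed form of /storeName/version/partition/tableFormat (hypothetical shape). */
    public record BlobRequest(String store, int version, int partition, String tableFormat) {}

    public static BlobRequest parseUri(String uri) {
        String trimmed = uri.startsWith("/") ? uri.substring(1) : uri;
        String[] parts = trimmed.split("/");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected /store/version/partition/tableFormat: " + uri);
        }
        return new BlobRequest(parts[0], Integer.parseInt(parts[1]),
                               Integer.parseInt(parts[2]), parts[3]);
    }

    private final AtomicInteger inFlight = new AtomicInteger();
    private final int maxConcurrent;

    public BlobRequestValidator(int maxConcurrent) { this.maxConcurrent = maxConcurrent; }

    /** Returns an HTTP-style status: 200 to proceed, 429 when over the global limit. */
    public int admit(BlobRequest req, String serverTableFormat) {
        if (!req.tableFormat().equals(serverTableFormat)) {
            return 404;  // format mismatch: client should try another peer
        }
        if (inFlight.incrementAndGet() > maxConcurrent) {
            inFlight.decrementAndGet();
            return 429;  // too many concurrent snapshot users
        }
        return 200;
    }

    /** Release a slot when the transfer finishes or fails. */
    public void release() { inFlight.decrementAndGet(); }
}
```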

πŸ“Š Step 2: Prepare Snapshot & Metadata

Snapshot Lifecycle:

  • Check staleness (configurable TTL)
  • Verify concurrent user limits
  • Create a new snapshot if the existing one is stale or does not exist
  • Prepare partition metadata

Metadata Preparation:

  • Serialization of StoreVersionState (enables synchronization of hybrid store configuration parameters)
  • OffsetRecord encapsulation (captures the snapshot’s offset for accurate state synchronization)
  • JSON metadata response

πŸ“¦ Step 3: File Transfer & Chunking Strategy (Server/Client Single File Transfer Process Details)

Server-Side Adaptive Chunking Algorithm:

Step 1: File Preparation
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Open File for         β”‚   β”‚   Calculate File        β”‚   β”‚   Generate MD5          β”‚   β”‚   Prepare HTTP          β”‚   β”‚   Send Response         β”‚
β”‚     Reading             β”‚   β”‚      Length             β”‚   β”‚     Checksum            β”‚   β”‚    Response             β”‚   β”‚      Headers            β”‚
β”‚                         β”‚ β†’ β”‚                         β”‚ β†’ β”‚                         β”‚ β†’ β”‚                         β”‚ β†’ β”‚                         β”‚
β”‚ Open file in            β”‚   β”‚ Get file size for       β”‚   β”‚ Generate checksum       β”‚   β”‚ Set content headers     β”‚   β”‚ Send headers to         β”‚
β”‚ read-only mode          β”‚   β”‚ response headers        β”‚   β”‚ for validation          β”‚   β”‚ and file metadata       β”‚   β”‚ client first            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Step 2: Adaptive Chunking Strategy
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   
β”‚   Calculate Optimal     β”‚   β”‚   Create Chunked        β”‚   β”‚   Wrap for HTTP         β”‚   
β”‚     Chunk Size          β”‚   β”‚    File Handler         β”‚   β”‚     Streaming           β”‚  
β”‚ 16KB - 2MB              β”‚ β†’ β”‚                         β”‚ β†’ β”‚                         β”‚ 
β”‚ Determine best chunk    β”‚   β”‚ Set up memory-efficient β”‚   β”‚ Prepare for HTTP        β”‚ 
β”‚ size based on file      β”‚   β”‚ streaming mechanism     β”‚   β”‚ chunked transfer        β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   
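The 16KB–2MB clamp can be expressed directly. The proportional heuristic below (aiming for roughly 1024 chunks per file) is an assumption for illustration; only the range is specified above:

```java
public class ChunkSizer {
    static final int MIN_CHUNK = 16 * 1024;        // 16 KB floor
    static final int MAX_CHUNK = 2 * 1024 * 1024;  // 2 MB ceiling

    /**
     * Pick a chunk size proportional to file size, clamped to [16KB, 2MB]:
     * small files get small chunks (less memory held per write), huge files
     * get large chunks (fewer round trips through the pipeline).
     */
    public static int chunkSizeFor(long fileLength) {
        long target = fileLength / 1024;  // assumed heuristic: ~1024 chunks/file
        return (int) Math.max(MIN_CHUNK, Math.min(MAX_CHUNK, target));
    }
}
```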

Step 3: Efficient File Streaming
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Stream File Data      β”‚   β”‚   Netty Chunked         β”‚   β”‚   Monitor Transfer      β”‚   β”‚   Complete Transfer     β”‚
β”‚     in Chunks           β”‚   β”‚    Write Handler        β”‚   β”‚      Progress           β”‚   β”‚                         β”‚
β”‚                         β”‚ β†’ β”‚                         β”‚ β†’ β”‚                         β”‚ β†’ β”‚                         β”‚
β”‚ ctx.writeAndFlush()     β”‚   β”‚ Memory-efficient        β”‚   β”‚ Log success/failure     β”‚   β”‚ Ready for next file     β”‚
β”‚ non-blocking write      β”‚   β”‚ file streaming          β”‚   β”‚ per file                β”‚   β”‚ or metadata             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Client-Side High-Performance Reception:

Step 1: File Setup
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Parse Headers         β”‚   β”‚   Create Temp Dir       β”‚   β”‚   Create Empty File     β”‚
β”‚                         β”‚   β”‚                         β”‚   β”‚                         β”‚
β”‚ β€’ Extract filename      β”‚ β†’ β”‚ β€’ Make /store/temp_p0/  β”‚ β†’ β”‚ β€’ Create empty file     β”‚
β”‚ β€’ Extract file size     β”‚   β”‚ β€’ Delete if exists      β”‚   β”‚ β€’ Open FileChannel      β”‚
β”‚ β€’ Extract MD5 hash      β”‚   β”‚                         β”‚   β”‚   in WRITE mode         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Step 2: Write Data Chunk (repeated for each arriving chunk)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Receive Network       β”‚   β”‚   Convert to Channel    β”‚   β”‚   Write to File         β”‚
β”‚      Data Chunk         β”‚   β”‚                         β”‚   β”‚                         β”‚
β”‚                         β”‚ β†’ β”‚ β€’ ByteBuf β†’ Stream      β”‚ β†’ β”‚ β€’ transferFrom() call   β”‚
β”‚ β€’ HttpContent arrives   β”‚   β”‚ β€’ Stream β†’ Channel      β”‚   β”‚                         β”‚
β”‚ β€’ Extract ByteBuf       β”‚   β”‚                         β”‚   β”‚ β€’ Update file position  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜


Step 3: Complete File
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Last Chunk Received   β”‚   β”‚   Flush and Close       β”‚   β”‚   Async Validation      β”‚
β”‚                         β”‚   β”‚                         β”‚   β”‚                         β”‚
β”‚ β€’ Detect end of file    β”‚ β†’ β”‚ β€’ Force data to disk    β”‚ β†’ β”‚ β€’ Submit MD5 check      β”‚
β”‚ β€’ Validate file size    β”‚   β”‚ β€’ Close FileChannel     β”‚   β”‚ β€’ Reset handler state   β”‚
β”‚                         β”‚   β”‚                         β”‚   β”‚ β€’ Ready for next file   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Flow Transformation in the Convert-to-Channel Step

Network β†’ ByteBuf β†’ ByteBufInputStream β†’ ReadableByteChannel β†’ FileChannel
   β”‚         β”‚              β”‚                    β”‚               β”‚
   β”‚         β”‚              β”‚                    β”‚               └─ Disk File
   β”‚         β”‚              β”‚                    └─ NIO Channel (efficient)
   β”‚         β”‚              └─ Java Stream (bridge)
   β”‚         └─ Netty Buffer
   └─ Raw network packets
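Netty's ByteBuf isn't needed to demonstrate the bridge: substituting a ByteArrayInputStream for ByteBufInputStream, the same Stream → Channel → FileChannel path can be sketched with JDK classes only (not Venice's actual handler code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

public class ChunkWriter {
    /**
     * Write one arriving chunk at the current position: wrap the chunk's
     * bytes as a stream, bridge the stream to an NIO channel, and let
     * FileChannel.transferFrom move the data. Returns the new file position.
     */
    public static long writeChunk(FileChannel out, long position, byte[] chunk)
            throws IOException {
        try (ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(chunk))) {
            long written = 0;
            while (written < chunk.length) {
                // transferFrom writes directly from channel to file position.
                written += out.transferFrom(in, position + written, chunk.length - written);
            }
            return position + written;
        }
    }
}
```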

πŸ“Š Step 4: Metadata Transfer & Completion Protocol

Critical Ordering for Data Consistency:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. Prepare  β”‚   β”‚ 2. Transfer β”‚   β”‚ 3. Send     β”‚   β”‚ 4. Send     β”‚
β”‚ RocksDB     β”‚ β†’ β”‚ All Files   β”‚ β†’ β”‚ Metadata    β”‚β†’  β”‚ COMPLETE    β”‚
β”‚ Snapshot    β”‚   β”‚ (with MD5)  β”‚   β”‚ (Offset +   β”‚   β”‚ Signal      β”‚
β”‚             β”‚   β”‚             β”‚   β”‚ SVS)        β”‚   β”‚             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why This Ordering is Critical:

❌ Wrong Order (Metadata β†’ Files β†’ Complete):

  • Client updates offset records immediately
  • If file transfer fails, offset state is corrupted
  • Need offset/SVS rollback mechanisms
  • Increased error handling complexity and risk

βœ… Correct Order (Files β†’ Metadata β†’ Complete):

  • Files transferred and validated first
  • Metadata only sent after successful file transfer
  • Atomic state update on COMPLETE signal
  • Lower-risk consistency guarantee
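The files → metadata → complete ordering can be enforced client-side with a tiny state machine. The phases and method names here are illustrative, not Venice's actual handler API:

```java
public class CompletionProtocol {
    /** Client-side phases in the required order: files → metadata → complete. */
    public enum Phase { RECEIVING_FILES, FILES_VALIDATED, METADATA_APPLIED, COMPLETE }

    private Phase phase = Phase.RECEIVING_FILES;

    /** All files are on disk and checksummed. */
    public void onAllFilesValidated() {
        require(Phase.RECEIVING_FILES);
        phase = Phase.FILES_VALIDATED;
    }

    /** Offset/SVS are only applied after every file has been validated. */
    public void onMetadata() {
        require(Phase.FILES_VALIDATED);
        phase = Phase.METADATA_APPLIED;
    }

    /** The COMPLETE signal commits the state; nothing is left to roll back. */
    public void onComplete() {
        require(Phase.METADATA_APPLIED);
        phase = Phase.COMPLETE;
    }

    public Phase phase() { return phase; }

    private void require(Phase expected) {
        if (phase != expected) {
            throw new IllegalStateException("Unexpected event in phase " + phase);
        }
    }
}
```

Because metadata arriving before file validation is rejected outright, the wrong-order failure mode above (corrupted offset state needing rollback) cannot occur.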

Metadata Consistency Protocol:

  • Metadata: OffsetRecord + StoreVersionState captured before snapshot creation time
  • JSON Serialization: Structured metadata transfer with size validation

🌐 Traffic Control

πŸš₯ Global Traffic Shaping

Shared Rate Limiting: A single GlobalChannelTrafficShapingHandler instance controls bandwidth across all blob transfer channels

Traffic Management Strategy

Global Control Architecture:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Single    β”‚   β”‚  Monitor    β”‚   β”‚  Enforce    β”‚
β”‚ Controller  β”‚ β†’ β”‚ All Channelsβ”‚ β†’ β”‚ Bandwidth   β”‚
β”‚  Instance   β”‚   β”‚ Globally    β”‚   β”‚  Limits     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • Write Limit: Global bytes/sec across all channels
  • Read Limit: Global bytes/sec across all channels
  • Check Interval: 1 second monitoring cycle

Adaptive Throttling System

Dynamic Rate Adjustment:

  • Increment/Decrement: 20%
  • Range: 20% - 200% of base rate
  • Separate read/write throttlers
  • Idle threshold handling
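The rate adjustment above can be sketched as a clamp around the base rate. Whether the 20% steps are multiplicative (as assumed here) or additive relative to the base rate is an assumption; the class and its methods are hypothetical:

```java
public class AdaptiveThrottler {
    private final long baseRate;   // configured bytes/sec
    private long currentRate;

    public AdaptiveThrottler(long baseRateBytesPerSec) {
        this.baseRate = baseRateBytesPerSec;
        this.currentRate = baseRateBytesPerSec;
    }

    /** Speed up by 20%, capped at 200% of the base rate. */
    public long increase() {
        currentRate = Math.min((long) (currentRate * 1.2), baseRate * 2);
        return currentRate;
    }

    /** Slow down by 20%, floored at 20% of the base rate. */
    public long decrease() {
        currentRate = Math.max((long) (currentRate * 0.8), baseRate / 5);
        return currentRate;
    }

    public long currentRate() { return currentRate; }
}
```

Separate instances would be kept for the read and write directions, matching the separate read/write throttlers noted above.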

πŸŽ›οΈ Configuration & Operations

Configurations

πŸ“₯ Receiver/Client Feature Enablement Control

Venice Server:

  • Store Level: blobTransferInServerEnabled
  • Application Level: blob.transfer.receiver.server.policy

DaVinci Application:

  • Store Level: blobTransferEnabled

πŸ“€ Sender/Server Feature Enablement Control

All Applications:

  • Application Level: blob.transfer.manager.enabled

Performance Tuning Parameters

πŸͺœ Thresholds

  • Offset Lag: Skip blob transfer if the replica is not lagging far enough behind
  • Snapshot TTL: Maintain snapshot freshness
  • Snapshot Cleanup Interval: Maintain disk storage

🏎️ Bandwidth Limits

  • Client Read: Client-side read bytes per sec
  • Server Write: Server-side write bytes per sec
  • Adaptive Range: 20% - 200% of base rate
  • Max Concurrent Snapshot Users: Limits how many concurrent transfer requests a server will serve

⏰ Timeouts

  • Transfer Max: Server-side max timeout for transferring files
  • Receive Max: Client-side max timeout for receiving files

πŸ“‹ Summary

Venice Blob Transfer: Key Features & Benefits

πŸš€ Performance Excellence

  • Intelligent peer discovery and selection
  • Fast failover with comprehensive error handling
  • High-performance Netty streaming architecture
  • Efficient file operations and adaptive chunking

πŸ”’ Rock-Solid Reliability

  • Consistent RocksDB snapshots with point-in-time guarantees
  • Comprehensive error handling and automatic cleanup
  • Data integrity validation with MD5 and size checks
  • Automatic Kafka fallback for 100% data coverage

πŸ›‘οΈ Security & Control

  • End-to-end SSL/TLS encryption
  • Certificate-based ACL validation
  • Global traffic rate limiting and adaptive throttling
  • Comprehensive timeout and connection management

πŸ”§ Operational Excellence

  • Flexible multi-level configuration
  • Automated snapshot lifecycle management
  • Graceful degradation and low risk deployment

Result: Faster node recovery, reduced infrastructure load, improved cluster scalability with enterprise-grade reliability