Introduction

As a high-performance derived data storage platform, Venice is designed to ingest a large volume of writes from a variety of sources while providing low-latency read access to the data. These characteristics are achieved through a combination of design choices and optimizations in the write path. This document describes some internal details of the Venice write path.

Note that this page is still under construction and only provides a glimpse of the Venice write path. More details will be added in the future.

Terminology

Store: A store is a table or dataset in Venice. Each store has multiple partitions.
VT: Version Topic. The version topic is a Kafka topic that holds all the data for a given store version. There is one VT per version per store.
RT: Real-time Topic. A real-time topic is also a Kafka topic, but it only contains data from nearline producers. There is one RT per store.
ZK: Apache ZooKeeper. It is used by multiple Venice components to persist Venice metadata.
Helix: Apache Helix. It is used in Venice to manage the cluster and assign partitions to different servers.
EV: Helix External View. Venice uses it to represent the leader/follower state of a server.
CV: Helix Customized View. Venice uses it to represent the ingestion state of a server for a given store.

Venice Store Ingestion on Server Side

Helix assigns leader and follower status for each partition to different servers. Once the assignments are done, each server creates multiple StoreIngestionTasks (SITs) to handle ingestion for the partitions it is assigned. Note that there is one SIT per store per version; multiple partitions of the same store version are handled by the same SIT instance. The diagram below describes the data flow of how a store ingestion task is created and ingests data on the server side.

Server Ingestion Diagram
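To make the one-task-per-store-version model above more concrete, here is a minimal Java sketch. The class and method names (IngestionService, subscribePartition, and the simplified StoreIngestionTask) are illustrative assumptions rather than the actual Venice classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch: one ingestion task per store version, shared by all
// partitions of that version assigned to this server. Names are illustrative
// and do not match the real Venice classes exactly.
public class IngestionService {
  // Key: version topic name (one VT per store version); value: the single SIT for it.
  private final Map<String, StoreIngestionTask> tasksByVersionTopic = new ConcurrentHashMap<>();

  /** Called when Helix assigns a partition (as leader or follower) to this server. */
  public void onPartitionAssigned(String versionTopic, int partitionId) {
    StoreIngestionTask task =
        tasksByVersionTopic.computeIfAbsent(versionTopic, StoreIngestionTask::new);
    task.subscribePartition(partitionId);
  }

  static class StoreIngestionTask {
    private final String versionTopic;

    StoreIngestionTask(String versionTopic) {
      this.versionTopic = versionTopic;
    }

    void subscribePartition(int partitionId) {
      // In the real system this would start consuming the partition of the
      // version topic (and, for leaders, the real-time topic) into RocksDB.
      System.out.println("Subscribing " + versionTopic + " partition " + partitionId);
    }
  }
}
```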

Venice Store Ingestion Report

As ingestion moves through its different stages, the server reports the ingestion status. The signal is propagated to various downstream components for different business logic. The diagram below describes the signal flow of the ingestion report.

Server Ingestion Report Diagram
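As a rough illustration of the report flow, the sketch below shows an ingestion task pushing status changes to registered listeners. The status values and the listener interface are assumptions for illustration, not the exact Venice types.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch of ingestion status reporting. The status values and
// listener interface are simplified assumptions, not the exact Venice API.
public class IngestionReporter {
  // Example stages an ingestion could report as it progresses.
  enum IngestionStatus { STARTED, RESTARTED, PROGRESS, COMPLETED, ERROR }

  interface IngestionListener {
    void onStatusChange(String versionTopic, int partitionId, IngestionStatus status);
  }

  private final List<IngestionListener> listeners = new CopyOnWriteArrayList<>();

  public void register(IngestionListener listener) {
    listeners.add(listener);
  }

  /** The ingestion task calls this as a partition moves through the stages above. */
  public void report(String versionTopic, int partitionId, IngestionStatus status) {
    for (IngestionListener listener : listeners) {
      // Downstream components (e.g. whatever updates the Helix Customized View)
      // react to the signal here.
      listener.onStatusChange(versionTopic, partitionId, status);
    }
  }
}
```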

Venice Large Record Value Chunking

Due to Kafka's ~1MB message size limit, records larger than ~1MB need to be chunked into multiple Kafka messages by VeniceWriter#putLargeValue().
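The splitting itself is simple arithmetic. The hedged sketch below shows how a value exceeding the message limit could be cut into fixed-size chunks; the real VeniceWriter#putLargeValue() also accounts for key and envelope overhead and emits a ChunkedValueManifest, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of value chunking. Only the basic splitting arithmetic is shown.
public class ChunkingSketch {
  // Assumed max payload per Kafka message for this illustration (~1MB).
  static final int MAX_CHUNK_SIZE = 1024 * 1024;

  static List<byte[]> chunk(byte[] largeValue) {
    List<byte[]> chunks = new ArrayList<>();
    for (int offset = 0; offset < largeValue.length; offset += MAX_CHUNK_SIZE) {
      int end = Math.min(offset + MAX_CHUNK_SIZE, largeValue.length);
      chunks.add(Arrays.copyOfRange(largeValue, offset, end));
    }
    return chunks;
  }

  public static void main(String[] args) {
    byte[] largeValue = new byte[3 * 1024 * 1024 + 123]; // ~3MB record
    // 4 chunks: three full 1MB chunks plus one small trailing chunk.
    System.out.println("Number of chunks: " + chunk(largeValue).size());
  }
}
```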

In the batch push job path, the chunking in VeniceWriter (a class for writing data into Kafka) starts within VenicePushJob (more specifically, in the PartitionWriter stage of the DataWriterJob). The chunks and the manifest move through Kafka and the Venice Server until they are stored unchanged in RocksDB. The original large record must be reconstructed at read time from the chunks and the manifest in order to be accessed.
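The read-time reassembly can be pictured with the following hedged sketch, where the manifest shape, key representation, and storage interface are simplified stand-ins for the real Venice classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.Map;

// Hedged sketch of read-time reassembly of a chunked value. The manifest and
// storage types are simplified stand-ins for the real Venice classes
// (e.g. ChunkedValueManifest).
public class ChunkReadSketch {
  /** Simplified manifest: the original key maps to this, which lists the chunk keys. */
  record ValueManifest(List<String> chunkKeys, int totalSize) {}

  /** Reassemble the original large record by fetching each chunk and concatenating. */
  static byte[] assemble(ValueManifest manifest, Map<String, byte[]> storage) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(manifest.totalSize());
    for (String chunkKey : manifest.chunkKeys()) {
      byte[] chunk = storage.get(chunkKey); // each chunk is stored as its own row
      out.write(chunk, 0, chunk.length);
    }
    return out.toByteArray(); // the reconstructed large record
  }
}
```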

In the nearline job / partial update path, the Consumer must first reconstruct the large record. In RocksDB, the original key of the record points to the ChunkedValueManifest, which contains a list of keys that point to the chunks. Using these, the Consumer reconstructs the large record, applies the partial update from the RT, and then splits the result into a new set of chunks and a new manifest. The new chunks are written to RocksDB, the stored manifest is updated with the new keys, and the old chunks are deleted.
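This read-modify-write sequence can be summarized with the hedged outline below. All helper names (readManifest, applyPartialUpdate, writeChunks, and so on) are hypothetical and only mirror the steps described in this section.

```java
import java.util.List;
import java.util.Map;

// Hedged outline of the partial-update flow for a chunked record. All helper
// names and types here are illustrative, not the real Venice consumer code.
public class PartialUpdateChunkingSketch {
  record ValueManifest(List<String> chunkKeys, int totalSize) {}

  static void applyUpdateToChunkedValue(
      String originalKey,
      byte[] partialUpdateFromRT,
      Map<String, byte[]> storage,
      Chunker chunker) {
    // 1. The original key points to the manifest, which lists the chunk keys.
    ValueManifest oldManifest = chunker.readManifest(storage, originalKey);
    // 2. Reassemble the full large record from its chunks.
    byte[] fullValue = chunker.assemble(oldManifest, storage);
    // 3. Apply the partial update coming from the real-time topic.
    byte[] updatedValue = chunker.applyPartialUpdate(fullValue, partialUpdateFromRT);
    // 4. Re-chunk the updated record and write the new chunks.
    ValueManifest newManifest = chunker.writeChunks(storage, originalKey, updatedValue);
    // 5. Point the original key at the new manifest, then delete the old chunks.
    chunker.writeManifest(storage, originalKey, newManifest);
    chunker.deleteChunks(storage, oldManifest);
  }

  /** Illustrative interface holding the hypothetical helpers used above. */
  interface Chunker {
    ValueManifest readManifest(Map<String, byte[]> storage, String key);
    byte[] assemble(ValueManifest manifest, Map<String, byte[]> storage);
    byte[] applyPartialUpdate(byte[] fullValue, byte[] partialUpdate);
    ValueManifest writeChunks(Map<String, byte[]> storage, String key, byte[] value);
    void writeManifest(Map<String, byte[]> storage, String key, ValueManifest manifest);
    void deleteChunks(Map<String, byte[]> storage, ValueManifest manifest);
  }
}
```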

The diagram below illustrates the aforementioned chunking system. (Link to diagram source)

Large Record Value Chunking Diagram