As a high performance derived data storage platform, Venice is designed to ingest large amount of writes from a variety of sources while providing low latency read access to the data. These characteristics are achieved by a combination of design choices and optimizations in the write path. This document describes some internal details of the Venice write path.
Note that this page is still under construction and only provides glimpses of the Venice writing path. More details will be added in the future.
|A store is a table or the dataset in Venice. Each store has multiple partitions.
|Version Topic. The version topic is a kafka topic where holds all the data for a given store. One VT per version per Store.
|Real time Topic. A real time topic is also a kafka topic but it only contains data from nearline producers. One RT per Store.
|Apache ZooKeeper. It’s used in multiple Venice components to persist Venice metadata.
|Apache Helix. It’s used in Venice to manage the cluster and assign partitions to different servers.
|Helix External View. Venice uses it to represent the state of a server in a leader/follower mode.
|Helix Customized View. Venice uses it to represent the ingestion state of a server for a given store.
Helix assign leaders and followers status on each partition to different servers. When assignments are done, each server
will create multiple
StoreIngestionTasks to handle ingestion for the partitions it’s assigned to. Note that there’s one SIT per store per
version. Multiple partitions of the same store and version are handled by the same SIT instance. The diagram belows
describes the data flow of how a store ingestion task’s created and ingest data on the server side.
When ingestion is in different stages, the server reports the ingestion status. The ingestion signal is propagated to various downstream for different business logics. The diagram below describes the signal flow of the ingestion report.