Venice Single-Datacenter Docker Quickstart
Follow this guide to set up a simple Venice cluster using the Docker images provided by the Venice team.
Step 1: Install and set up Docker Engine and docker-compose
Follow https://docs.docker.com/engine/install/ to install Docker Engine, then start it. You also need Docker Compose; see https://docs.docker.com/compose/install/.
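Before continuing, you can confirm that the tooling is in place. The function below is a hypothetical helper, not part of Venice: it checks for the docker CLI, for a Compose implementation (the v2 docker compose plugin or the standalone docker-compose binary), and for a reachable Docker daemon.

```bash
# Hypothetical helper (not shipped with Venice): verify Docker prerequisites.
check_docker_prereqs() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH; install Docker Engine first" >&2
    return 1
  fi
  # Compose v2 is a docker CLI plugin ("docker compose"); v1 is a standalone "docker-compose" binary
  if docker compose version >/dev/null 2>&1; then
    echo "found Docker Compose v2 plugin: use 'docker compose'"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "found standalone docker-compose: use 'docker-compose'"
  else
    echo "Docker Compose not found; see https://docs.docker.com/compose/install/" >&2
    return 1
  fi
  # "docker info" queries the daemon, so it fails if the engine is not running
  # (or is not reachable by the current user)
  if ! docker info >/dev/null 2>&1; then
    echo "Docker Engine does not appear to be running" >&2
    return 1
  fi
}
```

Run check_docker_prereqs on the host; a non-zero exit status means something above still needs attention.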
Step 2: Download the Venice quickstart Docker Compose file
wget https://raw.githubusercontent.com/linkedin/venice/main/docker/docker-compose-single-dc-setup.yaml
Step 3: Run Docker Compose
Running Docker Compose with this file downloads and starts containers for Kafka, ZooKeeper, venice-controller, venice-router, venice-server, and venice-client. Once the containers are up and running, the setup creates a test cluster named venice-cluster0.
Note: Make sure the docker-compose-single-dc-setup.yaml file downloaded in Step 2 is in the directory from which you run Docker Compose.
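As a sketch, one way to start the stack is docker compose -f docker-compose-single-dc-setup.yaml up -d, run from the directory that holds the file. The wrapper below is hypothetical (not a Venice script); it assumes Docker Compose v2 and refuses to run if the compose file is missing from the current directory.

```bash
# Hypothetical wrapper. Assumes Docker Compose v2; with the standalone v1 binary,
# replace "docker compose" with "docker-compose".
start_venice_quickstart() {
  local file=docker-compose-single-dc-setup.yaml
  # The compose file from Step 2 must be in the current directory
  if [ ! -f "$file" ]; then
    echo "$file not found in $(pwd); cd to the directory used in Step 2" >&2
    return 1
  fi
  # -d runs the containers in the background (detached mode)
  docker compose -f "$file" up -d || return 1
  # Show each service's state so you can confirm the containers are up
  docker compose -f "$file" ps
}
```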
Step 4: Access the venice-client container's bash shell
From this container, we will create a store in venice-cluster0 (created in Step 3), push data to it, and run queries against it.
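A typical way to open a shell in a running container is docker exec with -it. The wrapper below is a hypothetical sketch; it assumes the container is named venice-client, matching the container list in Step 3, and checks that the container is running before attaching.

```bash
# Hypothetical wrapper; assumes the container is named venice-client.
# List the actual container names with: docker ps --format '{{.Names}}'
VENICE_CLIENT_CONTAINER=${VENICE_CLIENT_CONTAINER:-venice-client}

enter_venice_client() {
  # "docker container inspect" prints "true" only for a running container
  if [ "$(docker container inspect -f '{{.State.Running}}' "$VENICE_CLIENT_CONTAINER" 2>/dev/null)" != "true" ]; then
    echo "container $VENICE_CLIENT_CONTAINER is not running; start the stack first (Step 3)" >&2
    return 1
  fi
  # -i keeps STDIN open and -t allocates a pseudo-TTY, giving an interactive bash session
  docker exec -it "$VENICE_CLIENT_CONTAINER" bash
}
```

If your compose setup names the container differently, set VENICE_CLIENT_CONTAINER before defining the function.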
Step 5: Create a Venice store
The script below uses venice-admin-tool to create a new store, test-store, with the following key and value schemas:
Key schema: sample-data/schema/keySchema.avsc
Value schema: sample-data/schema/valueSchema.avsc
Let's create the store:
./create-store.sh http://venice-controller:5555 venice-cluster0 test-store sample-data/schema/keySchema.avsc sample-data/schema/valueSchema.avsc
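To see what the two schema files contain, you can print them from inside venice-client. This is a hypothetical helper, assuming the sample-data/schema paths passed to create-store.sh above; pass a different directory as the first argument if yours differ.

```bash
# Hypothetical helper: print the Avro key and value schemas used by create-store.sh.
show_store_schemas() {
  local dir=${1:-sample-data/schema}
  local f
  for f in "$dir/keySchema.avsc" "$dir/valueSchema.avsc"; do
    if [ ! -r "$f" ]; then
      echo "cannot read $f" >&2
      return 1
    fi
    echo "== $f"
    cat "$f"
    # Ensure a line break even if the file lacks a trailing newline
    echo
  done
}
```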
Step 6: Push data to the store
Venice supports multiple ways to write data to a store; for details, refer to the Write APIs documentation. In this example, we use batch push mode to push 100 records.
Print dataset
Run a push job
Let's push the data:
Step 7: Read data from the store
Venice provides two client types for reading data. Both are included in the venice-client container image.
Option A: Thin client (routes through the router)
The thin client sends all read requests through the Venice Router.
For example:
$ ./fetch.sh http://venice-router:7777 test-store 1
key=1
value=val1
$ ./fetch.sh http://venice-router:7777 test-store 100
key=100
value=val100
# Reading a key that does not exist returns `null`
$ ./fetch.sh http://venice-router:7777 test-store 101
key=101
value=null
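To confirm that the batch push in Step 6 wrote all 100 records, we can loop over the keys with fetch.sh. The helper below is a hypothetical sketch, not one of the bundled scripts; it assumes the pushed keys are 1-100 (as the examples above suggest), that it runs inside venice-client next to fetch.sh, and that fetch.sh prints a value=... line as shown. Set FETCH_CMD to use a different fetch script.

```bash
# Hypothetical check: count how many keys in 1..N return a non-null value via the thin client.
count_present_keys() {
  local n=${1:-100} fetch=${FETCH_CMD:-./fetch.sh} k v present=0
  for ((k = 1; k <= n; k++)); do
    # Keep only the text after "value=" from the fetch.sh output
    v=$("$fetch" http://venice-router:7777 test-store "$k" | sed -n 's/^value=//p')
    if [ -n "$v" ] && [ "$v" != "null" ]; then
      present=$((present + 1))
    fi
  done
  echo "$present"
}
```

Running count_present_keys 100 should print 100 once every record from the batch push is readable.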
Option B: Fast client (D2 service discovery, reads directly from servers)
The fast client uses D2 service discovery for initial cluster discovery via the router, then fetches metadata and routes data reads directly to the storage servers, bypassing the router. This requires enabling storage node read quota on the store:
java -jar /opt/venice/bin/venice-admin-tool-all.jar --update-store \
--url http://venice-controller:5555 --cluster venice-cluster0 \
--store test-store --storage-node-read-quota-enabled true
Then query the store using the fast client. For example:
$ ./fast-client-fetch.sh test-store 1
key=1
value=val1
$ ./fast-client-fetch.sh test-store 100
key=100
value=val100
# Non-existing key returns null
$ ./fast-client-fetch.sh test-store 101
key=101
value=null
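Because both clients read the same store, they should agree on every key. The hypothetical helper below fetches each given key through both scripts and flags any difference; it assumes it runs inside venice-client after the read-quota update above, and that both scripts print value=... lines as shown. THIN_FETCH and FAST_FETCH are illustrative overrides, not Venice settings.

```bash
# Hypothetical helper: compare thin-client (router) and fast-client (direct) reads.
compare_clients() {
  local thin_cmd=${THIN_FETCH:-./fetch.sh} fast_cmd=${FAST_FETCH:-./fast-client-fetch.sh}
  local k thin fast mismatches=0
  for k in "$@"; do
    # Keep only the value line from each script's output
    thin=$("$thin_cmd" http://venice-router:7777 test-store "$k" | sed -n 's/^value=//p')
    fast=$("$fast_cmd" test-store "$k" | sed -n 's/^value=//p')
    if [ "$thin" = "$fast" ]; then
      echo "key=$k match: $thin"
    else
      echo "key=$k MISMATCH: thin=$thin fast=$fast"
      mismatches=$((mismatches + 1))
    fi
  done
  # Non-zero exit status if any key differed
  [ "$mismatches" -eq 0 ]
}
```

For example, compare_clients 1 100 101 should report a match for all three keys, with key 101 matching as null.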
Step 8: Update existing records and add new ones using Incremental Push
Venice supports incremental push, which lets us update the values of existing rows or add new rows to an existing store. In this example, we will:
- update the values of keys 91-100; for example, the new value of key 100 will be val100_v1
- add new rows for keys 101-110
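Together, these two changes amount to 20 records. As a sketch, assuming the value pattern from the examples in this guide (key 100 becomes val100_v1 and new key 101 gets val101), the incremental dataset should correspond to the key/value pairs printed by this hypothetical helper; the real records ship with the sample data inside venice-client.

```bash
# Hypothetical helper: list the key/value pairs the incremental push is expected to write.
# Values for keys other than 100 and 101 are inferred from the documented pattern.
expected_incremental_records() {
  local k
  for ((k = 91; k <= 100; k++)); do echo "$k val${k}_v1"; done   # updated rows
  for ((k = 101; k <= 110; k++)); do echo "$k val${k}"; done     # new rows
}
```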
Print records to be updated and added to the existing dataset in the store
Run incremental push job
Step 9: Read data from the store after Incremental Push
Incremental Push updated the values of keys 91-100 and added new rows 101-110. Let's read the data once again.
# Value of key 1 remains unchanged
$ ./fetch.sh http://venice-router:7777 test-store 1
key=1
value=val1
# Value of key 100 changed from val100 to val100_v1
$ ./fetch.sh http://venice-router:7777 test-store 100
key=100
value=val100_v1
# Incremental Push added value for previously non-existing key 101
$ ./fetch.sh http://venice-router:7777 test-store 101
key=101
value=val101
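To check all 110 keys rather than a few samples, a hypothetical script can compare each fetched value with what Steps 6 and 8 should have produced. It assumes keys 1-90 still hold valN, keys 91-100 hold valN_v1, and keys 101-110 hold valN; only keys 1, 100, and 101 are confirmed by the output above, so the other values are inferred from that pattern. It runs inside venice-client next to fetch.sh (override with FETCH_CMD).

```bash
# Hypothetical post-push check; value patterns are inferred from the documented examples.
expected_after_incremental() {
  local k=$1
  if [ "$k" -ge 91 ] && [ "$k" -le 100 ]; then
    echo "val${k}_v1"   # updated by the incremental push
  elif [ "$k" -ge 1 ] && [ "$k" -le 110 ]; then
    echo "val${k}"      # unchanged batch rows (1-90) or new rows (101-110)
  else
    echo "null"         # never written
  fi
}

verify_incremental_push() {
  local fetch=${FETCH_CMD:-./fetch.sh} k got want failures=0
  for ((k = 1; k <= 110; k++)); do
    got=$("$fetch" http://venice-router:7777 test-store "$k" | sed -n 's/^value=//p')
    want=$(expected_after_incremental "$k")
    if [ "$got" != "$want" ]; then
      echo "key=$k: expected $want, got ${got:-<no output>}"
      failures=$((failures + 1))
    fi
  done
  echo "$failures mismatches"
  [ "$failures" -eq 0 ]
}
```

verify_incremental_push prints one line per mismatched key followed by a summary such as 0 mismatches, and exits non-zero if any key differs.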
Step 10: Exit venice-client
Type exit to leave the venice-client container's bash shell and return to the host.
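Step 11: Stop Docker
Tear down the Venice cluster by stopping and removing the containers started in Step 3. Run this on the host, from the directory containing docker-compose-single-dc-setup.yaml.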
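As a sketch, the counterpart of starting the stack is docker compose -f docker-compose-single-dc-setup.yaml down. The wrapper below is hypothetical and assumes Docker Compose v2; like the start step, it only runs when the compose file is in the current directory.

```bash
# Hypothetical wrapper; run on the host (not inside venice-client). Assumes Docker Compose v2.
stop_venice_quickstart() {
  local file=docker-compose-single-dc-setup.yaml
  if [ ! -f "$file" ]; then
    echo "$file not found in $(pwd); cd to the directory used in Step 3" >&2
    return 1
  fi
  # "down" stops and removes the containers and networks that "up" created;
  # named volumes are kept unless you also pass -v
  docker compose -f "$file" down
}
```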
Next steps
Venice is a feature-rich derived data store. It offers features such as write compute, read compute, streaming ingestion, multi-datacenter active-active replication, and deterministic conflict resolution. To learn more about these features, refer to the User Guide or reach out to the Venice team.