Class RocksDBStoragePartition
- java.lang.Object
-
- com.linkedin.davinci.store.AbstractStoragePartition
-
- com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
-
- Direct Known Subclasses:
ReplicationMetadataRocksDBStoragePartition
@NotThreadSafe public class RocksDBStoragePartition extends AbstractStoragePartition
InRocksDBStoragePartition
, it assumes the update(insert/delete) will happen sequentially. If the batch push is bytewise-sorted by key, this class is leveragingSstFileWriter
to generate the SST file directly and ingest all the generated SST files into the RocksDB database at the end of the push. If the ingestion is unsorted, this class is using the regular RocksDB interface to support update operations.
-
-
Field Summary
Fields Modifier and Type Field Description protected boolean
blobTransferEnabled
protected java.util.List<org.rocksdb.ColumnFamilyDescriptor>
columnFamilyDescriptors
protected java.util.List<org.rocksdb.ColumnFamilyHandle>
columnFamilyHandleList
Column Family is the concept in RocksDB to create isolation between different value for the same key.protected boolean
deferredWrite
Whether the input is sorted or not.protected int
partitionId
protected static org.rocksdb.ReadOptions
READ_OPTIONS_DEFAULT
protected java.util.concurrent.locks.ReentrantReadWriteLock
readCloseRWLock
Since all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior.protected boolean
readOnly
Whether the database is read only or not.protected boolean
readWriteLeaderForDefaultCF
protected boolean
readWriteLeaderForRMDCF
protected java.lang.String
replicaId
protected org.rocksdb.RocksDB
rocksDB
protected java.lang.String
storeName
protected java.lang.String
storeNameAndVersion
protected boolean
writeOnly
protected org.rocksdb.WriteOptions
writeOptions
Here RocksDB disables WAL, but relies on the 'flush', which will be invoked throughsync()
to avoid data loss during recovery.
-
Constructor Summary
Constructors Modifier Constructor Description RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, VeniceStoreVersionConfig storeConfig)
protected
RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, java.util.List<byte[]> columnFamilyNameList, VeniceStoreVersionConfig storeConfig)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
beginBatchWrite(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo, java.util.Optional<java.util.function.Supplier<byte[]>> expectedChecksumSupplier)
boolean
checkDatabaseIntegrity(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo)
checks whether the current state of the database is valid during the start of ingestion.void
close()
Close the specific partitionvoid
createSnapshot()
Creates a snapshot of the current state of the storage if the blob transfer feature is enabled via the store configurationvoid
delete(byte[] key)
Delete a key from the partition databasevoid
deleteFilesInDirectory(java.lang.String fullPath)
void
drop()
Drop when it is not required anymore.void
endBatchWrite()
byte[]
get(byte[] key)
Get a value from the partition databasejava.nio.ByteBuffer
get(byte[] key, java.nio.ByteBuffer valueToBePopulated)
byte[]
get(java.nio.ByteBuffer keyBuffer)
<K,V>
Vget(K key)
Get a Value from the partition databasejava.util.Map<org.rocksdb.MemoryUsageType,java.lang.Long>
getApproximateMemoryUsageByType(java.util.Set<org.rocksdb.Cache> caches)
protected java.lang.Boolean
getBlobTransferEnabled()
void
getByKeyPrefix(byte[] keyPrefix, BytesStreamingCallback callback)
Populate provided callback with key-value pairs from the partition database where the keys have provided prefix.protected java.util.List<org.rocksdb.ColumnFamilyHandle>
getColumnFamilyHandleList()
protected org.rocksdb.EnvOptions
getEnvOptions()
java.lang.String
getFullPathForTempSSTFileDir()
AbstractStorageIterator
getIterator()
protected org.rocksdb.Options
getOptions()
long
getPartitionSizeInBytes()
Get the partition database size in byteslong
getRmdByteUsage()
RocksDBSstFileWriter
getRocksDBSstFileWriter()
long
getRocksDBStatValue(java.lang.String statName)
protected org.rocksdb.Options
getStoreOptions(StoragePartitionConfig storagePartitionConfig, boolean isRMD)
protected void
makeSureRocksDBIsStillOpen()
java.util.List<byte[]>
multiGet(java.util.List<byte[]> keys)
java.util.List<java.nio.ByteBuffer>
multiGet(java.util.List<java.nio.ByteBuffer> keys, java.util.List<java.nio.ByteBuffer> values)
void
put(byte[] key, byte[] value)
Puts a value into the partition databasevoid
put(byte[] key, java.nio.ByteBuffer valueBuffer)
<K,V>
voidput(K key, V value)
void
reopen()
Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.java.util.Map<java.lang.String,java.lang.String>
sync()
Sync current database.boolean
validateBatchIngestion()
boolean
verifyConfig(StoragePartitionConfig partitionConfig)
-
Methods inherited from class com.linkedin.davinci.store.AbstractStoragePartition
deleteWithReplicationMetadata, getPartitionId, getReplicationMetadata, putReplicationMetadata, putWithReplicationMetadata, putWithReplicationMetadata
-
-
-
-
Field Detail
-
READ_OPTIONS_DEFAULT
protected static final org.rocksdb.ReadOptions READ_OPTIONS_DEFAULT
-
writeOptions
protected final org.rocksdb.WriteOptions writeOptions
Here RocksDB disables WAL, but relies on the 'flush', which will be invoked throughsync()
to avoid data loss during recovery.
-
replicaId
protected final java.lang.String replicaId
-
storeName
protected final java.lang.String storeName
-
storeNameAndVersion
protected final java.lang.String storeNameAndVersion
-
blobTransferEnabled
protected final boolean blobTransferEnabled
-
partitionId
protected final int partitionId
-
readCloseRWLock
protected final java.util.concurrent.locks.ReentrantReadWriteLock readCloseRWLock
Since all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior. The followingreadCloseRWLock
is only used to guardget(byte[])
since we don't want to synchronize get requests.
-
rocksDB
protected org.rocksdb.RocksDB rocksDB
-
deferredWrite
protected final boolean deferredWrite
Whether the input is sorted or not.
deferredWrite = sortedInput => ingested via batch push which is sorted in VPJ, can useRocksDBSstFileWriter
to ingest the input data to RocksDB
!deferredWrite = !sortedInput => can not use RocksDBSstFileWriter for ingestion
-
readOnly
protected final boolean readOnly
Whether the database is read only or not.
-
writeOnly
protected final boolean writeOnly
-
readWriteLeaderForDefaultCF
protected final boolean readWriteLeaderForDefaultCF
-
readWriteLeaderForRMDCF
protected final boolean readWriteLeaderForRMDCF
-
columnFamilyHandleList
protected final java.util.List<org.rocksdb.ColumnFamilyHandle> columnFamilyHandleList
Column Family is the concept in RocksDB to create isolation between different value for the same key. All KVs are stored in `DEFAULT` column family, if no column family is specified. If we stores replication metadata in the RocksDB, we stored it in a separated column family. We will insert all the column family descriptors into columnFamilyDescriptors and pass it to RocksDB when opening the store, and it will fill the columnFamilyHandles with handles which will be used when we want to put/get/delete from different RocksDB column families.
-
columnFamilyDescriptors
protected final java.util.List<org.rocksdb.ColumnFamilyDescriptor> columnFamilyDescriptors
-
-
Constructor Detail
-
RocksDBStoragePartition
protected RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, java.util.List<byte[]> columnFamilyNameList, VeniceStoreVersionConfig storeConfig)
-
RocksDBStoragePartition
public RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, java.lang.String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, VeniceStoreVersionConfig storeConfig)
-
-
Method Detail
-
makeSureRocksDBIsStillOpen
protected void makeSureRocksDBIsStillOpen()
-
getEnvOptions
protected org.rocksdb.EnvOptions getEnvOptions()
-
getBlobTransferEnabled
protected java.lang.Boolean getBlobTransferEnabled()
-
getStoreOptions
protected org.rocksdb.Options getStoreOptions(StoragePartitionConfig storagePartitionConfig, boolean isRMD)
-
getColumnFamilyHandleList
protected java.util.List<org.rocksdb.ColumnFamilyHandle> getColumnFamilyHandleList()
-
getRmdByteUsage
public long getRmdByteUsage()
- Overrides:
getRmdByteUsage
in classAbstractStoragePartition
-
checkDatabaseIntegrity
public boolean checkDatabaseIntegrity(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo)
Description copied from class:AbstractStoragePartition
checks whether the current state of the database is valid during the start of ingestion.- Overrides:
checkDatabaseIntegrity
in classAbstractStoragePartition
-
beginBatchWrite
public void beginBatchWrite(java.util.Map<java.lang.String,java.lang.String> checkpointedInfo, java.util.Optional<java.util.function.Supplier<byte[]>> expectedChecksumSupplier)
- Overrides:
beginBatchWrite
in classAbstractStoragePartition
-
endBatchWrite
public void endBatchWrite()
- Overrides:
endBatchWrite
in classAbstractStoragePartition
-
createSnapshot
public void createSnapshot()
Description copied from class:AbstractStoragePartition
Creates a snapshot of the current state of the storage if the blob transfer feature is enabled via the store configuration- Specified by:
createSnapshot
in classAbstractStoragePartition
-
put
public void put(byte[] key, byte[] value)
Description copied from class:AbstractStoragePartition
Puts a value into the partition database- Specified by:
put
in classAbstractStoragePartition
-
put
public void put(byte[] key, java.nio.ByteBuffer valueBuffer)
- Specified by:
put
in classAbstractStoragePartition
-
put
public <K,V> void put(K key, V value)
- Specified by:
put
in classAbstractStoragePartition
-
get
public byte[] get(byte[] key)
Description copied from class:AbstractStoragePartition
Get a value from the partition database- Specified by:
get
in classAbstractStoragePartition
- Parameters:
key
- key to be retrieved- Returns:
- null if the key does not exist, byte[] value if it exists.
-
get
public java.nio.ByteBuffer get(byte[] key, java.nio.ByteBuffer valueToBePopulated)
- Overrides:
get
in classAbstractStoragePartition
-
get
public <K,V> V get(K key)
Description copied from class:AbstractStoragePartition
Get a Value from the partition database- Specified by:
get
in classAbstractStoragePartition
- Type Parameters:
K
- the type for KeyV
- the type for the return value- Parameters:
key
- key to be retrieved- Returns:
- null if the key does not exist, V value if it exists
-
get
public byte[] get(java.nio.ByteBuffer keyBuffer)
- Specified by:
get
in classAbstractStoragePartition
-
multiGet
public java.util.List<byte[]> multiGet(java.util.List<byte[]> keys)
-
multiGet
public java.util.List<java.nio.ByteBuffer> multiGet(java.util.List<java.nio.ByteBuffer> keys, java.util.List<java.nio.ByteBuffer> values)
-
getByKeyPrefix
public void getByKeyPrefix(byte[] keyPrefix, BytesStreamingCallback callback)
Description copied from class:AbstractStoragePartition
Populate provided callback with key-value pairs from the partition database where the keys have provided prefix. If prefix is null, callback will be populated will all key-value pairs from the partition database.- Specified by:
getByKeyPrefix
in classAbstractStoragePartition
-
validateBatchIngestion
public boolean validateBatchIngestion()
- Overrides:
validateBatchIngestion
in classAbstractStoragePartition
-
delete
public void delete(byte[] key)
Description copied from class:AbstractStoragePartition
Delete a key from the partition database- Specified by:
delete
in classAbstractStoragePartition
-
sync
public java.util.Map<java.lang.String,java.lang.String> sync()
Description copied from class:AbstractStoragePartition
Sync current database.- Specified by:
sync
in classAbstractStoragePartition
- Returns:
- Database related info, which is required to be checkpointed.
-
deleteFilesInDirectory
public void deleteFilesInDirectory(java.lang.String fullPath)
-
drop
public void drop()
Description copied from class:AbstractStoragePartition
Drop when it is not required anymore.- Specified by:
drop
in classAbstractStoragePartition
-
close
public void close()
Description copied from class:AbstractStoragePartition
Close the specific partition- Specified by:
close
in classAbstractStoragePartition
-
reopen
public void reopen()
Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.- Overrides:
reopen
in classAbstractStoragePartition
-
getRocksDBStatValue
public long getRocksDBStatValue(java.lang.String statName)
-
getApproximateMemoryUsageByType
public java.util.Map<org.rocksdb.MemoryUsageType,java.lang.Long> getApproximateMemoryUsageByType(java.util.Set<org.rocksdb.Cache> caches)
-
verifyConfig
public boolean verifyConfig(StoragePartitionConfig partitionConfig)
- Specified by:
verifyConfig
in classAbstractStoragePartition
- Parameters:
partitionConfig
-- Returns:
-
getPartitionSizeInBytes
public long getPartitionSizeInBytes()
Description copied from class:AbstractStoragePartition
Get the partition database size in bytes- Specified by:
getPartitionSizeInBytes
in classAbstractStoragePartition
- Returns:
- partition database size
-
getOptions
protected org.rocksdb.Options getOptions()
-
getFullPathForTempSSTFileDir
public java.lang.String getFullPathForTempSSTFileDir()
-
getRocksDBSstFileWriter
public RocksDBSstFileWriter getRocksDBSstFileWriter()
-
getIterator
public AbstractStorageIterator getIterator()
- Overrides:
getIterator
in classAbstractStoragePartition
-
-