Class RocksDBStoragePartition
java.lang.Object
com.linkedin.davinci.store.AbstractStoragePartition
com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
- Direct Known Subclasses:
ReplicationMetadataRocksDBStoragePartition
In
RocksDBStoragePartition, it assumes the update(insert/delete) will happen sequentially.
If the batch push is bytewise-sorted by key, this class is leveraging SstFileWriter to
generate the SST file directly and ingest all the generated SST files into the RocksDB database
at the end of the push.
If the ingestion is unsorted, this class is using the regular RocksDB interface to support update
operations.-
Field Summary
FieldsModifier and TypeFieldDescriptionprotected final booleanprotected final List<org.rocksdb.ColumnFamilyDescriptor>protected final List<org.rocksdb.ColumnFamilyHandle>Column Family is the concept in RocksDB to create isolation between different value for the same key.protected final booleanWhether the input is sorted or not.protected final org.rocksdb.ReadOptionsprotected final intprotected static final org.rocksdb.ReadOptionsprotected final ReentrantReadWriteLockSince all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior.protected final booleanWhether the database is read only or not.protected final booleanprotected final booleanprotected final Stringprotected org.rocksdb.RocksDBprotected static final Stringprotected final Stringprotected final Stringprotected final intprotected final booleanprotected final org.rocksdb.WriteOptionsHere RocksDB disables WAL, but relies on the 'flush', which will be invoked throughsync()to avoid data loss during recovery. -
Constructor Summary
ConstructorsModifierConstructorDescriptionRocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig) protectedRocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, List<byte[]> columnFamilyNameList) -
Method Summary
Modifier and TypeMethodDescriptionvoidbeginBatchWrite(Map<String, String> checkpointedInfo, Optional<Supplier<byte[]>> expectedChecksumSupplier) voidcheckAndThrowDiskLimitException(org.rocksdb.RocksDBException e) booleancheckDatabaseIntegrity(Map<String, String> checkpointedInfo) checks whether the current state of the database is valid during the start of ingestion.voidCleans up the snapshotstatic voidcleanupSnapshot(String fullPathForPartitionDBSnapshot) A util method to clean up snapshot;voidclose()Close the specific partitionvoidCreates a snapshot of the current state of the storage if the blob transfer feature is enabled via the store configurationstatic voidcreateSnapshot(org.rocksdb.RocksDB rocksDB, String fullPathForPartitionDBSnapshot) util method to create a snapshot It will check the snapshot directory and delete it if it exists, then generate a new snapshotvoiddelete(byte[] key) Delete a key from the partition databasevoiddeleteFilesInDirectory(String fullPath) voiddrop()Drop when it is not required anymore.voidbyte[]get(byte[] key) Get a value from the partition databaseget(byte[] key, ByteBuffer valueToBePopulated) byte[]get(ByteBuffer keyBuffer) <K,V> V get(K key) Get a Value from the partition databasegetApproximateMemoryUsageByType(Set<org.rocksdb.Cache> caches) voidgetByKeyPrefix(byte[] keyPrefix, BytesStreamingCallback callback) Populate provided callback with key-value pairs from the partition database where the keys have provided prefix.protected List<org.rocksdb.ColumnFamilyHandle>protected org.rocksdb.EnvOptionslongprotected org.rocksdb.OptionslongGet the partition database size in byteslonglonggetRocksDBStatValue(String statName) protected org.rocksdb.OptionsgetStoreOptions(StoragePartitionConfig storagePartitionConfig, boolean isRMD) booleanprotected voidList<byte[]>multiGet(List<ByteBuffer> keys, List<ByteBuffer> values) voidput(byte[] key, byte[] value) Puts a value into the partition databasevoidput(byte[] key, ByteBuffer valueBuffer) <K,V> void put(K key, V value) voidreopen()Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.sync()Sync current database.booleanbooleanverifyConfig(StoragePartitionConfig partitionConfig) Methods inherited from class com.linkedin.davinci.store.AbstractStoragePartition
deleteWithReplicationMetadata, getPartitionId, getReplicationMetadata, putReplicationMetadata, putWithReplicationMetadata, putWithReplicationMetadata
-
Field Details
-
ROCKSDB_ERROR_MESSAGE_FOR_RUNNING_OUT_OF_DISK_QUOTA
- See Also:
-
READ_OPTIONS_DEFAULT
protected static final org.rocksdb.ReadOptions READ_OPTIONS_DEFAULT -
writeOptions
protected final org.rocksdb.WriteOptions writeOptionsHere RocksDB disables WAL, but relies on the 'flush', which will be invoked throughsync()to avoid data loss during recovery. -
iteratorReadOptions
protected final org.rocksdb.ReadOptions iteratorReadOptions -
replicaId
-
storeName
-
storeNameAndVersion
-
storeVersion
protected final int storeVersion -
partitionId
protected final int partitionId -
readCloseRWLock
Since all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior. The followingreadCloseRWLockis only used to guardget(byte[])since we don't want to synchronize get requests. -
rocksDB
protected org.rocksdb.RocksDB rocksDB -
deferredWrite
protected final boolean deferredWriteWhether the input is sorted or not.
deferredWrite = sortedInput => ingested via batch push which is sorted in VPJ, can useRocksDBSstFileWriterto ingest the input data to RocksDB
!deferredWrite = !sortedInput => can not use RocksDBSstFileWriter for ingestion -
readOnly
protected final boolean readOnlyWhether the database is read only or not. -
writeOnly
protected final boolean writeOnly -
blobTransferInProgress
protected final boolean blobTransferInProgress -
readWriteLeaderForDefaultCF
protected final boolean readWriteLeaderForDefaultCF -
readWriteLeaderForRMDCF
protected final boolean readWriteLeaderForRMDCF -
columnFamilyHandleList
Column Family is the concept in RocksDB to create isolation between different value for the same key. All KVs are stored in `DEFAULT` column family, if no column family is specified. If we stores replication metadata in the RocksDB, we stored it in a separated column family. We will insert all the column family descriptors into columnFamilyDescriptors and pass it to RocksDB when opening the store, and it will fill the columnFamilyHandles with handles which will be used when we want to put/get/delete from different RocksDB column families. -
columnFamilyDescriptors
-
-
Constructor Details
-
RocksDBStoragePartition
protected RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig, List<byte[]> columnFamilyNameList) -
RocksDBStoragePartition
public RocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig)
-
-
Method Details
-
makeSureRocksDBIsStillOpen
protected void makeSureRocksDBIsStillOpen() -
getEnvOptions
protected org.rocksdb.EnvOptions getEnvOptions() -
getStoreOptions
protected org.rocksdb.Options getStoreOptions(StoragePartitionConfig storagePartitionConfig, boolean isRMD) -
getColumnFamilyHandleList
-
getRmdByteUsage
public long getRmdByteUsage()- Overrides:
getRmdByteUsagein classAbstractStoragePartition
-
checkDatabaseIntegrity
Description copied from class:AbstractStoragePartitionchecks whether the current state of the database is valid during the start of ingestion.- Overrides:
checkDatabaseIntegrityin classAbstractStoragePartition
-
beginBatchWrite
public void beginBatchWrite(Map<String, String> checkpointedInfo, Optional<Supplier<byte[]>> expectedChecksumSupplier) - Overrides:
beginBatchWritein classAbstractStoragePartition
-
endBatchWrite
public void endBatchWrite()- Overrides:
endBatchWritein classAbstractStoragePartition
-
createSnapshot
public void createSnapshot()Description copied from class:AbstractStoragePartitionCreates a snapshot of the current state of the storage if the blob transfer feature is enabled via the store configuration- Specified by:
createSnapshotin classAbstractStoragePartition
-
isRocksDBPartitionBlobTransferInProgress
public boolean isRocksDBPartitionBlobTransferInProgress() -
cleanupSnapshot
public void cleanupSnapshot()Description copied from class:AbstractStoragePartitionCleans up the snapshot- Specified by:
cleanupSnapshotin classAbstractStoragePartition
-
checkAndThrowDiskLimitException
public void checkAndThrowDiskLimitException(org.rocksdb.RocksDBException e) -
put
public void put(byte[] key, byte[] value) Description copied from class:AbstractStoragePartitionPuts a value into the partition database- Specified by:
putin classAbstractStoragePartition
-
put
- Specified by:
putin classAbstractStoragePartition
-
put
public <K,V> void put(K key, V value) - Specified by:
putin classAbstractStoragePartition
-
get
public byte[] get(byte[] key) Description copied from class:AbstractStoragePartitionGet a value from the partition database- Specified by:
getin classAbstractStoragePartition- Parameters:
key- key to be retrieved- Returns:
- null if the key does not exist, byte[] value if it exists.
-
get
- Overrides:
getin classAbstractStoragePartition
-
get
public <K,V> V get(K key) Description copied from class:AbstractStoragePartitionGet a Value from the partition database- Specified by:
getin classAbstractStoragePartition- Type Parameters:
K- the type for KeyV- the type for the return value- Parameters:
key- key to be retrieved- Returns:
- null if the key does not exist, V value if it exists
-
get
- Specified by:
getin classAbstractStoragePartition
-
multiGet
-
multiGet
-
getByKeyPrefix
Description copied from class:AbstractStoragePartitionPopulate provided callback with key-value pairs from the partition database where the keys have provided prefix. If prefix is null, callback will be populated will all key-value pairs from the partition database.- Specified by:
getByKeyPrefixin classAbstractStoragePartition
-
validateBatchIngestion
public boolean validateBatchIngestion()- Overrides:
validateBatchIngestionin classAbstractStoragePartition
-
delete
public void delete(byte[] key) Description copied from class:AbstractStoragePartitionDelete a key from the partition database- Specified by:
deletein classAbstractStoragePartition
-
sync
Description copied from class:AbstractStoragePartitionSync current database.- Specified by:
syncin classAbstractStoragePartition- Returns:
- Database related info, which is required to be checkpointed.
-
getKeyCountEstimate
public long getKeyCountEstimate() -
deleteFilesInDirectory
-
drop
public void drop()Description copied from class:AbstractStoragePartitionDrop when it is not required anymore.- Specified by:
dropin classAbstractStoragePartition
-
close
public void close()Description copied from class:AbstractStoragePartitionClose the specific partition- Specified by:
closein classAbstractStoragePartition
-
reopen
public void reopen()Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.- Overrides:
reopenin classAbstractStoragePartition
-
getRocksDBStatValue
-
getApproximateMemoryUsageByType
-
verifyConfig
- Specified by:
verifyConfigin classAbstractStoragePartition- Parameters:
partitionConfig-- Returns:
-
getPartitionSizeInBytes
public long getPartitionSizeInBytes()Description copied from class:AbstractStoragePartitionGet the partition database size in bytes- Specified by:
getPartitionSizeInBytesin classAbstractStoragePartition- Returns:
- partition database size
-
getOptions
protected org.rocksdb.Options getOptions() -
getFullPathForTempSSTFileDir
-
getRocksDBSstFileWriter
-
getIterator
- Overrides:
getIteratorin classAbstractStoragePartition
-
createSnapshot
public static void createSnapshot(org.rocksdb.RocksDB rocksDB, String fullPathForPartitionDBSnapshot) util method to create a snapshot It will check the snapshot directory and delete it if it exists, then generate a new snapshot -
cleanupSnapshot
A util method to clean up snapshot;- Parameters:
fullPathForPartitionDBSnapshot-
-