Class RocksDBStoragePartition

  • Direct Known Subclasses:
    ReplicationMetadataRocksDBStoragePartition

    @NotThreadSafe
    public class RocksDBStoragePartition
    extends AbstractStoragePartition
    In RocksDBStoragePartition, it assumes the update(insert/delete) will happen sequentially. If the batch push is bytewise-sorted by key, this class is leveraging SstFileWriter to generate the SST file directly and ingest all the generated SST files into the RocksDB database at the end of the push. If the ingestion is unsorted, this class is using the regular RocksDB interface to support update operations.
    • Field Detail

      • READ_OPTIONS_DEFAULT

        protected static final org.rocksdb.ReadOptions READ_OPTIONS_DEFAULT
      • writeOptions

        protected final org.rocksdb.WriteOptions writeOptions
        Here RocksDB disables WAL, but relies on the 'flush', which will be invoked through sync() to avoid data loss during recovery.
      • storeName

        protected final java.lang.String storeName
      • blobTransferEnabled

        protected final boolean blobTransferEnabled
      • partitionId

        protected final int partitionId
      • readCloseRWLock

        protected final java.util.concurrent.locks.ReentrantReadWriteLock readCloseRWLock
        Since all the modification functions are synchronized, we don't need any other synchronization for the update path to guard RocksDB closing behavior. The following readCloseRWLock is only used to guard get(byte[]) since we don't want to synchronize get requests.
      • rocksDB

        protected org.rocksdb.RocksDB rocksDB
      • deferredWrite

        protected final boolean deferredWrite
        Whether the input is sorted or not.
        deferredWrite = sortedInput => ingested via batch push which is sorted in VPJ, can use RocksDBSstFileWriter to ingest the input data to RocksDB
        !deferredWrite = !sortedInput => can not use RocksDBSstFileWriter for ingestion
      • readOnly

        protected final boolean readOnly
        Whether the database is read only or not.
      • writeOnly

        protected final boolean writeOnly
      • columnFamilyHandleList

        protected final java.util.List<org.rocksdb.ColumnFamilyHandle> columnFamilyHandleList
        Column Family is the concept in RocksDB to create isolation between different value for the same key. All KVs are stored in `DEFAULT` column family, if no column family is specified. If we stores replication metadata in the RocksDB, we stored it in a separated column family. We will insert all the column family descriptors into columnFamilyDescriptors and pass it to RocksDB when opening the store, and it will fill the columnFamilyHandles with handles which will be used when we want to put/get/delete from different RocksDB column families.
      • columnFamilyDescriptors

        protected final java.util.List<org.rocksdb.ColumnFamilyDescriptor> columnFamilyDescriptors
    • Method Detail

      • makeSureRocksDBIsStillOpen

        protected void makeSureRocksDBIsStillOpen()
      • getEnvOptions

        protected org.rocksdb.EnvOptions getEnvOptions()
      • getBlobTransferEnabled

        protected java.lang.Boolean getBlobTransferEnabled()
      • getColumnFamilyHandleList

        protected java.util.List<org.rocksdb.ColumnFamilyHandle> getColumnFamilyHandleList()
      • checkDatabaseIntegrity

        public boolean checkDatabaseIntegrity​(java.util.Map<java.lang.String,​java.lang.String> checkpointedInfo)
        Description copied from class: AbstractStoragePartition
        checks whether the current state of the database is valid during the start of ingestion.
        Overrides:
        checkDatabaseIntegrity in class AbstractStoragePartition
      • beginBatchWrite

        public void beginBatchWrite​(java.util.Map<java.lang.String,​java.lang.String> checkpointedInfo,
                                    java.util.Optional<java.util.function.Supplier<byte[]>> expectedChecksumSupplier)
        Overrides:
        beginBatchWrite in class AbstractStoragePartition
      • get

        public byte[] get​(byte[] key)
        Description copied from class: AbstractStoragePartition
        Get a value from the partition database
        Specified by:
        get in class AbstractStoragePartition
        Parameters:
        key - key to be retrieved
        Returns:
        null if the key does not exist, byte[] value if it exists.
      • get

        public java.nio.ByteBuffer get​(byte[] key,
                                       java.nio.ByteBuffer valueToBePopulated)
        Overrides:
        get in class AbstractStoragePartition
      • get

        public <K,​V> V get​(K key)
        Description copied from class: AbstractStoragePartition
        Get a Value from the partition database
        Specified by:
        get in class AbstractStoragePartition
        Type Parameters:
        K - the type for Key
        V - the type for the return value
        Parameters:
        key - key to be retrieved
        Returns:
        null if the key does not exist, V value if it exists
      • multiGet

        public java.util.List<byte[]> multiGet​(java.util.List<byte[]> keys)
      • multiGet

        public java.util.List<java.nio.ByteBuffer> multiGet​(java.util.List<java.nio.ByteBuffer> keys,
                                                            java.util.List<java.nio.ByteBuffer> values)
      • getByKeyPrefix

        public void getByKeyPrefix​(byte[] keyPrefix,
                                   BytesStreamingCallback callback)
        Description copied from class: AbstractStoragePartition
        Populate provided callback with key-value pairs from the partition database where the keys have provided prefix. If prefix is null, callback will be populated will all key-value pairs from the partition database.
        Specified by:
        getByKeyPrefix in class AbstractStoragePartition
      • sync

        public java.util.Map<java.lang.String,​java.lang.String> sync()
        Description copied from class: AbstractStoragePartition
        Sync current database.
        Specified by:
        sync in class AbstractStoragePartition
        Returns:
        Database related info, which is required to be checkpointed.
      • deleteFilesInDirectory

        public void deleteFilesInDirectory​(java.lang.String fullPath)
      • reopen

        public void reopen()
        Reopen the underlying RocksDB database, and this operation will unload the data cached in memory.
        Overrides:
        reopen in class AbstractStoragePartition
      • getRocksDBStatValue

        public long getRocksDBStatValue​(java.lang.String statName)
      • getApproximateMemoryUsageByType

        public java.util.Map<org.rocksdb.MemoryUsageType,​java.lang.Long> getApproximateMemoryUsageByType​(java.util.Set<org.rocksdb.Cache> caches)
      • getOptions

        protected org.rocksdb.Options getOptions()
      • getFullPathForTempSSTFileDir

        public java.lang.String getFullPathForTempSSTFileDir()