Class ReplicationMetadataRocksDBStoragePartition
java.lang.Object
com.linkedin.davinci.store.AbstractStoragePartition
com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
com.linkedin.davinci.store.rocksdb.ReplicationMetadataRocksDBStoragePartition
Extends RocksDBStoragePartition to store per-key replication metadata (RMD) alongside
values using separate RocksDB column families. Used in active/active replication mode for
deterministic conflict resolution (DCR). The RMD tracks record-level and field-level timestamps
for conflict resolution.
                   Single RocksDB Instance
   CF: "default"                CF: "timestamp_metadata"
+-------+-------------+      +-------+---------------------+
| Key   | Value       |      | Key   | Replication MD      |
+-------+-------------+      +-------+---------------------+
| key_1 | value_1     |      | key_1 | rmd for key_1       |
| key_2 | value_2     |      | key_2 | rmd for key_2       |
| key_3 | (deleted)   |      | key_3 | rmd for key_3       |
+-------+-------------+      +-------+---------------------+
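The layout in the diagram can be illustrated with a small in-memory model. This is only a sketch: two hash maps stand in for the "default" and "timestamp_metadata" column families, and the class and method names (TwoColumnFamilySketch, putWithRmd, deleteWithRmd) are hypothetical, not part of Venice or RocksDB. It shows the key_3 case, where the value is gone but the RMD is still kept for conflict resolution.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the two-column-family layout; not the Venice implementation. */
final class TwoColumnFamilySketch {
  // Stand-in for the "default" column family (key -> value).
  private final Map<ByteBuffer, byte[]> defaultCf = new HashMap<>();
  // Stand-in for the "timestamp_metadata" column family (key -> RMD).
  private final Map<ByteBuffer, byte[]> rmdCf = new HashMap<>();

  // ByteBuffer compares by content, so it is usable as a map key where byte[] is not.
  private static ByteBuffer keyOf(byte[] key) {
    return ByteBuffer.wrap(key.clone());
  }

  void putWithRmd(byte[] key, byte[] value, byte[] rmd) {
    defaultCf.put(keyOf(key), value);
    rmdCf.put(keyOf(key), rmd);
  }

  /** Removes the value but keeps (and updates) the RMD for the same key. */
  void deleteWithRmd(byte[] key, byte[] rmd) {
    defaultCf.remove(keyOf(key));
    rmdCf.put(keyOf(key), rmd);
  }

  byte[] getValue(byte[] key) {
    return defaultCf.get(keyOf(key));
  }

  byte[] getRmd(byte[] key) {
    return rmdCf.get(keyOf(key));
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    TwoColumnFamilySketch store = new TwoColumnFamilySketch();
    byte[] key = utf8("key_3");
    store.putWithRmd(key, utf8("value_3"), utf8("rmd v1"));
    store.deleteWithRmd(key, utf8("rmd v2"));
    System.out.println("value present: " + (store.getValue(key) != null));
    System.out.println("rmd: " + new String(store.getRmd(key), StandardCharsets.UTF_8));
    System.out.println("rmd updated: " + Arrays.equals(store.getRmd(key), utf8("rmd v2")));
  }
}
```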
-
Nested Class Summary
Nested classes/interfaces inherited from class com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
RocksDBStoragePartition.RocksDBOperation<T>, RocksDBStoragePartition.RocksDBVoidOperation
-
Field Summary
Fields inherited from class com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
blobDbEnabled, blobTransferInProgress, columnFamilyDescriptors, columnFamilyHandleList, deferredWrite, iteratorReadOptions, partitionId, READ_OPTIONS_DEFAULT, readCloseRWLock, readOnly, readWriteLeaderForDefaultCF, readWriteLeaderForRMDCF, replicaId, ROCKSDB_ERROR_MESSAGE_FOR_RUNNING_OUT_OF_DISK_QUOTA, storeName, storeNameAndVersion, storeVersion, writeOnly, writeOptions
-
Constructor Summary
Constructors
ReplicationMetadataRocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig)
-
Method Summary
Modifier and Type   Method   Description
void beginBatchWrite(Map<String, String> checkpointedInfo, Optional<Supplier<byte[]>> expectedChecksumSupplier)
boolean checkDatabaseIntegrity(Map<String, String> checkpointedInfo)
    Checks whether the current state of the database is valid during the start of ingestion.
void close()
    Close the specific partition.
void deleteWithReplicationMetadata(byte[] key, byte[] replicationMetadata)
    Deletes a key's value but updates its replication metadata.
void drop()
    Drop when it is not required anymore.
void endBatchWrite()
byte[] getReplicationMetadata
    This API retrieves replication metadata from replicationMetadataColumnFamily.
long getRmdByteUsage()
void putReplicationMetadata(byte[] key, byte[] metadata)
    Stores replication metadata for a key.
void putWithReplicationMetadata(byte[] key, byte[] value, byte[] metadata)
    Stores a key-value pair along with replication metadata.
void putWithReplicationMetadata(byte[] key, ByteBuffer value, byte[] metadata)
    This API takes in value and metadata as ByteBuffer format and put it into RocksDB.
sync()
    Flushes the memtable to disk.
boolean validateBatchIngestion()
Methods inherited from class com.linkedin.davinci.store.rocksdb.RocksDBStoragePartition
checkAndThrowDiskLimitException, cleanupSnapshot, cleanupSnapshot, createSnapshot, createSnapshot, delete, deleteFilesInDirectory, get, get, get, get, getApproximateMemoryUsageByType, getByKeyPrefix, getColumnFamilyHandleList, getEnvOptions, getIterator, getKeyCountEstimate, getOptions, getPartitionSizeInBytes, getRocksDBStatValue, getStoreOptions, isRocksDBPartitionBlobTransferInProgress, makeSureRocksDBIsStillOpen, multiGet, multiGet, put, put, put, reopen, verifyConfig, withOpenDatabase, withOpenDatabaseVoid, withSynchronizedDatabase, withSynchronizedDatabaseVoid
Methods inherited from class com.linkedin.davinci.store.AbstractStoragePartition
getPartitionId
-
Constructor Details
-
ReplicationMetadataRocksDBStoragePartition
public ReplicationMetadataRocksDBStoragePartition(StoragePartitionConfig storagePartitionConfig, RocksDBStorageEngineFactory factory, String dbDir, RocksDBMemoryStats rocksDBMemoryStats, RocksDBThrottler rocksDbThrottler, RocksDBServerConfig rocksDBServerConfig)
-
-
Method Details
-
putWithReplicationMetadata
public void putWithReplicationMetadata(byte[] key, byte[] value, byte[] metadata)
Stores a key-value pair along with replication metadata. In deferred-write mode, the value is written via the parent SST writer and metadata via this partition's SST writer. Otherwise, both are written atomically via a WriteBatch.
Overrides:
putWithReplicationMetadata in class AbstractStoragePartition
Throws:
VeniceException - if the database is closed, read-only, or the write fails
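The two write paths described for putWithReplicationMetadata can be sketched with a toy model. Everything here is an assumption made for illustration: keys and values are Strings rather than byte[], the maps stand in for the two column families, the lists stand in for the value and RMD SST writers, and the staged list of operations only mimics the grouping of a WriteBatch (a real WriteBatch is what provides atomicity).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two write paths; names are hypothetical, not Venice or RocksDB APIs. */
final class WritePathSketch {
  // Stand-ins for the value and RMD column families.
  final Map<String, String> valueCf = new HashMap<>();
  final Map<String, String> rmdCf = new HashMap<>();
  // Stand-ins for the value and RMD SST file writers used in deferred-write mode.
  final List<String> valueSstEntries = new ArrayList<>();
  final List<String> rmdSstEntries = new ArrayList<>();
  private final boolean deferredWrite;

  WritePathSketch(boolean deferredWrite) {
    this.deferredWrite = deferredWrite;
  }

  void putWithReplicationMetadata(String key, String value, String rmd) {
    if (deferredWrite) {
      // Deferred-write mode: nothing touches the live column families yet.
      valueSstEntries.add(key + "=" + value);
      rmdSstEntries.add(key + "=" + rmd);
      return;
    }
    // Otherwise: stage both mutations, then apply them as one group.
    List<Runnable> batch = new ArrayList<>();
    batch.add(() -> valueCf.put(key, value));
    batch.add(() -> rmdCf.put(key, rmd));
    batch.forEach(Runnable::run);
  }

  public static void main(String[] args) {
    WritePathSketch direct = new WritePathSketch(false);
    direct.putWithReplicationMetadata("key_1", "value_1", "rmd for key_1");
    System.out.println(direct.valueCf + " " + direct.rmdCf);

    WritePathSketch deferred = new WritePathSketch(true);
    deferred.putWithReplicationMetadata("key_1", "value_1", "rmd for key_1");
    System.out.println(deferred.valueSstEntries + " " + deferred.rmdSstEntries);
  }
}
```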
-
putReplicationMetadata
public void putReplicationMetadata(byte[] key, byte[] metadata)
Stores replication metadata for a key. In deferred-write mode, writes via the SST writer; otherwise writes directly to the replication metadata column family.
Overrides:
putReplicationMetadata in class AbstractStoragePartition
Throws:
VeniceException - if the database is closed, read-only, or the write fails
-
getRmdByteUsage
public long getRmdByteUsage()
Overrides:
getRmdByteUsage in class RocksDBStoragePartition
-
putWithReplicationMetadata
public void putWithReplicationMetadata(byte[] key, ByteBuffer value, byte[] metadata)
This API takes the value as a ByteBuffer, along with the key and metadata, and puts it into RocksDB. Note that it is not an efficient implementation, as it copies the content to perform the ByteBuffer -> byte[] conversion. TODO: Rewrite this implementation after we adopt the thread-local direct ByteBuffer approach.
Overrides:
putWithReplicationMetadata in class AbstractStoragePartition
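The copy cost mentioned above comes from materializing the buffer's readable bytes into a fresh array. A minimal sketch of such a conversion, using only java.nio, is shown here; the helper name toByteArray is hypothetical and this is not necessarily how the Venice method performs the conversion.

```java
import java.nio.ByteBuffer;

final class ByteBufferCopySketch {
  /** Copies the remaining bytes of the buffer into a new array without moving its position. */
  static byte[] toByteArray(ByteBuffer buffer) {
    byte[] copy = new byte[buffer.remaining()];
    // duplicate() shares the content but has its own position and limit,
    // so the caller's buffer is left untouched by the read.
    buffer.duplicate().get(copy);
    return copy;
  }

  public static void main(String[] args) {
    // Only the three bytes between position and limit are readable.
    ByteBuffer value = ByteBuffer.wrap(new byte[] {9, 1, 2, 3, 9}, 1, 3);
    byte[] bytes = toByteArray(value);
    System.out.println("copied " + bytes.length + " bytes, position still " + value.position());
  }
}
```

Each call allocates an array the size of the value, which is the overhead the TODO refers to.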
-
getReplicationMetadata
Description copied from class: AbstractStoragePartition
This API retrieves replication metadata from replicationMetadataColumnFamily. Only ReplicationMetadataRocksDBStoragePartition will execute this method; other storage partition implementations will throw VeniceUnsupportedOperationException.
Overrides:
getReplicationMetadata in class AbstractStoragePartition
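The "unsupported unless overridden" pattern described above can be sketched as follows. All names here are hypothetical stand-ins: the JDK's UnsupportedOperationException replaces VeniceUnsupportedOperationException, and the String key is an assumption, since the real parameter list is not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a base partition whose default is to reject RMD reads. */
abstract class BasePartitionSketch {
  byte[] getReplicationMetadata(String key) {
    throw new UnsupportedOperationException(
        "getReplicationMetadata is not supported by " + getClass().getSimpleName());
  }
}

/** A partition type without RMD support; it inherits the throwing default. */
final class PlainPartitionSketch extends BasePartitionSketch {
}

/** A partition type that stores RMD and therefore overrides the default. */
final class RmdPartitionSketch extends BasePartitionSketch {
  private final Map<String, byte[]> rmdByKey = new HashMap<>();

  void putReplicationMetadata(String key, byte[] rmd) {
    rmdByKey.put(key, rmd);
  }

  @Override
  byte[] getReplicationMetadata(String key) {
    return rmdByKey.get(key);
  }

  public static void main(String[] args) {
    RmdPartitionSketch rmdPartition = new RmdPartitionSketch();
    rmdPartition.putReplicationMetadata("key_1", new byte[] {1});
    System.out.println("rmd length: " + rmdPartition.getReplicationMetadata("key_1").length);
    try {
      new PlainPartitionSketch().getReplicationMetadata("key_1");
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```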
-
deleteWithReplicationMetadata
public void deleteWithReplicationMetadata(byte[] key, byte[] replicationMetadata)
Deletes a key's value but updates its replication metadata. In deferred-write mode (repush), only the metadata is written via the SST writer. Otherwise, the delete and metadata update are applied atomically via a WriteBatch.
Overrides:
deleteWithReplicationMetadata in class AbstractStoragePartition
Throws:
VeniceException - if the database is closed, read-only, or the write fails
-
checkDatabaseIntegrity
Description copied from class: AbstractStoragePartition
Checks whether the current state of the database is valid during the start of ingestion.
Overrides:
checkDatabaseIntegrity in class RocksDBStoragePartition
-
beginBatchWrite
public void beginBatchWrite(Map<String, String> checkpointedInfo, Optional<Supplier<byte[]>> expectedChecksumSupplier)
Overrides:
beginBatchWrite in class RocksDBStoragePartition
-
endBatchWrite
public void endBatchWrite()
Overrides:
endBatchWrite in class RocksDBStoragePartition
-
sync
Description copied from class: RocksDBStoragePartition
Flushes the memtable to disk. In deferred-write mode, syncs the SST file writer instead.
Overrides:
sync in class RocksDBStoragePartition
Returns:
Database related info, which is required to be checkpointed.
-
close
public void close()
Description copied from class: AbstractStoragePartition
Close the specific partition.
Overrides:
close in class RocksDBStoragePartition
-
validateBatchIngestion
public boolean validateBatchIngestion()
Overrides:
validateBatchIngestion in class RocksDBStoragePartition
-
drop
public void drop()
Description copied from class: AbstractStoragePartition
Drop when it is not required anymore.
Overrides:
drop in class RocksDBStoragePartition
-
getFullPathForTempSSTFileDir
Overrides:
getFullPathForTempSSTFileDir in class RocksDBStoragePartition
-
getRocksDBSstFileWriter
Overrides:
getRocksDBSstFileWriter in class RocksDBStoragePartition
-
getValueFullPathForTempSSTFileDir
-
getValueRocksDBSstFileWriter
-