RocksDB supports multiple table formats for storing SST files, each optimized for different use cases. The most common is the block-based table format.
Block-Based Table Options
BlockBasedTableOptions
struct BlockBasedTableOptions {
bool cache_index_and_filter_blocks = false;
bool cache_index_and_filter_blocks_with_high_priority = true;
bool pin_l0_filter_and_index_blocks_in_cache = false;
bool pin_top_level_index_and_filter = true;
MetadataCacheOptions metadata_cache_options;
IndexType index_type = kBinarySearch;
DataBlockIndexType data_block_index_type = kDataBlockBinarySearch;
ChecksumType checksum = kXXH3;
bool no_block_cache = false;
std::shared_ptr<Cache> block_cache = nullptr;
uint64_t block_size = 4 * 1024;
int block_restart_interval = 16;
uint64_t metadata_block_size = 4096;
std::shared_ptr<const FilterPolicy> filter_policy = nullptr;
bool whole_key_filtering = true;
uint32_t format_version = 7;
};
Index Types
enum IndexType : char {
kBinarySearch = 0x00,
kHashSearch = 0x01,
kTwoLevelIndexSearch = 0x02,
kBinarySearchWithFirstKey = 0x03
};
kBinarySearch
Space-efficient index optimized for binary search.
kHashSearch
Hash index for prefix lookups. Takes effect only when prefix_extractor is provided.
kTwoLevelIndexSearch
Two-level index in which both levels use binary search. Second-level index blocks (partitions) use the block cache.
kBinarySearchWithFirstKey
Binary search index that also contains the first key of each block, allowing iterators to defer reading blocks.
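As a configuration fragment, the sketch below shows one way to select a non-default index type. It assumes the enum values are reachable as members of `BlockBasedTableOptions` (as declared in `rocksdb/table.h`). The helper name `MakeHashIndexOptions` and the 4-byte prefix length are hypothetical choices made only for illustration.

```cpp
#include "rocksdb/options.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/table.h"

using namespace ROCKSDB_NAMESPACE;

// Hypothetical helper: builds Options that request a hash index
// over fixed 4-byte key prefixes.
Options MakeHashIndexOptions() {
  BlockBasedTableOptions table_options;
  table_options.index_type = BlockBasedTableOptions::kHashSearch;

  Options options;
  // kHashSearch relies on a prefix extractor being configured.
  options.prefix_extractor.reset(NewFixedPrefixTransform(4));
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  return options;
}
```

Switching to `kTwoLevelIndexSearch` or `kBinarySearchWithFirstKey` only requires changing the `index_type` assignment. Neither needs a prefix extractor.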
Table Factory Creation
NewBlockBasedTableFactory
TableFactory* NewBlockBasedTableFactory(
const BlockBasedTableOptions& table_options = BlockBasedTableOptions()
);
Creates a block-based table factory with the specified options.
table_options
const BlockBasedTableOptions&
Configuration for block-based tables.
Returns a pointer to the created table factory.
NewPlainTableFactory
TableFactory* NewPlainTableFactory(
const PlainTableOptions& options = PlainTableOptions()
);
Creates a plain table factory optimized for low query latency on pure-memory or very low-latency media.
NewCuckooTableFactory
TableFactory* NewCuckooTableFactory(
const CuckooTableOptions& table_options = CuckooTableOptions()
);
Creates a cuckoo hash table factory for SST files.
Cache Configuration
Block Cache
std::shared_ptr<Cache> block_cache = nullptr;
Cache for data blocks. If nullptr and no_block_cache is false, a 32MB internal cache is created.
Caching Index and Filter Blocks
bool cache_index_and_filter_blocks = false;
cache_index_and_filter_blocks
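When false, index and filter blocks are pre-loaded when a table is opened and held outside the block cache, so their memory use is not bounded by the cache capacity. When true, they are stored in the block cache and can be evicted like other blocks.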
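The following configuration fragment sketches how index and filter blocks might be kept in a bounded block cache while still being protected from eviction. The helper name `MakeCachedMetadataOptions`, the 256MB capacity, and the 0.5 high-priority pool ratio are illustrative assumptions, not recommendations.

```cpp
#include "rocksdb/cache.h"
#include "rocksdb/table.h"

using namespace ROCKSDB_NAMESPACE;

// Hypothetical helper: charges index/filter blocks to the block cache.
BlockBasedTableOptions MakeCachedMetadataOptions() {
  BlockBasedTableOptions table_options;
  // 256MB LRU cache with half of it reserved for high-priority entries.
  table_options.block_cache = NewLRUCache(
      /*capacity=*/256 * 1024 * 1024, /*num_shard_bits=*/-1,
      /*strict_capacity_limit=*/false, /*high_pri_pool_ratio=*/0.5);
  // Store index/filter blocks in the block cache instead of pre-loading them.
  table_options.cache_index_and_filter_blocks = true;
  // Insert them with high priority so data blocks are evicted first.
  table_options.cache_index_and_filter_blocks_with_high_priority = true;
  // Keep L0 index/filter blocks pinned while their tables are open.
  table_options.pin_l0_filter_and_index_blocks_in_cache = true;
  return table_options;
}
```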
Checksum Types
enum ChecksumType : char {
kNoChecksum = 0x0,
kCRC32c = 0x1,
kxxHash = 0x2,
kxxHash64 = 0x3,
kXXH3 = 0x4
};
kXXH3
Default. Fast and high-quality checksum. Supported since RocksDB 6.27.
kCRC32c
CRC32c checksum with hardware acceleration on x86.
Block Size and Compression
block_size
uint64_t block_size = 4 * 1024;
Approximate size of uncompressed data packed per block. Actual disk read size may be smaller if compression is enabled.
block_restart_interval
int block_restart_interval = 16;
Number of keys between restart points for delta encoding. Minimum value is 1.
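To make the role of restart points concrete, here is a minimal, self-contained sketch of prefix (delta) encoding with a restart interval. It is not RocksDB's block builder. The `Entry` struct and `EncodeKeys` function are hypothetical names, and the real on-disk layout (varint lengths, values, restart array) is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration type: one delta-encoded key.
struct Entry {
  std::size_t shared;      // bytes shared with the previous key
  std::string non_shared;  // suffix stored explicitly
  bool is_restart;         // true if the key is stored in full
};

// Encodes sorted keys, storing a full key every `restart_interval` entries
// and only the differing suffix for the keys in between.
std::vector<Entry> EncodeKeys(const std::vector<std::string>& keys,
                              int restart_interval) {
  assert(restart_interval >= 1);
  std::vector<Entry> out;
  std::string prev;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    const std::string& key = keys[i];
    const bool restart =
        (i % static_cast<std::size_t>(restart_interval)) == 0;
    std::size_t shared = 0;
    if (!restart) {
      // Count the leading bytes this key has in common with the previous one.
      const std::size_t limit = std::min(prev.size(), key.size());
      while (shared < limit && prev[shared] == key[shared]) ++shared;
    }
    out.push_back({shared, key.substr(shared), restart});
    prev = key;
  }
  return out;
}
```

With an interval of 1, every key is stored in full. Larger intervals save space on keys with long common prefixes, at the cost of decoding more entries sequentially after seeking to a restart point.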
Filter Configuration
filter_policy
std::shared_ptr<const FilterPolicy> filter_policy = nullptr;
filter_policy
std::shared_ptr<const FilterPolicy>
Filter policy to reduce disk reads. Use NewBloomFilterPolicy() for most applications.
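As a rough guide to choosing the bits-per-key argument of NewBloomFilterPolicy(), the sketch below evaluates the textbook false-positive formula for a classic Bloom filter. This is an approximation only: RocksDB's filter implementations differ in layout and choose their own probe count, so measured rates will not match exactly. The function name `BloomFalsePositiveRate` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Theoretical false-positive rate of a classic Bloom filter that spends
// `bits_per_key` bits per key and uses `num_probes` hash probes:
//   (1 - e^(-num_probes / bits_per_key))^num_probes
double BloomFalsePositiveRate(double bits_per_key, int num_probes) {
  assert(bits_per_key > 0 && num_probes > 0);
  const double fill =
      1.0 - std::exp(-static_cast<double>(num_probes) / bits_per_key);
  return std::pow(fill, num_probes);
}
```

For example, `BloomFalsePositiveRate(10.0, 7)` evaluates to a little under 1%, which is in line with the commonly quoted figure for 10 bits per key.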
whole_key_filtering
bool whole_key_filtering = true;
If true, place whole keys in the filter (not just prefixes). This should generally be true for point lookups (Get) to be efficient.
partition_filters
bool partition_filters = false;
Use partitioned full filters for each SST file. Requires index_type to be kTwoLevelIndexSearch. Filter partition blocks use the block cache even when cache_index_and_filter_blocks is false.
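A configuration fragment that enables partitioned filters together with the index type they require might look like the following. The helper name `MakePartitionedFilterOptions` and the choice of 10 bits per key are assumptions for illustration.

```cpp
#include "rocksdb/filter_policy.h"
#include "rocksdb/table.h"

using namespace ROCKSDB_NAMESPACE;

// Hypothetical helper: enables partitioned Bloom filters.
BlockBasedTableOptions MakePartitionedFilterOptions() {
  BlockBasedTableOptions table_options;
  table_options.filter_policy.reset(NewBloomFilterPolicy(10));
  // Partitioned filters are paired with the two-level index.
  table_options.index_type = BlockBasedTableOptions::kTwoLevelIndexSearch;
  table_options.partition_filters = true;
  return table_options;
}
```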
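format_version
uint32_t format_version = 7;
Schema version for table files. The default is 7, which enables the latest features. Version 6 adds checksum protection for the file footer and strengthens detection of misplaced data, and version 5 adds a faster Bloom filter implementation.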
Example
#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
using namespace ROCKSDB_NAMESPACE;
int main() {
  // Configure block-based table
  BlockBasedTableOptions table_options;
  table_options.block_cache = NewLRUCache(512 * 1024 * 1024);  // 512MB cache
  table_options.filter_policy.reset(NewBloomFilterPolicy(10));  // 10 bits per key
  table_options.block_size = 16 * 1024;  // 16KB blocks
  table_options.cache_index_and_filter_blocks = true;
  table_options.pin_top_level_index_and_filter = true;
  // Create table factory
  Options options;
  options.create_if_missing = true;  // without this, Open fails if the DB does not exist
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  // Open database with these options
  DB* db = nullptr;
  Status s = DB::Open(options, "/tmp/testdb", &db);
  if (!s.ok()) {
    return 1;  // s.ToString() describes the failure
  }
  delete db;  // closes the database
  return 0;
}
Advanced Options
struct MetadataCacheOptions {
PinningTier top_level_index_pinning = PinningTier::kFallback;
PinningTier partition_pinning = PinningTier::kFallback;
PinningTier unpartitioned_pinning = PinningTier::kFallback;
};
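Controls, for each kind of metadata block (top-level index, index/filter partitions, and unpartitioned index/filter blocks), which tier of block-based tables has that metadata pinned in the block cache. kFallback defers to the older pin_top_level_index_and_filter and pin_l0_filter_and_index_blocks_in_cache settings.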
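The configuration fragment below sketches how explicit pinning tiers could be set instead of relying on kFallback. It assumes the `PinningTier` values `kAll` and `kFlushedAndSimilar` declared in `rocksdb/table.h`. The helper name `MakePinnedMetadataOptions` and the particular tier chosen for each field are illustrative only.

```cpp
#include "rocksdb/table.h"

using namespace ROCKSDB_NAMESPACE;

// Hypothetical helper: pins metadata blocks using explicit tiers.
BlockBasedTableOptions MakePinnedMetadataOptions() {
  BlockBasedTableOptions table_options;
  // Pinning applies to blocks that live in the block cache.
  table_options.cache_index_and_filter_blocks = true;

  MetadataCacheOptions& mco = table_options.metadata_cache_options;
  // Pin the top-level index of every table.
  mco.top_level_index_pinning = PinningTier::kAll;
  // Pin partitions and unpartitioned blocks only for recently flushed
  // (L0-like) tables.
  mco.partition_pinning = PinningTier::kFlushedAndSimilar;
  mco.unpartitioned_pinning = PinningTier::kFlushedAndSimilar;
  return table_options;
}
```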
Read Amplification Measurement
uint32_t read_amp_bytes_per_bit = 0;
Enable read amplification statistics. Creates a bitmap to track which parts of blocks are actually read. Must be a power of 2. Default 0 (disabled).
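The memory cost of this option can be estimated with a small calculation. The sketch below assumes, following the comments in `rocksdb/table.h`, that each loaded block gets a bitmap of roughly (block_size / read_amp_bytes_per_bit) / 8 bytes and that a value that is not a power of 2 is rounded down to one. The function names are hypothetical, and the code does not call into RocksDB.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `v` down to the nearest power of two (requires v > 0).
std::uint32_t FloorPowerOfTwo(std::uint32_t v) {
  assert(v > 0);
  std::uint32_t p = 1;
  while (p <= v / 2) p *= 2;
  return p;
}

// Approximate per-block bitmap size in bytes: one bit tracks
// `bytes_per_bit` bytes of block data.
std::uint64_t ReadAmpBitmapBytes(std::uint64_t block_size,
                                 std::uint32_t bytes_per_bit) {
  const std::uint32_t sanitized = FloorPowerOfTwo(bytes_per_bit);
  return (block_size / sanitized) / 8;
}
```

Under these assumptions, a 4096-byte block with `read_amp_bytes_per_bit = 8` needs a 64-byte bitmap, about 1.56% of the block's size. A value of 1 costs 512 bytes, or 12.5%.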
See Also