This guide covers advanced performance tuning techniques for RocksDB, helping you achieve optimal performance for your specific workload.
Write Amplification: amount of data written to storage vs. data written by the application. Lower is better.
Read Amplification: number of disk reads required to retrieve a piece of data. Lower is better.
Space Amplification: ratio of database size on disk to logical data size. Lower is better.
Latency: time to complete operations. Lower is better.
These metrics are often in tension. Optimizing one may degrade others. Tune based on your workload priorities.
Increase Write Buffer Size
Larger write buffers mean fewer, larger flushes and less frequent compaction:
Options options;
// Increase memtable size
options.write_buffer_size = 256 << 20;  // 256MB (default: 64MB)
// Increase number of memtables
options.max_write_buffer_number = 6;  // default: 2
// Control total memory across column families
options.db_write_buffer_size = 2048ull << 20;  // 2GB total (note the ull: 2048 << 20 overflows int)
Optimize Level Compaction
// Start compaction earlier
options.level0_file_num_compaction_trigger = 2;  // default: 4
// Increase base level size
options.max_bytes_for_level_base = 512 << 20;  // 512MB (default: 256MB)
// Adjust level multiplier
options.max_bytes_for_level_multiplier = 8;  // default: 10
Parallelism
// Increase background jobs
options.IncreaseParallelism(16);  // Use 16 cores
// Or manually configure
options.max_background_jobs = 16;
options.max_subcompactions = 4;  // Parallel subcompactions
Disable Write-Ahead Log (Careful!)
WriteOptions write_options;
write_options.disableWAL = true;  // Much faster, but not durable
// Or keep the WAL but skip fsync on each write (OS decides when to flush)
write_options.sync = false;
Disabling WAL or sync means recent writes can be lost in a crash. Only use for non-critical data or when you can reconstruct data from other sources.
Block Cache
Increase block cache for better read performance:
#include "rocksdb/table.h"
#include "rocksdb/cache.h"
using ROCKSDB_NAMESPACE::BlockBasedTableOptions;
using ROCKSDB_NAMESPACE::NewBlockBasedTableFactory;
using ROCKSDB_NAMESPACE::NewLRUCache;
BlockBasedTableOptions table_options;
// Large block cache (adjust based on available RAM)
table_options.block_cache = NewLRUCache(8192ull << 20);  // 8GB (ull avoids int overflow)
// Cache index and filter blocks too
table_options.cache_index_and_filter_blocks = true;
table_options.pin_l0_filter_and_index_blocks_in_cache = true;
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
Bloom Filters
Bloom filters reduce unnecessary disk reads:
#include "rocksdb/filter_policy.h"
using ROCKSDB_NAMESPACE ::NewBloomFilterPolicy;
BlockBasedTableOptions table_options;
// 10 bits per key, full-format (non-block-based) bloom filter
table_options.filter_policy.reset(NewBloomFilterPolicy(10, false));
// Use partitioned filters for large databases
table_options.partition_filters = true;
table_options.index_type =
    BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
Optimize Block Size
BlockBasedTableOptions table_options;
// Smaller blocks favor point lookups (default: 4KB)...
table_options.block_size = 4 << 10;  // 4KB
// ...larger blocks favor scans -- pick one for your workload
// table_options.block_size = 64 << 10;  // 64KB
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
Prefix Bloom Filters
For workloads with prefix queries:
#include "rocksdb/slice_transform.h"
using ROCKSDB_NAMESPACE::NewFixedPrefixTransform;
// Extract prefix (first 8 bytes)
options.prefix_extractor.reset(NewFixedPrefixTransform(8));
// Enable prefix bloom
BlockBasedTableOptions table_options;
table_options.filter_policy.reset(NewBloomFilterPolicy(10));
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
// Use prefix iteration
ReadOptions read_options;
read_options.prefix_same_as_start = true;
Iterator* it = db->NewIterator(read_options);
it->Seek("prefix:");
Compression
Per-Level Compression
#include "rocksdb/compression_type.h"
using ROCKSDB_NAMESPACE::kNoCompression;
using ROCKSDB_NAMESPACE::kSnappyCompression;
using ROCKSDB_NAMESPACE::kZSTD;
// No compression for hot data, strong compression for cold data
options.compression_per_level = {
    kNoCompression,      // L0: no compression (hot)
    kNoCompression,      // L1
    kSnappyCompression,  // L2
    kSnappyCompression,  // L3
    kZSTD,               // L4 (cold)
    kZSTD,               // L5
    kZSTD                // L6
};
// Strong compression for the bottommost level
options.bottommost_compression = kZSTD;
options.bottommost_compression_opts.level = 9;  // Higher level: better ratio, slower
Compression Options
using ROCKSDB_NAMESPACE::CompressionOptions;
CompressionOptions compression_opts;
compression_opts.level = 6;           // ZSTD level (default: 3)
compression_opts.strategy = 0;        // Library-specific compression strategy
compression_opts.max_dict_bytes = 0;  // 0 disables dictionary compression
options.compression_opts = compression_opts;
Memory Management
Write Buffer Manager
Limit total memtable memory across DB:
#include "rocksdb/write_buffer_manager.h"
using ROCKSDB_NAMESPACE::WriteBufferManager;
// Limit total write buffer memory to 2GB, charged against the block cache
auto write_buffer_manager = std::make_shared<WriteBufferManager>(
    2048ull << 20,             // 2GB (ull avoids int overflow)
    table_options.block_cache  // Share with block cache
);
options.write_buffer_manager = write_buffer_manager;
Memory Monitoring
// Get approximate memory usage
std::string memory_usage;
db->GetProperty("rocksdb.estimate-table-readers-mem", &memory_usage);
std::cout << "Table readers memory: " << memory_usage << std::endl;
db->GetProperty("rocksdb.cur-size-all-mem-tables", &memory_usage);
std::cout << "Memtables memory: " << memory_usage << std::endl;
db->GetProperty("rocksdb.block-cache-usage", &memory_usage);
std::cout << "Block cache usage: " << memory_usage << std::endl;
Compaction Tuning
Universal Compaction
For write-heavy workloads that can tolerate higher space amplification:
#include "rocksdb/universal_compaction.h"
using ROCKSDB_NAMESPACE::CompactionOptionsUniversal;
using ROCKSDB_NAMESPACE::kCompactionStyleUniversal;
options.compaction_style = kCompactionStyleUniversal;
CompactionOptionsUniversal universal_opts;
universal_opts.size_ratio = 1;                 // default: 1
universal_opts.min_merge_width = 2;            // default: 2
universal_opts.max_merge_width = UINT_MAX;     // default: UINT_MAX
universal_opts.compression_size_percent = -1;  // default: -1
options.compaction_options_universal = universal_opts;
Compaction Priority
using ROCKSDB_NAMESPACE::CompactionPri;
// Prioritize minimizing write amplification
options.compaction_pri = CompactionPri::kMinOverlappingRatio;
// Or prioritize by compensated file size
options.compaction_pri = CompactionPri::kByCompensatedSize;
Rate Limiting
#include "rocksdb/rate_limiter.h"
using ROCKSDB_NAMESPACE::NewGenericRateLimiter;
// Limit compaction and flush I/O to 100 MB/s
options.rate_limiter.reset(NewGenericRateLimiter(
    100 << 20,   // bytes_per_sec
    100 * 1000,  // refill_period_us (100 ms)
    10           // fairness
));
Workload-Specific Tuning
Write-Heavy Workload
Options GetWriteHeavyOptions() {
  Options options;
  // Large write buffers
  options.write_buffer_size = 256 << 20;
  options.max_write_buffer_number = 6;
  // Delay compaction
  options.level0_file_num_compaction_trigger = 8;
  options.level0_slowdown_writes_trigger = 20;
  options.level0_stop_writes_trigger = 24;
  // More parallelism
  options.IncreaseParallelism(16);
  options.max_subcompactions = 4;
  // Fast compression
  options.compression = kLZ4Compression;
  // Smooth out I/O by syncing SST file writes incrementally
  options.bytes_per_sync = 1 << 20;  // 1MB
  return options;
}
Read-Heavy Workload
Options GetReadHeavyOptions() {
  Options options;
  // Large block cache
  BlockBasedTableOptions table_options;
  table_options.block_cache = NewLRUCache(8192ull << 20);  // 8GB
  // Cache index and filters
  table_options.cache_index_and_filter_blocks = true;
  table_options.pin_l0_filter_and_index_blocks_in_cache = true;
  // Bloom filters
  table_options.filter_policy.reset(NewBloomFilterPolicy(10));
  // Keep all files open
  options.max_open_files = -1;
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  return options;
}
Point Lookup Optimized
Options GetPointLookupOptions() {
  Options options;
  BlockBasedTableOptions table_options;
  // Hash index for point lookups (requires a prefix_extractor to be set)
  table_options.index_type =
      BlockBasedTableOptions::IndexType::kHashSearch;
  // Larger block cache
  table_options.block_cache = NewLRUCache(4096ull << 20);  // 4GB
  // Bloom filter
  table_options.filter_policy.reset(NewBloomFilterPolicy(10));
  // Whole key filtering
  table_options.whole_key_filtering = true;
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  // Smaller memtable
  options.write_buffer_size = 32 << 20;  // 32MB
  return options;
}
Range Scan Optimized
Options GetRangeScanOptions() {
  Options options;
  BlockBasedTableOptions table_options;
  // Larger blocks for sequential reads
  table_options.block_size = 64 << 10;  // 64KB
  // Partitioned index and filters
  table_options.index_type =
      BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
  table_options.partition_filters = true;
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  // Prefix extractor for range queries
  options.prefix_extractor.reset(NewFixedPrefixTransform(8));
  return options;
}
Benchmarking
Using db_bench
# Write performance
./db_bench --benchmarks=fillseq \
--db=/tmp/testdb \
--num=10000000 \
--value_size=400
# Read performance
./db_bench --benchmarks=readrandom \
--db=/tmp/testdb \
--num=10000000 \
--use_existing_db=1
# Mixed workload
./db_bench --benchmarks=readwhilewriting \
--db=/tmp/testdb \
--num=10000000 \
--threads=4
Key Metrics to Monitor
Write Metrics: write throughput (ops/sec), write latency (p50, p99), stall time, compaction statistics.
Read Metrics: read throughput (ops/sec), read latency (p50, p99), block cache hit rate, bloom filter effectiveness.
Resource Metrics: memory usage, disk I/O, CPU utilization, disk space usage.
Statistics and Monitoring
Enable Statistics
#include "rocksdb/statistics.h"
using ROCKSDB_NAMESPACE::CreateDBStatistics;
options.statistics = CreateDBStatistics();
// Later, get statistics
std::string stats;
db->GetProperty("rocksdb.stats", &stats);
std::cout << stats << std::endl;
// Get a specific counter
uint64_t bytes_written = options.statistics->getTickerCount(
    rocksdb::Tickers::BYTES_WRITTEN);
Periodic Statistics Dump
// Dump stats to the LOG file every 600 seconds
options.stats_dump_period_sec = 600;
// Persist stats history every 600 seconds (to disk if enabled below)
options.stats_persist_period_sec = 600;
options.persist_stats_to_disk = true;
Best Practices
Profile your workload
Understand your read/write patterns, data sizes, and access patterns before tuning.
Start with defaults
Use IncreaseParallelism() and OptimizeLevelStyleCompaction() as a baseline.
Tune incrementally
Change one parameter at a time and measure the impact.
Monitor continuously
Track metrics over time to detect performance degradation.
Test with production data
Synthetic benchmarks may not reflect real-world performance.
Common Pitfalls
Avoid these common mistakes:
Setting max_open_files = -1 without enough file descriptors
Over-provisioning memory leading to OOM
Disabling WAL without understanding durability implications
Using tiny write buffers causing excessive compaction
Not monitoring disk space usage
Ignoring write stalls
Next Steps
Configuration: review configuration options
Basic Operations: master fundamental operations