This guide covers advanced performance tuning techniques for RocksDB, helping you achieve optimal performance for your specific workload.

Performance Fundamentals

Write Amplification

The ratio of bytes written to storage to bytes written by the application. Lower is better.

Read Amplification

The number of disk reads required to serve a single read request. Lower is better.

Space Amplification

Ratio of database size to actual data size. Lower is better.

Latency

Time to complete operations. Lower is better.
These metrics are often in tension. Optimizing one may degrade others. Tune based on your workload priorities.

Write Performance

Increase Write Buffer Size

Larger write buffers absorb more writes before flushing, reducing flush and compaction frequency:
Options options;

// Increase memtable size
options.write_buffer_size = 256 << 20;  // 256MB (default: 64MB)

// Increase number of memtables
options.max_write_buffer_number = 6;  // default: 2

// Control total memory across column families
options.db_write_buffer_size = 2ULL << 30;  // 2GB total (ULL avoids int overflow)

Optimize Level Compaction

// Start compaction earlier
options.level0_file_num_compaction_trigger = 2;  // default: 4

// Increase base level size
options.max_bytes_for_level_base = 512 << 20;  // 512MB (default: 256MB)

// Adjust level multiplier
options.max_bytes_for_level_multiplier = 8;  // default: 10

Parallelism

// Increase background jobs
options.IncreaseParallelism(16);  // Use 16 cores

// Or manually configure
options.max_background_jobs = 16;
options.max_subcompactions = 4;  // Parallel subcompactions

Disable Write-Ahead Log (Careful!)

WriteOptions write_options;
write_options.disableWAL = true;  // Much faster, but not durable

// Or use sync less frequently
write_options.sync = false;
Disabling WAL or sync means recent writes can be lost in a crash. Only use for non-critical data or when you can reconstruct data from other sources.
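One way to bound the loss window when running without a WAL is to flush memtables explicitly at points where you can tolerate losing everything since the last flush. A sketch, assuming `db` is an open DB handle:

```cpp
// With the WAL disabled, data is durable only once the memtable is
// flushed to an SST file. Flush explicitly at safe checkpoints.
ROCKSDB_NAMESPACE::FlushOptions flush_options;
flush_options.wait = true;  // Block until the flush completes

ROCKSDB_NAMESPACE::Status s = db->Flush(flush_options);
if (!s.ok()) {
    // Handle the error; writes since the last flush are still volatile
}
```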

Read Performance

Block Cache

Increase block cache for better read performance:
#include "rocksdb/table.h"
#include "rocksdb/cache.h"

using ROCKSDB_NAMESPACE::BlockBasedTableOptions;
using ROCKSDB_NAMESPACE::NewLRUCache;

BlockBasedTableOptions table_options;

// Large block cache (adjust based on available RAM)
table_options.block_cache = NewLRUCache(8ULL << 30);  // 8GB

// Cache index and filter blocks in the block cache
table_options.cache_index_and_filter_blocks = true;
table_options.pin_l0_filter_and_index_blocks_in_cache = true;

options.table_factory.reset(
    NewBlockBasedTableFactory(table_options));

Bloom Filters

Bloom filters reduce unnecessary disk reads:
#include "rocksdb/filter_policy.h"

using ROCKSDB_NAMESPACE::NewBloomFilterPolicy;

BlockBasedTableOptions table_options;

// 10 bits per key (~1% false positive rate); second arg disables
// the deprecated block-based filter format
table_options.filter_policy.reset(NewBloomFilterPolicy(10, false));

// Use partitioned filters for large databases
table_options.partition_filters = true;
table_options.index_type = 
    BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;

options.table_factory.reset(
    NewBlockBasedTableFactory(table_options));

Optimize Block Size

BlockBasedTableOptions table_options;

// Smaller blocks favor point lookups (default: 4KB)
table_options.block_size = 4 << 10;  // 4KB

// Larger blocks favor scans and compress better (pick one)
table_options.block_size = 64 << 10;  // 64KB

options.table_factory.reset(
    NewBlockBasedTableFactory(table_options));

Prefix Bloom Filters

For workloads with prefix queries:
#include "rocksdb/slice_transform.h"

using ROCKSDB_NAMESPACE::NewFixedPrefixTransform;

// Extract prefix (first 8 bytes)
options.prefix_extractor.reset(NewFixedPrefixTransform(8));

// Enable prefix bloom
BlockBasedTableOptions table_options;
table_options.filter_policy.reset(NewBloomFilterPolicy(10));
options.table_factory.reset(
    NewBlockBasedTableFactory(table_options));

// Use prefix iteration
ReadOptions read_options;
read_options.prefix_same_as_start = true;
Iterator* it = db->NewIterator(read_options);
for (it->Seek("prefix:"); it->Valid(); it->Next()) {
    // Only keys starting with "prefix:" are visited
}
delete it;

Compression

Per-Level Compression

#include "rocksdb/compression_type.h"

using ROCKSDB_NAMESPACE::kNoCompression;
using ROCKSDB_NAMESPACE::kSnappyCompression;
using ROCKSDB_NAMESPACE::kZSTD;

// No compression for hot data, strong compression for cold data
options.compression_per_level = {
    kNoCompression,      // L0: no compression (hot)
    kNoCompression,      // L1
    kSnappyCompression,  // L2
    kSnappyCompression,  // L3
    kZSTD,               // L4 (cold)
    kZSTD,               // L5
    kZSTD                // L6
};

// Strong compression for bottommost level
options.bottommost_compression = kZSTD;
options.bottommost_compression_opts.level = 9;  // High ZSTD level (max: 22)

Compression Options

using ROCKSDB_NAMESPACE::CompressionOptions;

CompressionOptions compression_opts;
compression_opts.level = 6;              // ZSTD level (default: 3)
compression_opts.strategy = 0;           // Compression strategy
compression_opts.max_dict_bytes = 0;     // 0 disables dictionary compression

options.compression_opts = compression_opts;

Memory Management

Write Buffer Manager

Limit total memtable memory across DB:
#include "rocksdb/write_buffer_manager.h"

using ROCKSDB_NAMESPACE::WriteBufferManager;

// Limit total write buffer to 2GB
auto write_buffer_manager = std::make_shared<WriteBufferManager>(
    2ULL << 30,  // 2GB (ULL avoids int overflow)
    table_options.block_cache  // Share with block cache
);

options.write_buffer_manager = write_buffer_manager;

Memory Monitoring

// Get approximate memory usage
std::string memory_usage;
db->GetProperty("rocksdb.estimate-table-readers-mem", &memory_usage);
std::cout << "Table readers memory: " << memory_usage << std::endl;

db->GetProperty("rocksdb.cur-size-all-mem-tables", &memory_usage);
std::cout << "Memtables memory: " << memory_usage << std::endl;

db->GetProperty("rocksdb.block-cache-usage", &memory_usage);
std::cout << "Block cache usage: " << memory_usage << std::endl;

Compaction Tuning

Universal Compaction

For write-heavy workloads with less space:
#include "rocksdb/universal_compaction.h"

using ROCKSDB_NAMESPACE::CompactionOptionsUniversal;

options.compaction_style = kCompactionStyleUniversal;

CompactionOptionsUniversal universal_opts;
universal_opts.size_ratio = 1;                    // default: 1
universal_opts.min_merge_width = 2;               // default: 2
universal_opts.max_merge_width = UINT_MAX;        // default: UINT_MAX
universal_opts.compression_size_percent = -1;     // default: -1

options.compaction_options_universal = universal_opts;

Compaction Priority

using ROCKSDB_NAMESPACE::CompactionPri;

// Prioritize minimizing write amplification
options.compaction_pri = CompactionPri::kMinOverlappingRatio;

// Or prioritize by file size
options.compaction_pri = CompactionPri::kByCompensatedSize;

Rate Limiting

#include "rocksdb/rate_limiter.h"

using ROCKSDB_NAMESPACE::NewGenericRateLimiter;

// Limit compaction I/O to 100 MB/s
options.rate_limiter.reset(
    NewGenericRateLimiter(
        100 << 20,  // bytes_per_sec
        100 * 1000, // refill_period_us
        10          // fairness
    ));

Workload-Specific Tuning

Write-Heavy Workload

Options GetWriteHeavyOptions() {
    Options options;
    
    // Large write buffers
    options.write_buffer_size = 256 << 20;
    options.max_write_buffer_number = 6;
    
    // Delay compaction
    options.level0_file_num_compaction_trigger = 8;
    options.level0_slowdown_writes_trigger = 20;
    options.level0_stop_writes_trigger = 24;
    
    // More parallelism
    options.IncreaseParallelism(16);
    options.max_subcompactions = 4;
    
    // Fast compression
    options.compression = kLZ4Compression;
    
    // Smooth I/O by syncing file data incrementally rather than in bursts
    options.bytes_per_sync = 1 << 20;  // 1MB
    
    return options;
}

Read-Heavy Workload

Options GetReadHeavyOptions() {
    Options options;
    
    // Large block cache
    BlockBasedTableOptions table_options;
    table_options.block_cache = NewLRUCache(8ULL << 30);  // 8GB
    
    // Cache index and filters
    table_options.cache_index_and_filter_blocks = true;
    table_options.pin_l0_filter_and_index_blocks_in_cache = true;
    
    // Bloom filters
    table_options.filter_policy.reset(NewBloomFilterPolicy(10));
    
    // Keep all files open
    options.max_open_files = -1;
    
    options.table_factory.reset(
        NewBlockBasedTableFactory(table_options));
    
    return options;
}

Point Lookup Optimized

Options GetPointLookupOptions() {
    Options options;
    
    BlockBasedTableOptions table_options;
    
    // Hash index for point lookups (requires a prefix extractor)
    table_options.index_type = 
        BlockBasedTableOptions::IndexType::kHashSearch;
    options.prefix_extractor.reset(NewFixedPrefixTransform(8));
    
    // Larger block cache
    table_options.block_cache = NewLRUCache(4ULL << 30);  // 4GB
    
    // Strong bloom filter
    table_options.filter_policy.reset(NewBloomFilterPolicy(10));
    
    // Whole key filtering
    table_options.whole_key_filtering = true;
    
    options.table_factory.reset(
        NewBlockBasedTableFactory(table_options));
    
    // Smaller memtable
    options.write_buffer_size = 32 << 20;  // 32MB
    
    return options;
}

Range Scan Optimized

Options GetRangeScanOptions() {
    Options options;
    
    BlockBasedTableOptions table_options;
    
    // Larger blocks for sequential reads
    table_options.block_size = 64 << 10;  // 64KB
    
    // Partitioned index
    table_options.index_type = 
        BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
    table_options.partition_filters = true;
    
    options.table_factory.reset(
        NewBlockBasedTableFactory(table_options));
    
    // Prefix extractor for range queries
    options.prefix_extractor.reset(NewFixedPrefixTransform(8));
    
    return options;
}

Benchmarking

Using db_bench

# Write performance
./db_bench --benchmarks=fillseq \
  --db=/tmp/testdb \
  --num=10000000 \
  --value_size=400

# Read performance
./db_bench --benchmarks=readrandom \
  --db=/tmp/testdb \
  --num=10000000 \
  --use_existing_db=1

# Mixed workload
./db_bench --benchmarks=readwhilewriting \
  --db=/tmp/testdb \
  --num=10000000 \
  --threads=4

Key Metrics to Monitor

  • Write throughput (ops/sec)
  • Write latency (p50, p99)
  • Stall time
  • Compaction statistics
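Most of these metrics can be pulled from DB properties at runtime. A sketch against an open `db` handle:

```cpp
std::string value;

// Pending compaction debt in bytes (a proxy for compaction backlog)
db->GetProperty("rocksdb.estimate-pending-compaction-bytes", &value);

// Whether writes are currently stopped entirely (1 = stopped)
db->GetProperty("rocksdb.is-write-stopped", &value);

// Per-level compaction statistics, including write amplification
db->GetProperty("rocksdb.stats", &value);
```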

Statistics and Monitoring

Enable Statistics

#include "rocksdb/statistics.h"

using ROCKSDB_NAMESPACE::CreateDBStatistics;

options.statistics = CreateDBStatistics();

// Later, get statistics
std::string stats;
db->GetProperty("rocksdb.stats", &stats);
std::cout << stats << std::endl;

// Get specific counter
uint64_t bytes_written = options.statistics->getTickerCount(
    rocksdb::Tickers::BYTES_WRITTEN);

Periodic Statistics Dump

// Dump stats to LOG every 600 seconds
options.stats_dump_period_sec = 600;

// Persist stats history in memory, and optionally to disk
options.stats_persist_period_sec = 600;
options.persist_stats_to_disk = true;

Best Practices

1. Profile your workload: understand your read/write patterns, data sizes, and access patterns before tuning.
2. Start with defaults: use IncreaseParallelism() and OptimizeLevelStyleCompaction() as a baseline.
3. Tune incrementally: change one parameter at a time and measure the impact.
4. Monitor continuously: track metrics over time to detect performance degradation.
5. Test with production data: synthetic benchmarks may not reflect real-world performance.
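The suggested defaults baseline can be sketched as follows (the 16-thread and 512MB figures are illustrative, not recommendations):

```cpp
Options options;

// Size the background thread pools for the available cores
options.IncreaseParallelism(16);

// Sensible level-compaction settings for a given memtable memory budget
options.OptimizeLevelStyleCompaction(512 << 20);  // 512MB budget
```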

Common Pitfalls

Avoid these common mistakes:
  • Setting max_open_files = -1 without enough file descriptors
  • Over-provisioning memory leading to OOM
  • Disabling WAL without understanding durability implications
  • Using tiny write buffers causing excessive compaction
  • Not monitoring disk space usage
  • Ignoring write stalls
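Write stalls, the last pitfall, can be detected programmatically. A sketch, assuming statistics were enabled via CreateDBStatistics():

```cpp
// Total microseconds writers spent stalled; a growing value means
// flush or compaction cannot keep up with the write rate.
uint64_t stall_micros =
    options.statistics->getTickerCount(ROCKSDB_NAMESPACE::STALL_MICROS);

// The delayed-write rate currently being applied (0 = no throttling)
std::string delayed_rate;
db->GetProperty("rocksdb.actual-delayed-write-rate", &delayed_rate);
```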

Next Steps

  • Configuration: review configuration options
  • Basic Operations: master fundamental operations
