Compaction is the background process that reorganizes data in RocksDB’s LSM tree. It merges sorted files from different levels, removes deleted entries, and maintains read performance as data grows.
Why Compaction?
As writes accumulate in RocksDB, several problems emerge without compaction:
- Read amplification: more files must be checked on each read
- Space amplification: multiple versions of the same key accumulate on disk
- Unbounded L0 growth: L0 files have overlapping key ranges, so every new L0 file adds to the cost of every read
Compaction solves these problems by merging files, removing obsolete data, and maintaining the LSM tree invariants.
How Compaction Works
Compaction merges SST files from one level to the next:
```cpp
#include <rocksdb/db.h>

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;

// Configure compaction triggers
options.level0_file_num_compaction_trigger = 4;  // compact when 4 L0 files accumulate
options.max_bytes_for_level_base = 256 << 20;    // 256MB for L1
options.max_bytes_for_level_multiplier = 10;     // each level 10x larger

rocksdb::DB::Open(options, "/tmp/testdb", &db);
```
Compaction Process
L0 to L1 compaction merges overlapping L0 files with L1:

```cpp
// L0 files may overlap:
// L0: [a-m], [b-p], [c-z]
// L1: [a-h], [i-p], [q-z]
//
// Compaction merges all overlapping files.
// Result: new L1 files with all data merged and sorted.
```
L0→L1 compaction is the most expensive because L0 files can overlap with multiple L1 files.
Deeper-level compactions (Li to Li+1, i >= 1) merge one file with the overlapping files in the next level:

```cpp
// Select one L1 file: [i-p]
// Find overlapping L2 files: [i-m], [n-s]
//
// Merge the selected files.
// Output: new L2 files with merged, sorted data.
// Each key range is compacted independently.
```
During compaction, obsolete data is removed:
- Deleted keys: tombstones and the data they shadow are dropped (tombstones themselves only once no older version can exist below)
- Old versions: only the latest version of each key is kept
- Expired data: TTL and compaction filters are applied
- Compression: data can be recompressed with a different algorithm per level
Compaction Styles
RocksDB supports multiple compaction strategies:
Level Compaction (Default)
Data is organized into levels of increasing size:
```cpp
rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleLevel;

// Configure level sizes
options.level0_file_num_compaction_trigger = 4;
options.max_bytes_for_level_base = 256 << 20;  // 256MB
options.max_bytes_for_level_multiplier = 10;   // 10x growth

// Typical level sizes:
// L0: 4 files × 64MB = 256MB
// L1: 256MB
// L2: 2.56GB
// L3: 25.6GB
// L4: 256GB
```
Level Compaction Characteristics
Pros:

- Lower space amplification (1.1-1.3x)
- Predictable read performance
- Works well for most workloads

Cons:

- Higher write amplification (10-30x)
- More background I/O

Best for: general-purpose workloads, read-heavy applications
Universal Compaction
Universal compaction trades space for lower write amplification:
```cpp
#include <rocksdb/universal_compaction.h>
#include <climits>

rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleUniversal;

// Configure universal compaction
rocksdb::CompactionOptionsUniversal universal_opts;
universal_opts.size_ratio = 1;                        // 1% size difference
universal_opts.min_merge_width = 2;                   // min sorted runs to merge
universal_opts.max_merge_width = UINT_MAX;            // max sorted runs to merge
universal_opts.max_size_amplification_percent = 200;  // 200% amplification
options.compaction_options_universal = universal_opts;
```
Universal Compaction Characteristics
Pros:

- Lower write amplification (2-5x)
- Higher write throughput
- Simpler model (no levels)

Cons:

- Higher space amplification (1.5-2x)
- Variable read performance
- Large compactions can cause latency spikes

Best for: write-heavy workloads, time-series data
FIFO Compaction
Simple time-based deletion:
```cpp
#include <rocksdb/advanced_options.h>

rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleFIFO;

// Configure FIFO
rocksdb::CompactionOptionsFIFO fifo_opts;
fifo_opts.max_table_files_size = 10ULL << 30;  // 10GB total
fifo_opts.allow_compaction = false;            // no merging
options.compaction_options_fifo = fifo_opts;
```
FIFO Compaction Characteristics
Pros:

- Minimal write amplification (1x)
- No compaction overhead
- Simple space management

Cons:

- No key updates or deletes (append-only)
- High space amplification
- Oldest data deleted when the size limit is reached

Best for: append-only logs, circular buffers, time-series with TTL
Compaction Priority
Control which files are compacted first:
```cpp
#include <rocksdb/advanced_options.h>

rocksdb::ColumnFamilyOptions cf_options;

// Minimize overlapping ratio (recommended)
cf_options.compaction_pri = rocksdb::kMinOverlappingRatio;

// Other options:
// cf_options.compaction_pri = rocksdb::kOldestLargestSeqFirst;
// cf_options.compaction_pri = rocksdb::kOldestSmallestSeqFirst;
// cf_options.compaction_pri = rocksdb::kRoundRobin;
```
Manual Compaction
Trigger compaction programmatically:
```cpp
// Compact the entire database
rocksdb::CompactRangeOptions compact_opts;
db->CompactRange(compact_opts, nullptr, nullptr);

// Compact a specific key range
rocksdb::Slice start("user:1000");
rocksdb::Slice end("user:2000");
db->CompactRange(compact_opts, &start, &end);

// Compact a specific column family
db->CompactRange(compact_opts, column_family_handle, nullptr, nullptr);
```
Manual compaction blocks until complete and can take significant time for large databases. Use CompactRangeOptions::exclusive_manual_compaction = false to allow parallel automatic compactions.
Compaction Configuration
Background Threads
```cpp
rocksdb::Options options;

// Modern unified thread pool (recommended)
options.max_background_jobs = 8;  // total threads for flush + compaction

// Legacy separate pools
// options.max_background_compactions = 6;
// options.max_background_flushes = 2;

// Helper that configures thread pools for the given parallelism
options.IncreaseParallelism(16);
```
Write Stall Protection
Prevent writes from overwhelming compaction:
```cpp
rocksdb::Options options;

// Soft limit: slow down writes
options.level0_slowdown_writes_trigger = 20;

// Hard limit: stop writes
options.level0_stop_writes_trigger = 36;

// Soft limit for pending compaction bytes
options.soft_pending_compaction_bytes_limit = 64ULL << 30;   // 64GB

// Hard limit for pending compaction bytes
options.hard_pending_compaction_bytes_limit = 256ULL << 30;  // 256GB
```
When limits are exceeded, RocksDB slows or stops writes to allow compaction to catch up. Monitor these triggers to avoid write stalls.
Compaction Size Limits
```cpp
rocksdb::Options options;

// Maximum size of a single compaction job; splitting large
// compactions reduces latency spikes and temporary space usage
options.max_compaction_bytes = 1ULL << 30;  // 1GB
```
Compaction Filters
Custom logic to drop keys during compaction:
```cpp
#include <rocksdb/compaction_filter.h>
#include <ctime>

class TTLCompactionFilter : public rocksdb::CompactionFilter {
 public:
  bool Filter(int level, const rocksdb::Slice& key,
              const rocksdb::Slice& value,
              std::string* new_value,
              bool* value_changed) const override {
    int64_t timestamp = 0;  // parse from value (application-specific)
    int64_t now = static_cast<int64_t>(time(nullptr));

    // Drop expired keys
    if (now - timestamp > 86400) {  // 24 hours
      return true;  // remove this key
    }
    return false;   // keep this key
  }

  const char* Name() const override { return "TTLCompactionFilter"; }
};

// Use the filter; compaction_filter is a raw pointer, so the filter
// object must outlive the DB
rocksdb::Options options;
options.compaction_filter = new TTLCompactionFilter();
```
Compaction filters run on every key during compaction. Keep logic simple to avoid slowing compaction.
Monitoring Compaction
Compaction Statistics
```cpp
#include <rocksdb/db.h>
#include <rocksdb/statistics.h>
#include <iostream>

rocksdb::DB* db = nullptr;
rocksdb::Options options;
options.statistics = rocksdb::CreateDBStatistics();
rocksdb::DB::Open(options, "/tmp/testdb", &db);

// Dump the full stats report
std::string stats;
db->GetProperty("rocksdb.stats", &stats);
std::cout << stats << std::endl;

// Specific compaction metrics
db->GetProperty("rocksdb.num-files-at-level0", &stats);
db->GetProperty("rocksdb.num-files-at-level1", &stats);
db->GetProperty("rocksdb.compaction-pending", &stats);
```
Compaction Events
```cpp
#include <rocksdb/listener.h>
#include <iostream>

class MyEventListener : public rocksdb::EventListener {
 public:
  void OnCompactionBegin(rocksdb::DB* db,
                         const rocksdb::CompactionJobInfo& info) override {
    std::cout << "Compaction started: "
              << "Level " << info.base_input_level
              << " -> " << info.output_level << std::endl;
  }

  void OnCompactionCompleted(rocksdb::DB* db,
                             const rocksdb::CompactionJobInfo& info) override {
    std::cout << "Compaction completed: "
              << info.input_files.size() << " files merged, "
              << info.total_input_bytes << " bytes in, "
              << info.total_output_bytes << " bytes out" << std::endl;
  }
};

rocksdb::Options options;
options.listeners.emplace_back(new MyEventListener());
```
Optimization Strategies
Write-Heavy

Optimize for write throughput:

```cpp
rocksdb::Options options;

// Larger MemTables absorb more writes before flushing
options.write_buffer_size = 256 << 20;  // 256MB
options.max_write_buffer_number = 4;

// Less aggressive L0 -> L1 compaction
options.level0_file_num_compaction_trigger = 8;
options.max_bytes_for_level_base = 1 << 30;  // 1GB

// More compaction threads
options.max_background_jobs = 12;

// Consider universal compaction
options.compaction_style = rocksdb::kCompactionStyleUniversal;
```
Read-Heavy

Optimize for read performance:

```cpp
rocksdb::Options options;

// Aggressive compaction to reduce the number of files per read
options.level0_file_num_compaction_trigger = 2;
options.max_bytes_for_level_multiplier = 8;

// Bloom filters and a larger block cache; note the cache must be set
// on table_opts before the table factory is created
rocksdb::BlockBasedTableOptions table_opts;
table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
table_opts.block_cache = rocksdb::NewLRUCache(4ULL << 30);  // 4GB
options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_opts));
```
Space-Constrained

Minimize space amplification:

```cpp
rocksdb::Options options;

// Aggressive compaction
options.level0_file_num_compaction_trigger = 2;
options.max_bytes_for_level_multiplier = 8;

// Maximum compression
options.compression = rocksdb::kZSTD;
options.bottommost_compression = rocksdb::kZSTD;
options.bottommost_compression_opts.enabled = true;  // required for the level below to apply
options.bottommost_compression_opts.level = 19;

// Smaller L1 so old data reaches the heavily compressed bottom level sooner
options.max_bytes_for_level_base = 128 << 20;  // 128MB
```
Best Practices
- Use level compaction for general-purpose workloads
- Monitor the L0 file count: too many files indicates compaction is falling behind
- Tune thread count based on CPU cores and I/O capacity
- Set appropriate triggers to prevent write stalls
- Use compaction filters for application-specific cleanup
- Profile compaction with statistics and event listeners
- Consider universal compaction for write-heavy workloads
- Run manual compaction during off-peak hours for large databases
Next Steps
- LSM Tree Design: understand the LSM structure that compaction maintains
- Architecture: see how compaction fits into RocksDB's architecture
- Performance Tuning: advanced compaction optimization techniques
- Write-Ahead Log: learn about WAL file management and compaction