Compaction is the background process that reorganizes data in RocksDB’s LSM tree. It merges sorted files from different levels, removes deleted entries, and maintains read performance as data grows.

Why Compaction?

As writes accumulate in RocksDB, several problems emerge without compaction:
  • Read amplification: each read must check more files
  • Space amplification: multiple versions of the same key and stale deletion markers accumulate
  • Unsorted L0: L0 files have overlapping key ranges, so every lookup must consult all of them
Compaction solves these problems by merging files, removing obsolete data, and maintaining the LSM tree invariants.

How Compaction Works

Compaction merges SST files from one level to the next:
#include <rocksdb/db.h>

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;

// Configure compaction triggers
options.level0_file_num_compaction_trigger = 4;  // Compact when 4 L0 files
options.max_bytes_for_level_base = 256 << 20;    // 256MB for L1
options.max_bytes_for_level_multiplier = 10;     // Each level 10x larger

rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &db);
assert(status.ok());

Compaction Process

L0 to L1 compaction merges overlapping L0 files with L1:
// L0 files may overlap
// L0: [a-m], [b-p], [c-z]
// L1: [a-h], [i-p], [q-z]

// Compaction merges all overlapping files
// Result: New L1 files with all data merged and sorted
L0→L1 compaction is typically the most expensive: because L0 files overlap one another and can each overlap many L1 files, a single compaction must read and rewrite all of them.

Compaction Styles

RocksDB supports multiple compaction strategies:

Level Compaction (Default)

Data is organized into levels of increasing size:
rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleLevel;

// Configure level sizes
options.level0_file_num_compaction_trigger = 4;
options.max_bytes_for_level_base = 256 << 20;    // 256MB
options.max_bytes_for_level_multiplier = 10;     // 10x growth

// Typical level sizes:
// L0: 4 files × 64MB = 256MB
// L1: 256MB
// L2: 2.56GB
// L3: 25.6GB
// L4: 256GB
Pros:
  • Lower space amplification (1.1-1.3x)
  • Predictable read performance
  • Works well for most workloads
Cons:
  • Higher write amplification (10-30x)
  • More background I/O
Best for: General-purpose workloads, read-heavy applications

Universal Compaction

Optimizes for write amplification:
#include <rocksdb/universal_compaction.h>

rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleUniversal;

// Configure universal compaction
rocksdb::CompactionOptionsUniversal universal_opts;
universal_opts.size_ratio = 1;                      // 1% size difference
universal_opts.min_merge_width = 2;                 // Min files to merge
universal_opts.max_merge_width = UINT_MAX;          // Max files to merge
universal_opts.max_size_amplification_percent = 200; // 200% amplification

options.compaction_options_universal = universal_opts;
Pros:
  • Lower write amplification (2-5x)
  • Higher write throughput
  • Simpler model (no levels)
Cons:
  • Higher space amplification (1.5-2x)
  • Variable read performance
  • Large compactions can cause latency spikes
Best for: Write-heavy workloads, time-series data

FIFO Compaction

Simple time-based deletion:
#include <rocksdb/advanced_options.h>

rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleFIFO;

// Configure FIFO
rocksdb::CompactionOptionsFIFO fifo_opts;
fifo_opts.max_table_files_size = 10ULL << 30; // 10GB total
fifo_opts.allow_compaction = false;           // No merging

options.compaction_options_fifo = fifo_opts;
Pros:
  • Minimal write amplification (1x)
  • No compaction overhead
  • Simple space management
Cons:
  • No key updates or deletes (append-only)
  • High space amplification
  • Oldest data deleted when limit reached
Best for: Append-only logs, circular buffers, time-series with TTL

Compaction Priority

Control which files are compacted first:
#include <rocksdb/advanced_options.h>

rocksdb::ColumnFamilyOptions cf_options;

// Minimize overlapping ratio (recommended)
cf_options.compaction_pri = rocksdb::kMinOverlappingRatio;

// Other options:
// cf_options.compaction_pri = rocksdb::kOldestLargestSeqFirst;
// cf_options.compaction_pri = rocksdb::kOldestSmallestSeqFirst;
// cf_options.compaction_pri = rocksdb::kRoundRobin;

Manual Compaction

Trigger compaction programmatically:
// Compact entire database
rocksdb::CompactRangeOptions compact_opts;
db->CompactRange(compact_opts, nullptr, nullptr);

// Compact specific key range
rocksdb::Slice start("user:1000");
rocksdb::Slice end("user:2000");
db->CompactRange(compact_opts, &start, &end);

// Compact specific column family
db->CompactRange(compact_opts, column_family_handle, nullptr, nullptr);
Manual compaction blocks until complete and can take significant time for large databases. Use CompactRangeOptions::exclusive_manual_compaction = false to allow parallel automatic compactions.
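For example, a manual compaction that does not block automatic ones might look like this sketch (assuming `db` is an open `rocksdb::DB*`):

```cpp
#include <rocksdb/db.h>

rocksdb::CompactRangeOptions compact_opts;
// Allow automatic compactions to keep running alongside this manual one
compact_opts.exclusive_manual_compaction = false;
// Recompact the bottommost level too, so tombstones there are reclaimed
compact_opts.bottommost_level_compaction =
    rocksdb::BottommostLevelCompaction::kForce;

rocksdb::Status s = db->CompactRange(compact_opts, nullptr, nullptr);
if (!s.ok()) {
  // Handle the error, e.g. log s.ToString()
}
```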

Compaction Configuration

Background Threads

rocksdb::Options options;

// Modern unified thread pool (recommended)
options.max_background_jobs = 8; // Total threads for flush + compaction

// Legacy separate pools
// options.max_background_compactions = 6;
// options.max_background_flushes = 2;

// Increase parallelism
options.IncreaseParallelism(16); // Helper to configure threads

Write Stall Protection

Prevent writes from overwhelming compaction:
rocksdb::Options options;

// Soft limit: slow down writes
options.level0_slowdown_writes_trigger = 20;

// Hard limit: stop writes
options.level0_stop_writes_trigger = 36;

// Soft limit for pending compaction bytes
options.soft_pending_compaction_bytes_limit = 64ULL << 30; // 64GB

// Hard limit for pending compaction bytes
options.hard_pending_compaction_bytes_limit = 256ULL << 30; // 256GB
When limits are exceeded, RocksDB slows or stops writes to allow compaction to catch up. Monitor these triggers to avoid write stalls.
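Stall transitions can be observed through the event listener API. A sketch of a listener that logs when a column family enters the delayed or stopped state:

```cpp
#include <rocksdb/listener.h>
#include <iostream>

// Logs write-stall state changes reported by RocksDB.
class StallListener : public rocksdb::EventListener {
 public:
  void OnStallConditionsChanged(const rocksdb::WriteStallInfo& info) override {
    if (info.condition.cur == rocksdb::WriteStallCondition::kDelayed) {
      std::cout << "Writes delayed for CF " << info.cf_name << "\n";
    } else if (info.condition.cur == rocksdb::WriteStallCondition::kStopped) {
      std::cout << "Writes stopped for CF " << info.cf_name << "\n";
    }
  }
};
```

Register it via `options.listeners` before opening the database.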

Compaction Size Limits

rocksdb::Options options;

// Maximum compaction size
options.max_compaction_bytes = 1ULL << 30; // 1GB

// Split large compactions into smaller jobs
// Reduces latency spikes and temp space usage

Compaction Filters

Custom logic to drop keys during compaction:
#include <rocksdb/compaction_filter.h>

class TTLCompactionFilter : public rocksdb::CompactionFilter {
 public:
  bool Filter(int level, const rocksdb::Slice& key,
              const rocksdb::Slice& value,
              std::string* new_value,
              bool* value_changed) const override {
    // Parse timestamp from value
    int64_t timestamp = 0;  // extract from value (application-specific)
    int64_t now = 0;        // current time in seconds, e.g. std::time(nullptr)
    
    // Drop expired keys
    if (now - timestamp > 86400) { // 24 hours
      return true; // Remove this key
    }
    
    return false; // Keep this key
  }
  
  const char* Name() const override { return "TTLCompactionFilter"; }
};

// Use the filter
rocksdb::Options options;
// RocksDB does not take ownership; the filter must outlive the DB
options.compaction_filter = new TTLCompactionFilter();
Compaction filters run on every key during compaction. Keep logic simple to avoid slowing compaction.

Monitoring Compaction

Compaction Statistics

#include <rocksdb/statistics.h>

rocksdb::Options options;
options.statistics = rocksdb::CreateDBStatistics();

rocksdb::DB::Open(options, "/tmp/testdb", &db);

// Get compaction stats
std::string stats;
db->GetProperty("rocksdb.stats", &stats);
std::cout << stats << std::endl;

// Specific compaction metrics
db->GetProperty("rocksdb.num-files-at-level0", &stats);
db->GetProperty("rocksdb.num-files-at-level1", &stats);
db->GetProperty("rocksdb.compaction-pending", &stats);

Compaction Events

#include <rocksdb/listener.h>

class MyEventListener : public rocksdb::EventListener {
 public:
  void OnCompactionBegin(rocksdb::DB* db,
                         const rocksdb::CompactionJobInfo& info) override {
    std::cout << "Compaction started: "
              << "Level " << info.base_input_level 
              << " -> " << info.output_level << std::endl;
  }
  
  void OnCompactionCompleted(rocksdb::DB* db,
                            const rocksdb::CompactionJobInfo& info) override {
    std::cout << "Compaction completed: "
              << info.input_files.size() << " files merged, "
              << info.total_input_bytes << " bytes in, "
              << info.total_output_bytes << " bytes out" << std::endl;
  }
};

rocksdb::Options options;
options.listeners.emplace_back(std::make_shared<MyEventListener>());

Optimization Strategies

Optimize for write throughput:
rocksdb::Options options;

// Larger MemTables
options.write_buffer_size = 256 << 20; // 256MB
options.max_write_buffer_number = 4;

// Less aggressive L0→L1 compaction
options.level0_file_num_compaction_trigger = 8;
options.max_bytes_for_level_base = 1 << 30; // 1GB

// More compaction threads
options.max_background_jobs = 12;

// Consider universal compaction
options.compaction_style = rocksdb::kCompactionStyleUniversal;

Best Practices

  1. Use level compaction for general-purpose workloads
  2. Monitor L0 file count - too many indicates compaction falling behind
  3. Tune thread count based on CPU cores and I/O capacity
  4. Set appropriate triggers to prevent write stalls
  5. Use compaction filters for application-specific cleanup
  6. Profile compaction with statistics and event listeners
  7. Consider universal compaction for write-heavy workloads
  8. Manual compaction during off-peak hours for large databases
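Item 2 can be automated with the integer property API. A sketch (assuming `db` is an open `rocksdb::DB*`; the helper name is hypothetical):

```cpp
#include <rocksdb/db.h>
#include <cstdint>
#include <iostream>

// Hypothetical helper: warn when the L0 file count approaches the
// configured slowdown trigger, i.e. compaction is falling behind.
void CheckL0Pressure(rocksdb::DB* db, uint64_t slowdown_trigger) {
  uint64_t l0_files = 0;
  if (db->GetIntProperty("rocksdb.num-files-at-level0", &l0_files) &&
      l0_files >= slowdown_trigger) {
    std::cerr << "Compaction falling behind: " << l0_files
              << " L0 files (slowdown trigger " << slowdown_trigger << ")\n";
  }
}
```

Poll this from a metrics loop and alert before `level0_slowdown_writes_trigger` is reached, rather than after writes have already stalled.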

Next Steps

LSM Tree Design

Understand the LSM structure compaction maintains

Architecture

See how compaction fits in RocksDB architecture

Performance Tuning

Advanced compaction optimization techniques

Write-Ahead Log

Learn about WAL file management and compaction
