Compaction is the background process that reorganizes data in RocksDB’s LSM tree. It merges sorted files from different levels, removes deleted entries, and maintains read performance as data grows.
Why Compaction?
As writes accumulate in RocksDB, several problems emerge without compaction:
- Read amplification: more files must be checked on each read
- Space amplification: multiple versions of the same key accumulate on disk
- Unbounded L0 growth: L0 files have overlapping key ranges, so every new L0 file adds to the cost of every read
Compaction solves these problems by merging files, removing obsolete data, and maintaining the LSM tree invariants.
How Compaction Works
Compaction merges SST files from one level to the next:
```cpp
#include <rocksdb/db.h>

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;

// Configure compaction triggers
options.level0_file_num_compaction_trigger = 4;  // compact when 4 L0 files accumulate
options.max_bytes_for_level_base = 256 << 20;    // 256MB for L1
options.max_bytes_for_level_multiplier = 10;     // each level 10x larger

rocksdb::DB::Open(options, "/tmp/testdb", &db);
```
Compaction Process
L0 to L1 compaction merges overlapping L0 files with L1:

```cpp
// L0 files may overlap:
// L0: [a-m], [b-p], [c-z]
// L1: [a-h], [i-p], [q-z]
//
// Compaction merges all overlapping files.
// Result: new L1 files with all data merged and sorted.
```
L0→L1 compaction is the most expensive because L0 files can overlap with multiple L1 files.
Deeper-level compactions (Li to Li+1, i >= 1) merge one file with the overlapping files in the next level:

```cpp
// Select one L1 file: [i-p]
// Find overlapping L2 files: [i-m], [n-s]
//
// Merge the selected files.
// Output: new L2 files with merged, sorted data.
// Each key range is compacted independently.
```
During compaction, obsolete data is removed:
- Deleted keys: tombstones and the data they shadow are dropped (tombstones themselves only once no older version can exist below)
- Old versions: only the latest version of each key is kept
- Expired data: TTL and compaction filters are applied
- Compression: data can be recompressed with a different algorithm per level
Compaction Styles
RocksDB supports multiple compaction strategies:
Level Compaction (Default)
Data is organized into levels of increasing size:
```cpp
rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleLevel;

// Configure level sizes
options.level0_file_num_compaction_trigger = 4;
options.max_bytes_for_level_base = 256 << 20;  // 256MB
options.max_bytes_for_level_multiplier = 10;   // 10x growth

// Typical level sizes:
// L0: 4 files × 64MB = 256MB
// L1: 256MB
// L2: 2.56GB
// L3: 25.6GB
// L4: 256GB
```
Level Compaction Characteristics
Pros:

- Lower space amplification (1.1-1.3x)
- Predictable read performance
- Works well for most workloads

Cons:

- Higher write amplification (10-30x)
- More background I/O

Best for: general-purpose workloads, read-heavy applications
Universal Compaction
Universal compaction trades space for lower write amplification:
```cpp
#include <rocksdb/universal_compaction.h>
#include <climits>

rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleUniversal;

// Configure universal compaction
rocksdb::CompactionOptionsUniversal universal_opts;
universal_opts.size_ratio = 1;                        // 1% size difference
universal_opts.min_merge_width = 2;                   // min sorted runs to merge
universal_opts.max_merge_width = UINT_MAX;            // max sorted runs to merge
universal_opts.max_size_amplification_percent = 200;  // 200% amplification
options.compaction_options_universal = universal_opts;
```
Universal Compaction Characteristics
Pros:

- Lower write amplification (2-5x)
- Higher write throughput
- Simpler model (no levels)

Cons:

- Higher space amplification (1.5-2x)
- Variable read performance
- Large compactions can cause latency spikes

Best for: write-heavy workloads, time-series data
FIFO Compaction
Simple time-based deletion:
```cpp
#include <rocksdb/advanced_options.h>

rocksdb::Options options;
options.compaction_style = rocksdb::kCompactionStyleFIFO;

// Configure FIFO
rocksdb::CompactionOptionsFIFO fifo_opts;
fifo_opts.max_table_files_size = 10ULL << 30;  // 10GB total
fifo_opts.allow_compaction = false;            // no merging
options.compaction_options_fifo = fifo_opts;
```
FIFO Compaction Characteristics
Pros:

- Minimal write amplification (1x)
- No compaction overhead
- Simple space management

Cons:

- No key updates or deletes (append-only)
- High space amplification
- Oldest data deleted when the size limit is reached

Best for: append-only logs, circular buffers, time-series with TTL
Compaction Priority
Control which files are compacted first:
```cpp
#include <rocksdb/advanced_options.h>

rocksdb::ColumnFamilyOptions cf_options;

// Minimize overlapping ratio (recommended)
cf_options.compaction_pri = rocksdb::kMinOverlappingRatio;

// Other options:
// cf_options.compaction_pri = rocksdb::kOldestLargestSeqFirst;
// cf_options.compaction_pri = rocksdb::kOldestSmallestSeqFirst;
// cf_options.compaction_pri = rocksdb::kRoundRobin;
```
Manual Compaction
Trigger compaction programmatically:
```cpp
// Compact the entire database
rocksdb::CompactRangeOptions compact_opts;
db->CompactRange(compact_opts, nullptr, nullptr);

// Compact a specific key range
rocksdb::Slice start("user:1000");
rocksdb::Slice end("user:2000");
db->CompactRange(compact_opts, &start, &end);

// Compact a specific column family
db->CompactRange(compact_opts, column_family_handle, nullptr, nullptr);
```
Manual compaction blocks until complete and can take significant time for large databases. Use CompactRangeOptions::exclusive_manual_compaction = false to allow parallel automatic compactions.
Compaction Configuration
Background Threads
```cpp
rocksdb::Options options;

// Modern unified thread pool (recommended)
options.max_background_jobs = 8;  // total threads for flush + compaction

// Legacy separate pools
// options.max_background_compactions = 6;
// options.max_background_flushes = 2;

// Helper that configures thread pools for the given parallelism
options.IncreaseParallelism(16);
```
Write Stall Protection
Prevent writes from overwhelming compaction:
```cpp
rocksdb::Options options;

// Soft limit: slow down writes
options.level0_slowdown_writes_trigger = 20;

// Hard limit: stop writes
options.level0_stop_writes_trigger = 36;

// Soft limit for pending compaction bytes
options.soft_pending_compaction_bytes_limit = 64ULL << 30;   // 64GB

// Hard limit for pending compaction bytes
options.hard_pending_compaction_bytes_limit = 256ULL << 30;  // 256GB
```
When limits are exceeded, RocksDB slows or stops writes to allow compaction to catch up. Monitor these triggers to avoid write stalls.
Compaction Size Limits
```cpp
rocksdb::Options options;

// Maximum size of a single compaction job; splitting large
// compactions reduces latency spikes and temporary space usage
options.max_compaction_bytes = 1ULL << 30;  // 1GB
```
Compaction Filters
Custom logic to drop keys during compaction:
```cpp
#include <rocksdb/compaction_filter.h>
#include <ctime>

class TTLCompactionFilter : public rocksdb::CompactionFilter {
 public:
  bool Filter(int level, const rocksdb::Slice& key,
              const rocksdb::Slice& value,
              std::string* new_value,
              bool* value_changed) const override {
    int64_t timestamp = 0;  // parse from value (application-specific)
    int64_t now = static_cast<int64_t>(time(nullptr));

    // Drop expired keys
    if (now - timestamp > 86400) {  // 24 hours
      return true;  // remove this key
    }
    return false;   // keep this key
  }

  const char* Name() const override { return "TTLCompactionFilter"; }
};

// Use the filter; compaction_filter is a raw pointer, so the filter
// object must outlive the DB
rocksdb::Options options;
options.compaction_filter = new TTLCompactionFilter();
```
Compaction filters run on every key during compaction. Keep logic simple to avoid slowing compaction.
Monitoring Compaction
Compaction Statistics
```cpp
#include <rocksdb/db.h>
#include <rocksdb/statistics.h>
#include <iostream>

rocksdb::DB* db = nullptr;
rocksdb::Options options;
options.statistics = rocksdb::CreateDBStatistics();
rocksdb::DB::Open(options, "/tmp/testdb", &db);

// Dump the full stats report
std::string stats;
db->GetProperty("rocksdb.stats", &stats);
std::cout << stats << std::endl;

// Specific compaction metrics
db->GetProperty("rocksdb.num-files-at-level0", &stats);
db->GetProperty("rocksdb.num-files-at-level1", &stats);
db->GetProperty("rocksdb.compaction-pending", &stats);
```
Compaction Events
```cpp
#include <rocksdb/listener.h>
#include <iostream>

class MyEventListener : public rocksdb::EventListener {
 public:
  void OnCompactionBegin(rocksdb::DB* db,
                         const rocksdb::CompactionJobInfo& info) override {
    std::cout << "Compaction started: "
              << "Level " << info.base_input_level
              << " -> " << info.output_level << std::endl;
  }

  void OnCompactionCompleted(rocksdb::DB* db,
                             const rocksdb::CompactionJobInfo& info) override {
    std::cout << "Compaction completed: "
              << info.input_files.size() << " files merged, "
              << info.total_input_bytes << " bytes in, "
              << info.total_output_bytes << " bytes out" << std::endl;
  }
};

rocksdb::Options options;
options.listeners.emplace_back(new MyEventListener());
```
Optimization Strategies
Write-Heavy

Optimize for write throughput:

```cpp
rocksdb::Options options;

// Larger MemTables absorb more writes before flushing
options.write_buffer_size = 256 << 20;  // 256MB
options.max_write_buffer_number = 4;

// Less aggressive L0 -> L1 compaction
options.level0_file_num_compaction_trigger = 8;
options.max_bytes_for_level_base = 1 << 30;  // 1GB

// More compaction threads
options.max_background_jobs = 12;

// Consider universal compaction
options.compaction_style = rocksdb::kCompactionStyleUniversal;
```
Read-Heavy

Optimize for read performance:

```cpp
rocksdb::Options options;

// Aggressive compaction to reduce the number of files per read
options.level0_file_num_compaction_trigger = 2;
options.max_bytes_for_level_multiplier = 8;

// Bloom filters and a larger block cache; note the cache must be set
// on table_opts before the table factory is created
rocksdb::BlockBasedTableOptions table_opts;
table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
table_opts.block_cache = rocksdb::NewLRUCache(4ULL << 30);  // 4GB
options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_opts));
```
Space-Constrained

Minimize space amplification:

```cpp
rocksdb::Options options;

// Aggressive compaction
options.level0_file_num_compaction_trigger = 2;
options.max_bytes_for_level_multiplier = 8;

// Maximum compression
options.compression = rocksdb::kZSTD;
options.bottommost_compression = rocksdb::kZSTD;
options.bottommost_compression_opts.enabled = true;  // required for the level below to apply
options.bottommost_compression_opts.level = 19;

// Smaller L1 so old data reaches the heavily compressed bottom level sooner
options.max_bytes_for_level_base = 128 << 20;  // 128MB
```
Best Practices
- Use level compaction for general-purpose workloads
- Monitor the L0 file count: too many files indicates compaction is falling behind
- Tune thread count based on CPU cores and I/O capacity
- Set appropriate triggers to prevent write stalls
- Use compaction filters for application-specific cleanup
- Profile compaction with statistics and event listeners
- Consider universal compaction for write-heavy workloads
- Run manual compaction during off-peak hours for large databases
Next Steps
- LSM Tree Design: understand the LSM structure that compaction maintains
- Architecture: see how compaction fits into RocksDB's architecture
- Performance Tuning: advanced compaction optimization techniques
- Write-Ahead Log: learn about WAL file management and compaction