Overview
RocksDB provides two main option types: DBOptions for database-level settings and ColumnFamilyOptions for column family-level settings. The Options struct combines both.
DBOptions
Database-level options that affect the entire DB instance.
File and Directory Options
create_if_missing
bool create_if_missing = false;
If true, the database (including its directory) is created when it does not already exist.
create_missing_column_families
bool create_missing_column_families = false;
If true, missing column families will be automatically created on DB::Open().
error_if_exists
bool error_if_exists = false;
If true, an error is raised if the database already exists.
db_paths
std::vector<DbPath> db_paths;
A list of paths where SST files can be placed. Newer data goes to earlier paths, older data gradually moves to later paths.
DBOptions options;
options.db_paths.push_back(DbPath("/flash_path", 10ULL * 1024 * 1024 * 1024)); // 10GB
options.db_paths.push_back(DbPath("/hard_drive", 2ULL * 1024 * 1024 * 1024 * 1024)); // 2TB (64-bit from the first operand, so the product cannot overflow int)
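Target sizes such as the 10GB and 2TB values above exceed the range of a 32-bit int, so the arithmetic should be carried out in 64 bits from the first operand. A minimal sketch, assuming two hypothetical helpers GiB() and TiB() that are not part of RocksDB:

```cpp
#include <cstdint>

// Hypothetical helpers (not RocksDB API): byte counts computed entirely in
// 64-bit arithmetic, so large targets cannot overflow an int on the way.
constexpr std::uint64_t GiB(std::uint64_t n) { return n << 30; }
constexpr std::uint64_t TiB(std::uint64_t n) { return n << 40; }

// Example targets matching the two paths above.
constexpr std::uint64_t kFlashTargetSize = GiB(10);  // 10GB
constexpr std::uint64_t kDiskTargetSize = TiB(2);    // 2TB
```

These constants could then be passed as the second argument of DbPath, for example DbPath("/flash_path", kFlashTargetSize).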
db_log_dir
std::string db_log_dir = "";
Directory for info LOG files. If empty, LOG files will be in the same directory as data.
This is NOT for WALs (Write-Ahead Logs).
wal_dir
std::string wal_dir = "";
Absolute directory path for write-ahead logs (WAL). If empty, WAL files will be in the same directory as data.
max_open_files
int max_open_files = -1;
Maximum number of files the DB may keep open. A value of -1 means opened files are always kept open.
A high value or -1 can cause high memory usage. See BlockBasedTableOptions::cache_usage_options to constrain memory.
Dynamically changeable through SetDBOptions() API.
max_file_opening_threads
int max_file_opening_threads = 16;
Number of threads used to open files when max_open_files is -1.
max_background_jobs
int max_background_jobs = 2;
Maximum number of concurrent background jobs (compactions and flushes)
Dynamically changeable through SetDBOptions() API.
max_subcompactions
uint32_t max_subcompactions = 1;
Maximum number of threads that will concurrently perform a compaction job by breaking it into smaller ones.
Dynamically changeable through SetDBOptions() API.
Write-Ahead Log Options
max_total_wal_size
uint64_t max_total_wal_size = 0;
Maximum total size of WAL files. When exceeded, forces flush of column families backed by oldest WAL. If 0, dynamically set to [sum of all write_buffer_size * max_write_buffer_number] * 4.
Dynamically changeable through SetDBOptions() API.
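The default described above can be worked through by hand. The following is a minimal sketch of that arithmetic only, not RocksDB's implementation; the CfMemtableLimits struct and the EffectiveMaxTotalWalSize function are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two ColumnFamilyOptions fields the formula reads.
struct CfMemtableLimits {
  std::uint64_t write_buffer_size;
  int max_write_buffer_number;
};

// Returns the explicit limit if one is set; otherwise applies
// [sum of all write_buffer_size * max_write_buffer_number] * 4.
constexpr std::uint64_t EffectiveMaxTotalWalSize(
    std::uint64_t max_total_wal_size, const CfMemtableLimits* cfs,
    std::size_t num_cfs) {
  if (max_total_wal_size != 0) return max_total_wal_size;
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < num_cfs; ++i) {
    sum += cfs[i].write_buffer_size *
           static_cast<std::uint64_t>(cfs[i].max_write_buffer_number);
  }
  return sum * 4;
}

// Two example column families: 64MB x 2 buffers and 128MB x 3 buffers.
constexpr CfMemtableLimits kExampleCfs[] = {{64ULL << 20, 2},
                                            {128ULL << 20, 3}};
```

For the two example column families the sum is 512MB, so a max_total_wal_size of 0 would resolve to 2GB under this reading of the formula.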
WAL_ttl_seconds
uint64_t WAL_ttl_seconds = 0;
Time-to-live for archived WAL files in seconds. WALs older than this will be deleted.
WAL_size_limit_MB
uint64_t WAL_size_limit_MB = 0;
Size limit for archived WAL files in megabytes. When exceeded, oldest archived WALs are deleted.
recycle_log_file_num
size_t recycle_log_file_num = 0;
Number of WAL files to keep around for reuse (more efficient than allocating new files).
Validation and Checking
paranoid_checks
bool paranoid_checks = true;
Enable proactive checks for DB or data corruption. When true, the DB enters read-only mode on write failure.
track_and_verify_wals_in_manifest
bool track_and_verify_wals_in_manifest = false;
Track WAL numbers and sizes in the MANIFEST. During recovery, an error is reported if a synced WAL is missing or its size doesn’t match.
verify_sst_unique_id_in_manifest
bool verify_sst_unique_id_in_manifest = true;
Verify SST unique ID between MANIFEST and actual file when opening SST files. Ensures files are not overwritten or misplaced.
Environment and I/O
env
Env* env = Env::Default();
Environment object for file system operations, scheduling background work, etc.
use_fsync
bool use_fsync = false;
Use fsync instead of fdatasync for writes to stable storage. Both are equally safe; fdatasync is faster.
use_direct_reads
bool use_direct_reads = false;
Use O_DIRECT for user and compaction reads (bypasses OS cache).
use_direct_io_for_flush_and_compaction
bool use_direct_io_for_flush_and_compaction = false;
Use O_DIRECT for background flush and compaction writes.
allow_mmap_reads
bool allow_mmap_reads = false;
Allow OS to mmap files for reading SST tables. Not recommended for 32-bit OS.
allow_mmap_writes
bool allow_mmap_writes = false;
Allow the OS to mmap files for writing. DB::SyncWAL() only works if this is set to false.
allow_fallocate
bool allow_fallocate = true;
Enable file preallocation for WAL, SST, and Manifest files.
On btrfs, set to false to disable preallocation as extra allocated space cannot be freed.
Statistics and Monitoring
statistics
std::shared_ptr<Statistics> statistics = nullptr;
Statistics object for collecting metrics about database operations.
stats_dump_period_sec
unsigned int stats_dump_period_sec = 600;
If non-zero, dump rocksdb.stats to LOG every N seconds (default: 10 minutes)
Dynamically changeable through SetDBOptions() API.
stats_persist_period_sec
unsigned int stats_persist_period_sec = 600;
If non-zero, persist stats every N seconds.
persist_stats_to_disk
bool persist_stats_to_disk = false;
Automatically persist stats to a hidden column family every stats_persist_period_sec seconds.
Rate Limiting
rate_limiter
std::shared_ptr<RateLimiter> rate_limiter = nullptr;
Limits internal file read/write bandwidth for flush and compaction operations.
sst_file_manager
std::shared_ptr<SstFileManager> sst_file_manager = nullptr;
Track SST files and control their deletion rate. Features include:
- Throttle deletion rate
- Track total size of SST files
- Set maximum allowed space limit
- Can be shared between multiple DBs
Manifest Options
max_manifest_file_size
uint64_t max_manifest_file_size = 1024 * 1024 * 1024;
Manifest file is rolled over when reaching this limit AND the space amp limit. Used as minimum for auto-tuned max manifest size.
Dynamically changeable through SetDBOptions() API.
max_manifest_space_amp_pct
int max_manifest_space_amp_pct = 500;
Controls auto-tuned balance of manifest write and space amplification. New manifest created when current size exceeds max(max_manifest_file_size, est_compacted_size * (1 + pct/100)).
Values guide:
- 0: Every write generates new manifest (testing only)
- 100: ~1.0 write amp, ~1.0 space amp
- 500: ~0.2 write amp, ~5.0 space amp (recommended)
- 10000: ~0.01 write amp, ~100 space amp
Dynamically changeable through SetDBOptions() API.
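The rollover formula above, max(max_manifest_file_size, est_compacted_size * (1 + pct/100)), can be evaluated with a small helper. This is a sketch under stated assumptions: ManifestRolloverThreshold is a hypothetical name, the percentage is assumed non-negative, and integer arithmetic is used, so RocksDB's own rounding may differ.

```cpp
#include <cstdint>

// Size at which a new manifest would be created, following
// max(max_manifest_file_size, est_compacted_size * (1 + pct/100)).
// Assumes max_manifest_space_amp_pct >= 0 and values small enough that
// est_compacted_size * pct does not overflow 64 bits.
constexpr std::uint64_t ManifestRolloverThreshold(
    std::uint64_t max_manifest_file_size, std::uint64_t est_compacted_size,
    int max_manifest_space_amp_pct) {
  const std::uint64_t pct =
      static_cast<std::uint64_t>(max_manifest_space_amp_pct);
  const std::uint64_t amp_limit =
      est_compacted_size + est_compacted_size * pct / 100;
  return amp_limit > max_manifest_file_size ? amp_limit
                                            : max_manifest_file_size;
}
```

With the defaults (1GB minimum, 500 percent), an estimated compacted size of 100MB gives an amplification limit of 600MB, so the 1GB minimum governs; once the compacted size reaches 1GB, the threshold becomes 6GB.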
ColumnFamilyOptions
Column family-level options that can be set independently for each column family.
Basic Options
comparator
const Comparator* comparator = BytewiseComparator();
Defines the order of keys in the table. Must be the same as used in previous opens of the same DB.
merge_operator
std::shared_ptr<MergeOperator> merge_operator = nullptr;
Merge operator for Merge() operations. Required if Merge operation will be used.
Memtable Options
write_buffer_size
size_t write_buffer_size = 64 << 20;
Amount of data to build up in memory before converting to sorted on-disk file. Larger values increase performance, especially during bulk loads.
Dynamically changeable through SetOptions() API.
max_write_buffer_number
int max_write_buffer_number = 2;
Maximum number of write buffers built up in memory. Writing will slow down if we’re writing to the last allowed buffer.
Dynamically changeable through SetOptions() API.
min_write_buffer_number_to_merge
int min_write_buffer_number_to_merge = 1;
Minimum number of write buffers that will be merged together before writing to storage.
Compression
compression
CompressionType compression;
Compression algorithm for blocks. Default is kSnappyCompression if linked, otherwise kNoCompression.
Dynamically changeable through SetOptions() API.
Typical Snappy speeds on Intel Core 2 2.4GHz:
- Compression: ~200-500MB/s
- Decompression: ~400-800MB/s
bottommost_compression
CompressionType bottommost_compression = kDisableCompressionOption;
Compression algorithm for bottommost level files. Can use slower but more effective compression.
compression_per_level
std::vector<CompressionType> compression_per_level;
Different compression policies for different levels. Overrides compression setting.
Dynamically changeable through SetOptions() API.
Compaction Options
compaction_style
CompactionStyle compaction_style = kCompactionStyleLevel;
Compaction style: Level-based, Universal, FIFO, or None
compaction_pri
CompactionPri compaction_pri = kMinOverlappingRatio;
Determines which files to pick for compaction in level-based compaction.
level0_file_num_compaction_trigger
int level0_file_num_compaction_trigger = 4;
Number of files to trigger level-0 compaction. Value less than 0 disables level-0 compaction by file count.
Dynamically changeable through SetOptions() API.
level0_slowdown_writes_trigger
int level0_slowdown_writes_trigger = 20;
Soft limit on level-0 files. Writes slow down at this point.
Dynamically changeable through SetOptions() API.
level0_stop_writes_trigger
int level0_stop_writes_trigger = 36;
Maximum number of level-0 files. Writes stop at this point.
Dynamically changeable through SetOptions() API.
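The three level-0 thresholds form an escalating sequence: compaction is triggered first, then writes slow down, then writes stop. The sketch below classifies a level-0 file count against them. L0State, L0Triggers, and ClassifyL0 are hypothetical names, thresholds are assumed to be inclusive, and real write stalls may also depend on factors this section does not cover.

```cpp
// Hypothetical classification of behavior by level-0 file count.
enum class L0State { kNormal, kCompactionTriggered, kSlowdown, kStopped };

// Defaults copied from the declarations above.
struct L0Triggers {
  int level0_file_num_compaction_trigger = 4;
  int level0_slowdown_writes_trigger = 20;
  int level0_stop_writes_trigger = 36;
};

// Reports the most severe condition that the file count has reached.
constexpr L0State ClassifyL0(int num_l0_files, const L0Triggers& t) {
  if (num_l0_files >= t.level0_stop_writes_trigger) return L0State::kStopped;
  if (num_l0_files >= t.level0_slowdown_writes_trigger) {
    return L0State::kSlowdown;
  }
  // A negative trigger disables level-0 compaction by file count.
  if (t.level0_file_num_compaction_trigger >= 0 &&
      num_l0_files >= t.level0_file_num_compaction_trigger) {
    return L0State::kCompactionTriggered;
  }
  return L0State::kNormal;
}
```

With the default triggers, 3 files is normal, 4 triggers compaction, 20 slows writes, and 36 stops them.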
disable_auto_compactions
bool disable_auto_compactions = false;
Disable automatic compactions. Manual compactions can still be issued.
Dynamically changeable through SetOptions() API.
Level Size Options
max_bytes_for_level_base
uint64_t max_bytes_for_level_base = 256 * 1048576;
Maximum total data size for level-1. Maximum for level L is calculated as max_bytes_for_level_base * (max_bytes_for_level_multiplier ^ (L-1)).
Dynamically changeable through SetOptions() API.
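To make the level-size formula concrete, here is a minimal sketch that evaluates max_bytes_for_level_base * (max_bytes_for_level_multiplier ^ (L-1)) for a given level. MaxBytesForLevel is a hypothetical helper, not a RocksDB function, and it models only this static formula; when level targets are picked dynamically, the actual targets may differ.

```cpp
#include <cstdint>

// Target size in bytes of level `level` (level >= 1) under the static formula
// max_bytes_for_level_base * (max_bytes_for_level_multiplier ^ (level - 1)).
constexpr double MaxBytesForLevel(std::uint64_t max_bytes_for_level_base,
                                  double max_bytes_for_level_multiplier,
                                  int level) {
  double bytes = static_cast<double>(max_bytes_for_level_base);
  for (int l = 1; l < level; ++l) {
    bytes *= max_bytes_for_level_multiplier;
  }
  return bytes;
}
```

With the defaults (256MB base, multiplier 10), level 1 is 256MB, level 2 is 2.5GB, and level 3 is 25GB.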
max_bytes_for_level_multiplier
double max_bytes_for_level_multiplier = 10;
Multiplier for level sizes (default: each level is 10x the size of the previous level).
Dynamically changeable through SetOptions() API.
level_compaction_dynamic_level_bytes
bool level_compaction_dynamic_level_bytes = true;
Pick target size of each level dynamically. Provides more predictable LSM tree shape and better space amplification.
File Size Options
target_file_size_base
uint64_t target_file_size_base = 64 * 1048576;
Target file size for compaction (level-1). Size for level L is target_file_size_base * (target_file_size_multiplier ^ (L-1)).
Dynamically changeable through SetOptions() API.
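The per-level file size follows the same pattern with an integer multiplier. A minimal sketch, assuming a hypothetical helper TargetFileSizeForLevel that simply evaluates target_file_size_base * (target_file_size_multiplier ^ (L-1)):

```cpp
#include <cstdint>

// Target file size in bytes for level `level` (level >= 1).
// Assumes a non-negative multiplier and no 64-bit overflow.
constexpr std::uint64_t TargetFileSizeForLevel(
    std::uint64_t target_file_size_base, int target_file_size_multiplier,
    int level) {
  std::uint64_t size = target_file_size_base;
  for (int l = 1; l < level; ++l) {
    size *= static_cast<std::uint64_t>(target_file_size_multiplier);
  }
  return size;
}
```

With a 64MB base, a multiplier of 1 keeps every level at 64MB, while a multiplier of 2 would give 256MB files at level 3.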
target_file_size_multiplier
int target_file_size_multiplier = 1;
By default 1, meaning files in different levels have similar size.
Dynamically changeable through SetOptions() API.
Table Factory
table_factory
std::shared_ptr<TableFactory> table_factory;
Factory for creating table files. Default is block-based table factory.
Prefix and Filtering
prefix_extractor
std::shared_ptr<const SliceTransform> prefix_extractor = nullptr;
Function to put keys in contiguous groups (prefixes) for Bloom filter optimization.
TTL and Compaction Timing
ttl
uint64_t ttl = 0xfffffffffffffffe;
Time-to-live in seconds. Files with all keys older than TTL will be compacted. 0 means disabled, 0xfffffffffffffffe means use default (30 days for block-based table).
Dynamically changeable through SetOptions() API.
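The ttl field overloads two special values: 0 for disabled and 0xfffffffffffffffe for "use the default". The sketch below shows how those values could be interpreted, assuming a block-based table (the only case for which this section states a default). ResolvedTtl, ResolveTtl, and the two constants are hypothetical names; RocksDB resolves the setting internally.

```cpp
#include <cstdint>

// Sentinel taken from the ttl declaration above.
constexpr std::uint64_t kTtlUseDefault = 0xfffffffffffffffeULL;
// 30 days expressed in seconds.
constexpr std::uint64_t kThirtyDaysSeconds = 30ULL * 24 * 60 * 60;

struct ResolvedTtl {
  bool enabled;
  std::uint64_t seconds;
};

// Interprets a raw ttl value: 0 disables, the sentinel selects 30 days,
// and anything else is taken as a literal number of seconds.
constexpr ResolvedTtl ResolveTtl(std::uint64_t ttl) {
  if (ttl == 0) return {false, 0};
  if (ttl == kTtlUseDefault) return {true, kThirtyDaysSeconds};
  return {true, ttl};
}
```

For example, leaving ttl at its declared default resolves to 2592000 seconds, while setting it to 3600 requests a one-hour TTL.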
periodic_compaction_seconds
uint64_t periodic_compaction_seconds = 0xfffffffffffffffe;
Files older than this will be picked for compaction and re-written. Default: 30 days for block-based table with compaction filter or universal compaction.
Dynamically changeable through SetOptions() API.
Options Struct
Combines DBOptions and ColumnFamilyOptions for convenience.
struct Options : public DBOptions, public ColumnFamilyOptions {
  Options();
  Options(const DBOptions& db_options,
          const ColumnFamilyOptions& column_family_options);
};
Options options;
options.create_if_missing = true;       // create the DB on first open
options.write_buffer_size = 128 << 20;  // 128MB
options.max_open_files = 1000;
options.compression = kSnappyCompression;
DB* db = nullptr;
Status s = DB::Open(options, "/path/to/db", &db);
if (!s.ok()) {
  // Open failed; s.ToString() describes the error.
}
Helper Methods
OptimizeForSmallDb
Options* OptimizeForSmallDb(std::shared_ptr<Cache>* cache = nullptr);
Optimize settings for databases under 1GB that don’t want to spend lots of memory.
OptimizeForPointLookup
ColumnFamilyOptions* OptimizeForPointLookup(uint64_t block_cache_size_mb);
Use when data doesn’t need to be sorted (only Put/Get, no iterators).
OptimizeLevelStyleCompaction
ColumnFamilyOptions* OptimizeLevelStyleCompaction(
    uint64_t memtable_memory_budget = 512 * 1024 * 1024);
Optimize for level-style compaction with heavy workloads.
OptimizeUniversalStyleCompaction
ColumnFamilyOptions* OptimizeUniversalStyleCompaction(
    uint64_t memtable_memory_budget = 512 * 1024 * 1024);
Optimize for universal compaction style (reduces write amplification, increases space amplification).
IncreaseParallelism
DBOptions* IncreaseParallelism(int total_threads = 16);
Sets the total number of background threads used for flush and compaction. A good value is the number of CPU cores.
You almost definitely want to call this if your system is bottlenecked by RocksDB.
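Since the suggested value is the number of CPU cores, one way to pick total_threads is from the standard library's core count. This is a sketch only: ChooseTotalThreads and DetectTotalThreads are hypothetical helpers, and falling back to the declared default of 16 is an assumption of this example.

```cpp
#include <thread>

// Hypothetical helper: use the detected core count, or a fallback when the
// count is unknown (std::thread::hardware_concurrency() may return 0).
constexpr int ChooseTotalThreads(unsigned detected_cores, int fallback = 16) {
  return detected_cores == 0 ? fallback : static_cast<int>(detected_cores);
}

// Convenience wrapper that queries the current machine.
inline int DetectTotalThreads() {
  return ChooseTotalThreads(std::thread::hardware_concurrency());
}
```

The result could then be passed as options.IncreaseParallelism(DetectTotalThreads()).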