Overview
Write stalls occur when RocksDB slows down or temporarily stops accepting writes to prevent running out of resources. This is a critical flow control mechanism that protects database stability.
Write stalls indicate that writes are arriving faster than compaction can process them. While stalls protect the database, they impact write throughput and latency.
Why Write Stalls Happen
From advanced_options.h:539-716, RocksDB triggers write stalls when:
- Too many L0 files - exceeds level0_slowdown_writes_trigger or level0_stop_writes_trigger
- Pending compaction bytes - exceeds soft_pending_compaction_bytes_limit or hard_pending_compaction_bytes_limit
- Too many memtables - immutable memtables waiting for flush
Stall Conditions
Level 0 File Thresholds
From advanced_options.h:539-553:
// Soft limit - starts slowing down writes
int level0_slowdown_writes_trigger = 20; // Default
// Hard limit - stops writes completely
int level0_stop_writes_trigger = 36; // Default
How it works:
Options options;
options.level0_slowdown_writes_trigger = 20;
options.level0_stop_writes_trigger = 36;
// L0 files: 0-19 → Normal write speed
// L0 files: 20-35 → Writes slowed (delayed_write_rate)
// L0 files: 36+ → Writes stopped until compaction catches up
Both triggers are dynamically changeable at runtime through the SetOptions() API.
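For example, the triggers can be relaxed on a live database without reopening it (a sketch; SetOptions() takes string-valued option names and returns non-OK for unknown or invalid options):

```cpp
#include <string>
#include <unordered_map>

// Relax the L0 stall triggers on a running DB. The example values (30/50)
// are illustrative, not recommendations.
std::unordered_map<std::string, std::string> new_opts = {
    {"level0_slowdown_writes_trigger", "30"},
    {"level0_stop_writes_trigger", "50"},
};
rocksdb::Status s = db->SetOptions(new_opts);
// Check s.ok() before assuming the new thresholds took effect.
```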
Pending Compaction Bytes
From advanced_options.h:702-716:
// Soft limit - slow down writes
uint64_t soft_pending_compaction_bytes_limit = 64 * 1024 * 1024 * 1024ULL; // 64 GB
// Hard limit - stop writes
uint64_t hard_pending_compaction_bytes_limit = 256 * 1024 * 1024 * 1024ULL; // 256 GB
Purpose: Prevents unbounded accumulation of data awaiting compaction.
Options options;
options.soft_pending_compaction_bytes_limit = 64ULL * 1024 * 1024 * 1024;
options.hard_pending_compaction_bytes_limit = 256ULL * 1024 * 1024 * 1024;
// Estimated pending bytes < 64 GB → Normal
// Estimated pending bytes 64-256 GB → Delayed writes
// Estimated pending bytes > 256 GB → Writes stopped
Memtable Limits
From advanced_options.h:259-270:
int max_write_buffer_number = 2; // Maximum write buffers
Write stalls when:
- Active memtable is full
- All max_write_buffer_number slots are occupied
- Flush can’t keep up with write rate
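You can watch for this condition by polling the memtable-related database properties (a sketch; these property names are the documented ones from db.h):

```cpp
#include <cstdint>

// Number of immutable memtables waiting for flush. A value near
// max_write_buffer_number - 1 means flush is falling behind writes.
uint64_t imm_memtables = 0;
db->GetIntProperty("rocksdb.num-immutable-mem-table", &imm_memtables);

// 1 if a memtable flush is pending, 0 otherwise.
uint64_t flush_pending = 0;
db->GetIntProperty("rocksdb.mem-table-flush-pending", &flush_pending);
```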
Write Delay Rate
When slowdown is triggered, writes are delayed to this rate:
DBOptions db_options;
db_options.delayed_write_rate = 16 * 1024 * 1024; // 16 MB/s (default)
RocksDB may automatically adjust delayed_write_rate based on flush/compaction speed to find the optimal rate.
Monitoring Write Stalls
Statistics
From statistics.h:205:
STALL_MICROS // Total time writes were stalled (in microseconds)
Check if stalls are occurring:
auto stats = options.statistics;
uint64_t stall_micros = stats->getTickerCount(STALL_MICROS);
if (stall_micros > 0) {
double stall_seconds = stall_micros / 1000000.0;
printf("Writes stalled for %.2f seconds\n", stall_seconds);
}
Histogram
From statistics.h:622:
WRITE_STALL // Distribution of stall durations
HistogramData stall_hist;
stats->histogramData(WRITE_STALL, &stall_hist);
printf("Stall Duration:\n");
printf(" Median: %.2f us\n", stall_hist.median);
printf(" P95: %.2f us\n", stall_hist.percentile95);
printf(" P99: %.2f us\n", stall_hist.percentile99);
printf(" Max: %.2f us\n", stall_hist.max);
Database Properties
// Check pending compaction bytes
uint64_t pending_bytes;
db->GetIntProperty(
"rocksdb.estimate-pending-compaction-bytes",
&pending_bytes
);
// Check L0 file count
uint64_t l0_files;
db->GetIntProperty("rocksdb.num-files-at-level0", &l0_files);
// Check memtable flush pending
std::string flush_pending;
db->GetProperty("rocksdb.mem-table-flush-pending", &flush_pending);
printf("Pending compaction: %llu bytes\n", (unsigned long long)pending_bytes);
printf("L0 files: %llu\n", (unsigned long long)l0_files);
printf("Flush pending: %s\n", flush_pending.c_str());
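Beyond polling, stall transitions can be observed push-style with an EventListener. This is a sketch assuming the OnStallConditionsChanged callback and WriteStallInfo struct from listener.h; verify the member names against your RocksDB version:

```cpp
#include "rocksdb/listener.h"

// Fires whenever a column family moves between kNormal, kDelayed,
// and kStopped write-stall states.
class StallListener : public rocksdb::EventListener {
 public:
  void OnStallConditionsChanged(const rocksdb::WriteStallInfo& info) override {
    if (info.condition.cur != rocksdb::WriteStallCondition::kNormal) {
      // Column family info.cf_name entered a slowdown or stop state;
      // log or alert here.
    }
  }
};

// Register before opening the DB:
// db_options.listeners.push_back(std::make_shared<StallListener>());
```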
Preventing Write Stalls
1. Increase L0 Thresholds
Options options;
// More relaxed thresholds (requires more compaction resources)
options.level0_slowdown_writes_trigger = 30; // Was 20
options.level0_stop_writes_trigger = 50; // Was 36
Increasing thresholds allows more L0 files, which can slow down reads and require more compaction resources.
2. Increase Write Buffers
From advanced_options.h:259-282:
Options options;
options.write_buffer_size = 128 * 1024 * 1024; // 128 MB (was 64 MB)
options.max_write_buffer_number = 4; // More buffers
options.min_write_buffer_number_to_merge = 2;
Larger buffers mean:
- Fewer L0 files created
- More time between flushes
- Higher memory usage
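As a rule of thumb, the worst-case memtable memory for one column family is write_buffer_size multiplied by max_write_buffer_number. A quick sketch of the arithmetic (MemtableBudgetBytes is an illustrative helper, not a RocksDB API):

```cpp
#include <cstdint>

// Worst-case memtable memory for one column family, in bytes:
// all write buffers full at the same time.
inline uint64_t MemtableBudgetBytes(uint64_t write_buffer_size,
                                    int max_write_buffer_number) {
  return write_buffer_size * static_cast<uint64_t>(max_write_buffer_number);
}

// Example: 128 MB buffers x 4 buffers = 512 MB worst case per CF.
```

Multiply by the number of column families (or cap globally with db_write_buffer_size) when budgeting memory.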
3. Increase Compaction Resources
DBOptions db_options;
db_options.max_background_jobs = 8; // More threads for flush & compaction
// Or separately (deprecated in newer RocksDB releases in favor of
// max_background_jobs)
db_options.max_background_compactions = 6;
db_options.max_background_flushes = 2;
4. Adjust Compaction Limits
Options options;
options.soft_pending_compaction_bytes_limit = 128ULL * 1024 * 1024 * 1024; // 128 GB
options.hard_pending_compaction_bytes_limit = 512ULL * 1024 * 1024 * 1024; // 512 GB
5. Optimize Compaction Speed
// Use faster compression for hot levels
options.compression_per_level = {
kNoCompression, // L0 - no compression overhead
kLZ4Compression, // L1 - fast compression
kLZ4Compression, // L2
kZSTD, // L3+ - balanced
};
// Reduce compaction work per file
options.target_file_size_base = 128 * 1024 * 1024; // Larger files, fewer of them
6. Enable Dynamic Level Bytes
From advanced_options.h:581-665:
options.level_compaction_dynamic_level_bytes = true; // Default in newer versions
Dynamic level bytes:
- Adapts to write traffic
- Reduces write amplification
- More predictable LSM shape
Handling Stalls in Applications
Detect and Retry
Status PutWithRetry(DB* db, const Slice& key, const Slice& value) {
  WriteOptions write_opts;
  // Without this, Put() blocks inside RocksDB during a stall instead of
  // returning an error. With it, a stalled write fails fast with
  // Status::Incomplete().
  write_opts.no_slowdown = true;
  int retry_count = 0;
  while (retry_count < 3) {
    Status s = db->Put(write_opts, key, value);
    if (s.ok()) {
      return s;
    }
    if (s.IsIncomplete()) {
      // Write would have stalled; wait and retry
      LOG(WARNING) << "Write stalled, retrying in 100ms...";
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      retry_count++;
    } else {
      // Other error, don't retry
      return s;
    }
  }
  return Status::TimedOut("Write stalled after retries");
}
Proactive Monitoring
class StallMonitor {
public:
void CheckAndAlert(DB* db, std::shared_ptr<Statistics> stats) {
// Check L0 file count
uint64_t l0_files;
db->GetIntProperty("rocksdb.num-files-at-level0", &l0_files);
if (l0_files > level0_slowdown_threshold_ * 0.8) {
LOG(WARNING) << "L0 files approaching stall threshold: " << l0_files;
// Alert monitoring system
SendAlert("L0 files high", l0_files);
}
// Check pending compaction
uint64_t pending_bytes;
db->GetIntProperty(
"rocksdb.estimate-pending-compaction-bytes",
&pending_bytes
);
if (pending_bytes > soft_pending_limit_ * 0.8) {
LOG(WARNING) << "Pending compaction approaching limit: "
<< pending_bytes / (1024 * 1024 * 1024) << " GB";
SendAlert("Pending compaction high", pending_bytes);
}
// Check recent stalls
uint64_t stall_micros = stats->getTickerCount(STALL_MICROS);
if (stall_micros > last_stall_micros_) {
uint64_t new_stalls = stall_micros - last_stall_micros_;
LOG(ERROR) << "Write stalled for " << new_stalls / 1000 << " ms";
SendAlert("Write stall detected", new_stalls);
}
last_stall_micros_ = stall_micros;
}
private:
int level0_slowdown_threshold_ = 20;
uint64_t soft_pending_limit_ = 64ULL * 1024 * 1024 * 1024;
uint64_t last_stall_micros_ = 0;
};
Advanced Configuration
Rate Limiter
Control flush/compaction I/O to leave headroom for user requests:
#include "rocksdb/rate_limiter.h"
DBOptions db_options;
db_options.rate_limiter.reset(
NewGenericRateLimiter(
100 * 1024 * 1024, // 100 MB/s for background I/O
100 * 1000, // Refill period: 100ms
10 // Fairness
)
);
Disable Auto Compaction (Advanced)
ColumnFamilyOptions cf_options;
cf_options.disable_auto_compactions = true;
// Manually trigger compaction during low-traffic periods
db->CompactRange(CompactRangeOptions(), nullptr, nullptr);
Disabling auto compaction is dangerous. Only use if you have a sophisticated manual compaction strategy.
Troubleshooting Common Scenarios
Scenario 1: Sudden Write Spike
Symptoms: Writes stall during high-traffic periods
Solutions:
// 1. Increase write buffers to absorb spikes
options.write_buffer_size = 256 * 1024 * 1024; // Larger buffers
options.max_write_buffer_number = 6; // More buffers
// 2. More aggressive compaction
db_options.max_background_jobs = 12;
// 3. Relax stall triggers
options.level0_slowdown_writes_trigger = 30;
options.level0_stop_writes_trigger = 50;
Scenario 2: Slow Compaction
Symptoms: Persistent stalls even with moderate write rate
Solutions:
// 1. Faster compression
options.compression = kLZ4Compression; // Was kZSTD
// 2. More compaction threads
db_options.max_background_compactions = 8;
// 3. Larger files (fewer compactions)
options.target_file_size_base = 256 * 1024 * 1024;
// 4. Check if I/O is bottleneck - add rate limiter headroom
db_options.rate_limiter->SetBytesPerSecond(200 * 1024 * 1024);
Scenario 3: Small Writes, Many L0 Files
Symptoms: Frequent small flushes creating many L0 files
Solutions:
// 1. Larger write buffers
options.write_buffer_size = 128 * 1024 * 1024;
// 2. Merge multiple memtables before flush
options.min_write_buffer_number_to_merge = 3;
// 3. Level compaction dynamic level bytes
options.level_compaction_dynamic_level_bytes = true;
Best Practices
Prevention is better than cure: Configure RocksDB to handle your expected peak write rate with headroom.
- Monitor proactively: Alert when approaching stall conditions (80% of threshold)
- Size write buffers appropriately: Balance memory usage and flush frequency
- Provision compaction resources: Ensure max_background_jobs can handle write rate
- Use dynamic level bytes: Enables better adaptation to write patterns
- Test under load: Validate configuration with realistic write workloads
- Plan for bursts: Size buffers to absorb temporary write spikes
Recommended Starting Point
Options options;
// Write buffers
options.write_buffer_size = 128 * 1024 * 1024; // 128 MB
options.max_write_buffer_number = 4;
options.min_write_buffer_number_to_merge = 2;
// L0 stall conditions
options.level0_slowdown_writes_trigger = 20;
options.level0_stop_writes_trigger = 36;
// Compaction limits
options.soft_pending_compaction_bytes_limit = 64ULL * 1024 * 1024 * 1024;
options.hard_pending_compaction_bytes_limit = 256ULL * 1024 * 1024 * 1024;
// Background jobs
DBOptions db_options;
db_options.max_background_jobs = 8; // Adjust based on CPU cores
// Compression
options.compression_per_level = {
kNoCompression,
kLZ4Compression,
kLZ4Compression,
kZSTD,
};
// Dynamic level bytes
options.level_compaction_dynamic_level_bytes = true;
// Enable statistics
db_options.statistics = CreateDBStatistics();
See Also