This guide covers performance tuning strategies for YugabyteDB clusters, including configuration flags, resource management, and optimization techniques.

Performance Fundamentals

Key Performance Metrics

Latency Metrics:
  • Read latency (p50, p99, p999)
  • Write latency (p50, p99, p999)
  • Transaction latency
  • Replication lag
Throughput Metrics:
  • Operations per second (reads/writes)
  • Queries per second
  • Network throughput
  • Disk I/O throughput
Resource Utilization:
  • CPU usage per core
  • Memory utilization
  • Disk space and IOPS
  • Network bandwidth

Configuration Flags

YB-TServer Performance Flags

Core Server Configuration:
# RPC and queue settings
--tablet_server_svc_queue_length=5000
--max_clock_skew_usec=500000  # 500ms default
--rpc_workers_limit=64

# Memory management
--db_block_cache_size_bytes=8589934592  # 8GB
--memstore_size_mb=1024  # 1GB per tablet
--global_memstore_size_mb=2048

# Data directories
--fs_data_dirs=/mnt/disk1,/mnt/disk2,/mnt/disk3
Connection and Threading:
# PostgreSQL connections
--ysql_max_connections=300
--ysql_conn_mgr_max_client_connections=10000

# Service threads
--tablet_server_svc_num_threads=64
--ts_consensus_svc_num_threads=64
--ts_remote_bootstrap_svc_num_threads=10
Compaction Settings:
# Compaction threads and priorities
--rocksdb_max_background_compactions=16
--rocksdb_max_background_flushes=4
--rocksdb_base_background_compactions=4

# Compaction behavior
--rocksdb_level0_file_num_compaction_trigger=4
--rocksdb_level0_slowdown_writes_trigger=20
--rocksdb_level0_stop_writes_trigger=32

YB-Master Performance Flags

# Master RPC configuration
--master_svc_queue_length=1000
--master_svc_num_threads=32

# Tablet splitting
--tablet_split_size_threshold_bytes=10737418240  # 10GB
--enable_automatic_tablet_splitting=true

# Load balancing
--load_balancer_max_concurrent_moves=10
--load_balancer_max_concurrent_adds=1
--load_balancer_max_concurrent_removals=1

Memory Management

Block Cache Sizing

The block cache stores recently accessed data blocks in memory. Calculation:
Recommended = (Total RAM - OS overhead - Other services) × 0.5 to 0.7

Example for 64GB node:
Block Cache = (64GB - 4GB - 8GB) × 0.6 = ~31GB
Configuration:
--db_block_cache_size_bytes=33285996544  # 31GB
Monitoring:
  • Cache hit rate (target: >95%)
  • Cache eviction rate
  • Memory pressure indicators
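The sizing rule above can be scripted when provisioning nodes; a minimal sketch using the 64GB-node example figures from this section (the overhead numbers are illustrative, not measured recommendations):

```shell
# Block cache sizing: (Total RAM - OS overhead - other services) x 0.6
total_gb=64    # node RAM
os_gb=4        # assumed OS overhead
other_gb=8     # assumed other services on the node

# Take 60% of the remainder, in whole GiB, then convert to bytes for the flag
cache_gb=$(( (total_gb - os_gb - other_gb) * 6 / 10 ))
cache_bytes=$(( cache_gb * 1024 * 1024 * 1024 ))

echo "--db_block_cache_size_bytes=${cache_bytes}"
```

This reproduces the ~31GB / 33285996544-byte value used in the configuration example above.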

Memstore Configuration

Memstores buffer writes before flushing to disk. Per-Tablet Memstore:
--memstore_size_mb=128  # Per tablet
Global Memstore Limit:
--global_memstore_size_mb=4096  # Total across all tablets
Best Practices:
  • Higher memstore = fewer flushes, more memory usage
  • Monitor flush frequency and write amplification
  • Adjust based on write patterns
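One way to reason about the memstore/flush trade-off is to estimate how often a memstore fills at your sustained write rate; a rough sketch with assumed, illustrative numbers:

```shell
# Rough flush-interval estimate for one tablet (illustrative numbers)
memstore_mb=128       # --memstore_size_mb
write_rate_mb_s=16    # assumed sustained write rate into this tablet

# A full memstore triggers a flush roughly every memstore/write-rate seconds
flush_interval_s=$(( memstore_mb / write_rate_mb_s ))

echo "~1 flush every ${flush_interval_s}s per tablet"
```

Doubling the memstore roughly doubles this interval, at the cost of double the buffered memory per tablet.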

Memory Pressure Handling

# Reject operations under memory pressure
--memory_limit_soft_percentage=85
--memory_limit_hard_percentage=95

# Background task memory limits
--server_tcmalloc_max_total_thread_cache_bytes=268435456  # 256MB

Disk and I/O Optimization

Storage Configuration

Multiple Data Directories:
--fs_data_dirs=/mnt/nvme1,/mnt/nvme2,/mnt/nvme3
Benefits:
  • Parallel I/O across devices
  • Better load distribution
  • Increased throughput
Write-Ahead Log (WAL):
# WAL settings
--fs_wal_dirs=/mnt/ssd1  # Separate fast disk for WAL
--durable_wal_write=true
--wal_segment_size_mb=64

Compaction Tuning

Compactions reorganize data on disk for efficient reads. Strategies:
# Universal compaction (default for YCQL)
--rocksdb_universal_compaction_size_ratio=20
--rocksdb_universal_compaction_min_merge_width=4

# Level compaction (default for YSQL)
--rocksdb_level0_file_num_compaction_trigger=4
--rocksdb_max_file_size_for_compaction=209715200  # 200MB
Resource Allocation:
# Limit compaction impact
--rocksdb_compact_flush_rate_limit_bytes_per_sec=268435456  # 256MB/s
--priority_thread_pool_size=3

I/O Scheduling

Disk Performance Requirements:
  • Minimum: 1000 IOPS per TB of data
  • Recommended: NVMe SSD or high-performance SSD
  • Latency target: < 10ms for writes, < 5ms for reads
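The per-TB rule scales linearly with stored data; a quick provisioning check for a hypothetical node:

```shell
# Minimum IOPS check, using the 1000-IOPS-per-TB rule from this section
data_tb=4                        # assumed data volume on the node
min_iops=$(( data_tb * 1000 ))

echo "Provision at least ${min_iops} IOPS for ${data_tb} TB of data"
```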
Monitoring Disk Health:
# Check disk I/O utilization
iostat -x 5

# Monitor queue depth
sar -d 5

Network Optimization

Network Configuration

# RPC settings
--rpc_bind_addresses=<private-ip>
--server_broadcast_addresses=<public-ip>
--use_private_ip=cloud  # Use private IPs within cloud

# Connection pooling
--num_connections_to_server=8
--rpc_connection_timeout_ms=15000

Compression

# Enable RPC compression for cross-AZ traffic
--enable_stream_compression=true
--stream_compression_algo=2  # LZ4
When to Use:
  • Cross-region replication
  • High-latency networks
  • Bandwidth-constrained environments
Trade-offs:
  • Reduces network bandwidth usage by 30-50%
  • Adds 5-10% CPU overhead
  • Most beneficial for large data transfers
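To weigh those trade-offs, estimate the bandwidth saved on your own replication traffic; a sketch with assumed figures (the 500 GB/day volume and 40% reduction are hypothetical, chosen from the range cited above):

```shell
# Estimated daily savings from stream compression (illustrative numbers)
daily_gb=500        # assumed cross-AZ replication traffic per day
reduction_pct=40    # assumed, within the 30-50% range cited above

saved_gb=$(( daily_gb * reduction_pct / 100 ))
echo "Compression saves ~${saved_gb} GB/day of cross-AZ traffic"
```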

Query Performance

Connection Management

Connection Pooling:
# Enable connection manager
--enable_ysql_conn_mgr=true
--ysql_conn_mgr_max_client_connections=10000
--ysql_conn_mgr_num_workers=4
Benefits:
  • Reduced connection overhead
  • Better resource utilization
  • Support for more concurrent clients

Query Optimization

Prepared Statements:
  • Use prepared statements for repeated queries
  • Reduces parsing overhead
  • Enables better query plan caching
Batch Operations:
-- Instead of multiple single-row INSERTs, batch values
INSERT INTO my_table VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- Use COPY for bulk loads
COPY my_table FROM '/data/file.csv' WITH (FORMAT csv);
Index Optimization:
  • Create indexes on frequently filtered columns
  • Use covering indexes to avoid table lookups
  • Monitor index usage and remove unused indexes
  • Consider partial indexes for filtered queries

Read Replicas

Offload read traffic to dedicated read replicas:
yb-admin --master_addresses ip1:7100,ip2:7100,ip3:7100 \
  add_read_replica_placement_info \
  cloud.region.zone min_num_replicas
Benefits:
  • Isolate analytics workloads
  • Reduce load on primary cluster
  • Serve reads closer to users geographically

Tablet Management

Tablet Splitting

Automatic tablet splitting improves performance as data grows. Configuration:
# Enable automatic splitting
--enable_automatic_tablet_splitting=true
--tablet_split_size_threshold_bytes=10737418240  # 10GB

# Split limits
--outstanding_tablet_split_limit=1
--outstanding_tablet_split_limit_per_tserver=1
When Splitting Helps:
  • Tables outgrow initial tablet count
  • Range-scanned tables with unpredictable distribution
  • Hot spots on specific key ranges
  • Node count exceeds tablet count
Monitoring:
  • Track tablet size distribution
  • Monitor split operations in master logs
  • Watch for overloaded tablets

Tablet Co-location

Co-locate small tables to reduce tablet overhead:
-- Create co-located database
CREATE DATABASE mydb WITH colocated = true;

-- Tables automatically co-located
CREATE TABLE small_table (id INT PRIMARY KEY, data TEXT);
Benefits:
  • Reduced memory overhead
  • Fewer Raft groups
  • Better resource utilization
  • Ideal for small reference tables

Replication and Consistency

Replication Factor

# Set replication factor at cluster creation
yb-admin --master_addresses ip1:7100,ip2:7100,ip3:7100 \
  modify_placement_info cloud.region.zone 3
Trade-offs:
  • RF=3: Standard, good balance
  • RF=5: Higher durability, more write latency
  • RF=1: Testing only, no fault tolerance
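The write-latency cost of a higher RF follows from quorum size: each write must be acknowledged by a majority of replicas before it commits. A quick calculation:

```shell
# Raft write quorum = floor(RF/2) + 1 acknowledgements per write
for rf in 1 3 5; do
  quorum=$(( rf / 2 + 1 ))
  tolerated=$(( rf - quorum ))
  echo "RF=${rf}: ${quorum} acks per write, tolerates ${tolerated} replica failure(s)"
done
```

RF=5 therefore waits on 3 acknowledgements instead of 2, which is where the extra write latency comes from.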

Read Consistency

Follower Reads:
--yb_enable_read_committed_isolation=true
--max_stale_read_bound_time_ms=10000  # 10 seconds staleness
When to Use:
  • Analytics queries tolerating staleness
  • Read-heavy workloads
  • Geo-distributed reads from local replicas

Clock Skew Management

Time Synchronization

NTP Configuration:
# Check NTP status
timedatectl status
ntpq -p

# Monitor clock skew
watch -n 1 'ntpq -p | grep "*"'
Clock Skew Impact:
--max_clock_skew_usec=500000  # Default: 500ms
Lower skew benefits:
  • Reduced transaction latency
  • Tighter timestamp bounds
  • Better snapshot isolation
Using Clockbound (Advanced):
--time_source=clockbound
--max_clock_skew_usec=50000  # 50ms with synchronized clocks
Requires:
  • Hardware time sync (PTP, GPS, atomic clock)
  • Clockbound daemon on all nodes
  • Sub-millisecond clock accuracy

Load Balancing

Cluster Balancing

# Balance tablet leaders across nodes
yb-admin --master_addresses ip1:7100,ip2:7100,ip3:7100 \
  set_load_balancer_enabled true

# Check load balancer status
yb-admin --master_addresses ip1:7100,ip2:7100,ip3:7100 \
  get_load_balancer_state
Balancing Parameters:
--load_balancer_max_concurrent_moves=10
--load_balancer_max_concurrent_tablet_remote_bootstraps=10
--load_balancer_max_concurrent_tablet_remote_bootstraps_per_table=2

Preferred Leaders

Pin leaders to specific zones for read performance:
yb-admin --master_addresses ip1:7100,ip2:7100,ip3:7100 \
  set_preferred_zones cloud.region.zone1:1 cloud.region.zone2:2
Benefits:
  • Reduced read latency in primary zone
  • Predictable failover behavior
  • Optimized for asymmetric read/write patterns

Troubleshooting Performance

Slow Query Analysis

Enable slow query logging:
--ysql_log_min_duration_statement=1000  # Log queries > 1 second
Analyze slow queries:
-- PostgreSQL pg_stat_statements
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

RPC Slow Logs

Slow RPCs (>75% of timeout) are logged with detailed traces:
W0325 06:47:13.032176 inbound_call.cc:204] 
  Call ConsensusService.UpdateConsensus took 2644ms (timeout 1000ms)
Trace:
  0325 06:47:10.388015 (+0us) Inserting onto call queue
  0325 06:47:10.394859 (+6844us) Handling call
  0325 06:47:13.032064 (+2581367us) Filling consensus response
Common causes:
  • Lock contention
  • Disk I/O saturation
  • Network delays
  • Large batch operations

Memory Analysis

Check memory usage:
# Via metrics endpoint
curl http://tserver-ip:9000/mem-trackers

# Via command line
free -h
cat /proc/meminfo
Memory pressure indicators:
  • High swap usage
  • Frequent compaction stalls
  • OOM killer activations
  • Slow allocation times

Compaction Issues

Signs of compaction problems:
  • Increasing read latency
  • Growing disk usage
  • Level-0 file accumulation
  • Write stalls
Diagnosis:
# Check compaction stats at the metrics endpoint
curl -s http://tserver-ip:9000/metrics | grep -E \
  'rocksdb_compaction_pending|rocksdb_num_running_compactions|rocksdb_estimate_pending_compaction_bytes'
Resolution:
  • Increase compaction threads
  • Reduce write rate temporarily
  • Add more disk throughput
  • Adjust compaction triggers

Performance Benchmarking

Baseline Metrics

Establish baseline performance for your workload:
# YSQL workload
yb-sample-apps --workload SqlInserts \
  --nodes ip1:5433,ip2:5433,ip3:5433 \
  --num_threads_write 32 \
  --num_threads_read 64

# YCQL workload
yb-sample-apps --workload CassandraBatchKeyValue \
  --nodes ip1:9042,ip2:9042,ip3:9042 \
  --num_threads_write 32

Load Testing

Incremental load testing:
  1. Start with 25% of target load
  2. Monitor metrics for 1 hour
  3. Increase by 25% increments
  4. Identify breaking point
  5. Configure for 70-80% of max capacity
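The ramp-up schedule above can be precomputed from your target rate; a small sketch (the 100,000 ops/sec target and 120,000 ops/sec breaking point are assumed examples):

```shell
# Incremental load-test steps: 25% increments toward a target rate
target_ops=100000                    # assumed target ops/sec
for pct in 25 50 75 100; do
  step=$(( target_ops * pct / 100 ))
  echo "Step at ${pct}%: ${step} ops/sec"
done

# Once the breaking point is found, size production at 70-80% of it
max_ops=120000                       # assumed measured breaking point
prod_ops=$(( max_ops * 75 / 100 ))
echo "Production target: ${prod_ops} ops/sec"
```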
Metrics to track:
  • Latency percentiles (p50, p99, p999)
  • CPU utilization per node
  • Disk IOPS and throughput
  • Network bandwidth
  • Memory usage and pressure

Best Practices Summary

Resource Allocation

  1. CPU: 16+ cores per node, less than 70% average utilization
  2. Memory: 32-64GB+, 50% for block cache
  3. Disk: NVMe SSD, 1000+ IOPS/TB, less than 70% utilization
  4. Network: 10Gbps+, less than 50% utilization

Configuration Priorities

  1. Size block cache appropriately (50-70% of RAM)
  2. Configure multiple data directories for parallel I/O
  3. Enable automatic tablet splitting for growing tables
  4. Set appropriate compaction threads based on cores
  5. Use connection pooling for high-concurrency workloads

Monitoring and Maintenance

  1. Monitor key metrics continuously (latency, throughput, resources)
  2. Review slow queries regularly and optimize
  3. Check compaction health and adjust if needed
  4. Balance load across nodes periodically
  5. Test configuration changes in staging first
