Read Replica Functionality
Duckling’s Read Replica service provides optional read-only replica support for DuckDB. It periodically snapshots the primary database and opens the copy in READ_ONLY mode, allowing API queries to be routed to the replica so they do not interfere with write operations (sync, CDC).

Overview
DuckDB uses MVCC (Multi-Version Concurrency Control) for read/write isolation within a single process. For maximum isolation, however (so that read queries can never block writes), the Read Replica service creates a physical copy of the database file and opens it in READ_ONLY mode.

Architecture:

Trade-Offs
| Aspect | Impact |
|---|---|
| Data Staleness | Queries see data stale by up to REPLICA_REFRESH_INTERVAL seconds |
| Disk Space | Requires ~2x disk space during copy (e.g., 400 GB for 200 GB database) |
| Copy Time | ~60-90 seconds for 200 GB database on SSD |
| Write Isolation | Zero interference — writes never block reads |
| Read Performance | Same as primary (columnar, compressed) |
Environment Variables
- `READ_REPLICA_ENABLED`: Enable read replica mode to isolate read queries from write operations
- `REPLICA_REFRESH_INTERVAL`: Seconds between snapshots (default: 300, i.e. 5 minutes). Higher values mean less disk I/O but more stale data.
Configuration Example
.env
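For example (values illustrative; 300 seconds is the documented default interval):

```
# .env — enable replicas, snapshot every 5 minutes (the default)
READ_REPLICA_ENABLED=true
REPLICA_REFRESH_INTERVAL=300
```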
How It Works
Snapshot Process
- Close Existing Replica: Close the read-only connection to the current snapshot
- Atomic Copy: Copy the primary `.db` file to `.ro.db.tmp`
- Atomic Rename: Rename `.ro.db.tmp` → `.ro.db` (an atomic operation)
- Open Read-Only: Open `.ro.db` in `READ_ONLY` mode
- Update Timestamp: Record the `lastRefreshedAt` timestamp

This design guarantees:

- Queries never see a partial/corrupted file (the rename is atomic)
- If the copy fails, the old replica remains available
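The copy-then-rename cycle can be sketched as follows. This is a minimal illustration of the technique, not the actual `readReplicaService.ts` code; function and variable names are assumptions.

```typescript
import { copyFileSync, renameSync } from "node:fs";

// Sketch of one snapshot cycle: copy to a .tmp file, then atomically
// rename it over the previous replica so readers never see a partial file.
function refreshReplica(primaryPath: string): string {
  const replicaPath = primaryPath.replace(/\.db$/, ".ro.db");
  const tmpPath = `${replicaPath}.tmp`;

  copyFileSync(primaryPath, tmpPath); // slow part: ~90 s for 200 GB on SSD
  renameSync(tmpPath, replicaPath);   // atomic on POSIX filesystems
  return replicaPath;                 // caller reopens this in READ_ONLY mode
}
```

If the copy throws, the rename never runs, so the old `.ro.db` stays in place and queries continue against the previous snapshot.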
File Structure
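The on-disk layout during a refresh looks roughly like this (`mydb` is a placeholder for your database name):

```
data/
├── mydb.db          # primary (read/write)
├── mydb.ro.db       # current replica snapshot (opened READ_ONLY)
└── mydb.ro.db.tmp   # in-progress copy, atomically renamed to .ro.db
```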
Query Routing
When `READ_REPLICA_ENABLED=true`:

- If the replica is not ready (first snapshot in progress), fall back to the primary
- If a replica query fails, fall back to the primary
- If `READ_REPLICA_ENABLED=false`, always use the primary
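The routing rules above can be sketched like this. The query functions are illustrative stand-ins for the real primary/replica connections; `null` from the replica signals "not ready or failed", triggering fallback.

```typescript
// A query function resolves to a result, or null when the replica
// cannot serve the query (not ready yet, or the query failed).
type Query = (sql: string) => Promise<unknown | null>;

async function routeQuery(
  sql: string,
  replicaEnabled: boolean,
  queryReplica: Query,
  queryPrimary: Query,
): Promise<unknown> {
  if (!replicaEnabled) return queryPrimary(sql);       // replicas disabled
  const result = await queryReplica(sql);              // null => fall back
  return result !== null ? result : queryPrimary(sql); // primary as backstop
}
```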
Refresh Scheduling
The replica refreshes on a fixed interval, starting from service start.

Monitoring
Status API
Get replica status:

Key Metrics
- `lastRefreshedAt`: Timestamp of the last successful snapshot. Data is stale by `now - lastRefreshedAt`.
- Refresh flag: True if a snapshot is currently in progress (queries use the old replica or fall back to the primary).
- `totalErrors`: Number of failed snapshot attempts. Non-zero indicates issues (disk space, file locks).
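A status payload might look roughly like the sketch below. Only `lastRefreshedAt` and `totalErrors` are named in this document; the interface name and the in-progress flag's field name are assumptions.

```typescript
// Illustrative shape built from the metrics above — not necessarily
// the real ReplicaStatus interface.
interface ReplicaStatusSketch {
  lastRefreshedAt: string | null; // ISO timestamp of last successful snapshot
  refreshing: boolean;            // hypothetical name for the in-progress flag
  totalErrors: number;            // failed snapshot attempts
}

// Staleness = now - lastRefreshedAt, in seconds (null before first snapshot).
function stalenessSeconds(status: ReplicaStatusSketch, now: Date): number | null {
  if (status.lastRefreshedAt === null) return null;
  return (now.getTime() - Date.parse(status.lastRefreshedAt)) / 1000;
}
```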
Availability Check
Data Staleness
Measuring Staleness
Data staleness depends on:

- Refresh Interval: `REPLICA_REFRESH_INTERVAL` seconds
- Copy Duration: Time to copy primary → replica (depends on file size + disk I/O)
- Write Rate: How fast the primary changes during the copy

Worst-case staleness is roughly the refresh interval plus the copy duration: for example, ~390 seconds with the default 300-second interval and a 90-second copy of a 200 GB database.
Use Cases by Staleness Tolerance
| Staleness Tolerance | Use Case | Recommended Interval |
|---|---|---|
| < 1 minute | Real-time dashboards | Use CDC instead |
| 1-5 minutes | Analytics dashboards | 60-300 seconds |
| 5-15 minutes | Reporting queries | 300-900 seconds |
| > 15 minutes | Batch ETL | 900+ seconds |
Real-Time Queries: For sub-minute staleness, use CDC (binlog streaming) instead of read replicas.
Performance Impact
Refresh Overhead
| Database Size | Copy Time (SSD) | CPU Usage | I/O Impact |
|---|---|---|---|
| 10 GB | ~5 seconds | Low | Minimal |
| 100 GB | ~30 seconds | Medium | Moderate |
| 200 GB | ~90 seconds | High | Significant |
| 500 GB | ~4 minutes | Very High | Heavy |
Query Performance
Read replica queries have performance identical to the primary:

- Same columnar storage format
- Same compression algorithms
- Same query optimizer
Tuning Recommendations
Small Databases (< 50 GB)
Medium Databases (50-200 GB)
Large Databases (200-500 GB)
Very Large Databases (> 500 GB)
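The tiers above presumably map to progressively longer refresh intervals. As a sketch, the values below are assumptions scaled to the copy times in the table above, not recommendations from this document:

```
# Illustrative REPLICA_REFRESH_INTERVAL starting points (assumed)
# Small (< 50 GB):        REPLICA_REFRESH_INTERVAL=60
# Medium (50-200 GB):     REPLICA_REFRESH_INTERVAL=300
# Large (200-500 GB):     REPLICA_REFRESH_INTERVAL=600
# Very large (> 500 GB):  REPLICA_REFRESH_INTERVAL=900 (or prefer CDC)
```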
Error Handling
Refresh Failures
Common causes of snapshot failures:

- Disk Space Exhausted: Need 2x the database size free
- File Locked: Another process has the primary file locked
- I/O Errors: Disk failure or filesystem issues

When a refresh fails:

- The error is logged to the console
- The `totalErrors` counter is incremented
- The old replica remains available (no downtime)
- The next refresh attempt occurs at the next interval
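The failure-handling contract above can be sketched as follows (class and method names are illustrative, not the real service API):

```typescript
// Tracks refresh outcomes the way the document describes: log the error,
// bump totalErrors, keep serving from the old replica, retry next interval.
class RefreshTracker {
  totalErrors = 0;
  lastRefreshedAt: Date | null = null;

  // `copy` performs the copy + atomic rename; if it throws, the old
  // replica file is untouched, so readers keep working.
  async tryRefresh(copy: () => Promise<void>): Promise<boolean> {
    try {
      await copy();
      this.lastRefreshedAt = new Date();
      return true;
    } catch (err) {
      console.error("replica refresh failed:", err); // logged to console
      this.totalErrors += 1;                         // counter incremented
      return false;                                  // retry at next interval
    }
  }
}
```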
Query Failures
If a replica query fails:

- Close Replica Connection: Prevent repeated failures
- Log Error: Record failure reason
- Return Null: Signal fallback to primary
- Next Refresh: Reconnect attempt at next snapshot
DuckDB Constraints
Single-Process Limitation
DuckDB’s read/write isolation is process-local:

- ✅ Within Process: Read + write use MVCC (no conflict)
- ❌ Across Processes: All must open in READ_ONLY mode
READ_ONLY Mode
- No writes allowed (INSERT, UPDATE, DELETE, CREATE TABLE, etc.)
- No temporary tables
- No schema changes
- Read queries only
Troubleshooting
Replica Not Available
- `READ_REPLICA_ENABLED=false` — enable it in `.env`
- First snapshot in progress — wait for `lastRefreshedAt` to populate
- Snapshot failing — check `totalErrors` and logs
High Staleness
- Decrease `REPLICA_REFRESH_INTERVAL`
- Use a faster disk (NVMe SSD)
- Use CDC for real-time data
Disk Space Errors
- Free up disk space (delete old backups, logs)
- Increase disk size (need 2x database size free)
- Disable replicas: `READ_REPLICA_ENABLED=false`
Code Reference
Implementation: `packages/server/src/services/readReplicaService.ts`
Key Classes:
- `ReadReplicaService`: Main service class (multi-instance per database)
- `ReplicaStatus`: Status interface for monitoring

Key Methods:

- `start()`: Initialize and schedule refreshes (line 84)
- `stop()`: Stop the service and clean up files (line 107)
- `executeReadOnly()`: Execute a query on the replica (line 137)
- `refreshReplica()`: Create a snapshot (line 231)
- `isAvailable()`: Check whether the replica is ready (line 221)