CDC with MySQL Binlog Streaming
Duckling’s CDC (Change Data Capture) service provides real-time MySQL to DuckDB replication using binlog streaming. This enables near-zero lag replication for mission-critical applications that require live data.Overview
The CDC service uses the@vlasky/zongji library to parse MySQL binary logs and capture INSERT, UPDATE, and DELETE operations in real-time. Changes are applied to DuckDB immediately with automatic recovery and position tracking.
Key Features:
- Real-time replication (INSERT, UPDATE, DELETE)
- Binlog position tracking for resume capability
- Auto-reconnect on disconnect (up to 10 attempts)
- Multi-database support
- Graceful error handling with exponential backoff
- Backpressure control to prevent memory overflow
Architecture
Environment Variables
Enable CDC binlog streaming for real-time replication
Maximum event queue size before applying backpressure (pauses binlog stream)
Reject unauthorized SSL certificates. Set to
false for self-signed certs (DigitalOcean Managed MySQL)Configuration
CDC is configured per database using the MySQL connection string:SSL Support: CDC automatically enables SSL for DigitalOcean Managed MySQL and other providers. Use
CDC_SSL_REJECT_UNAUTHORIZED=false for self-signed certificates.Binlog Position Tracking
The CDC service tracks binlog positions in DuckDB to enable resume capability after restarts or failures:- Every 100 events processed
- After each row event (INSERT, UPDATE, DELETE)
- Only after successful DuckDB transaction
Event Processing
Supported Event Types
CDC captures the following MySQL binlog events:| Event Type | Description | DuckDB Operation |
|---|---|---|
WriteRows | INSERT statements | INSERT OR REPLACE (upsert) |
UpdateRows | UPDATE statements | INSERT OR REPLACE (upsert) |
DeleteRows | DELETE statements | DELETE WHERE pk = ? |
Rotate | Binlog file rotation | Update position tracking |
TableMap | Table schema metadata | Cache column mappings |
Event Queue & Backpressure
The CDC service uses a serialized event queue to maintain correct ordering:- Queue Events: Binlog events are pushed to an in-memory queue
- Serial Processing: Events processed one-at-a-time to preserve ordering
- Backpressure: When queue reaches
CDC_MAX_QUEUE_SIZE, the binlog stream is paused - Resume: Stream resumes when queue drains below 50% capacity
Table Filtering
Control which tables are replicated:Starting CDC
CDC is started automatically whencdc.enabled: true in the database configuration:
- Initialize binlog position table
- Check for existing position (resume point)
- Connect to MySQL binlog stream
- If position exists: resume from saved position
- If no position: start from current binlog end (
startAtEnd: true)
Monitoring CDC
Statistics API
Get real-time CDC statistics:Key Metrics
Current number of events waiting in queue. Should be < 1000 under normal load.
Peak queue size since service started. Helps identify load spikes.
Total errors encountered. Non-zero indicates issues requiring investigation.
Error Handling & Recovery
Auto-Reconnect
CDC automatically reconnects on failures:- Max Attempts: 10 reconnection attempts
- Backoff Strategy: Exponential (5s, 10s, 20s, 40s, …)
- Resume Point: Always resumes from last saved position
Graceful Shutdown
On service stop, CDC:- Stops accepting new binlog events
- Drains event queue (waits for pending events)
- Saves final binlog position
- Closes MySQL connection
MySQL Configuration
Required MySQL Settings
CDC requires MySQL binlog format to beROW:
User Permissions
The MySQL user requires these permissions:Performance Characteristics
| Metric | Value | Notes |
|---|---|---|
| Replication Lag | < 100ms | Under normal load |
| Throughput | ~60,000 ops/sec | With queue backpressure |
| Memory Usage | ~50-200 MB | Depends on queue size |
| Disk I/O | Minimal | Only position checkpoints |
CDC vs Incremental Sync
Choosing between CDC and incremental sync:| Feature | CDC (Binlog) | Incremental Sync |
|---|---|---|
| Lag | < 100ms | 15 minutes (default) |
| Overhead | Continuous connection | Periodic queries |
| Data Loss Risk | None (position tracked) | None (watermark tracked) |
| Best For | Real-time dashboards | Analytical workloads |
| DELETE Support | Yes (immediate) | No (append-only) |
Recommendation: Use CDC for real-time applications (< 1 minute lag), use incremental sync for analytical workloads (15+ minute intervals).
Troubleshooting
CDC Not Starting
High Queue Size
IfqueueSize consistently exceeds 1000:
- Increase Queue Limit: Set
CDC_MAX_QUEUE_SIZE=20000 - Check DuckDB Performance: Ensure disk I/O is not bottlenecked
- Filter Tables: Use
excludeTablesto reduce event volume - Scale Vertically: Add more CPU/RAM to handle throughput
Position Not Saving
Check CDC logs for transaction errors:- DuckDB file locked by another process
- Disk space exhausted
- Schema mismatch (table dropped but CDC still tracking)
Code Reference
CDC implementation:packages/server/src/services/cdcService.ts
Key Classes:
CDCService- Main service class (multi-instance per database)BinlogPosition- Position tracking interfaceCDCStats- Statistics and monitoring
start()- Initialize and start binlog streaming (line 619)handleInsert()- Process INSERT events (line 374)handleUpdate()- Process UPDATE events (line 423)handleDelete()- Process DELETE events (line 473)savePosition()- Checkpoint binlog position (line 202)