Overview
Duckling is a high-performance DuckDB server that replicates MySQL data using a Sequential Appender architecture with ACID transactions. It provides a scalable analytical layer over operational MySQL data with 5-10x better query performance than traditional approaches.Architecture Diagram
Core Components
Server Layer
Express.js application providing RESTful API endpoints for:- Data synchronization (full, incremental, CDC)
- Query execution with WebSocket support
- Health monitoring and metrics
- Multi-database management
Database Connections
DuckDB Connection (src/database/duckdb.ts)
- Native columnar storage for analytical workloads
- Connection pooling (configurable via
DUCKDB_MAX_CONNECTIONS, default: 10) - Unified
@duckdb/node-apiarchitecture - High-performance Appender API for bulk loading
- ACID transaction support
MySQL Connection (src/database/mysql.ts)
- Source database streaming operations
- Connection pooling (configurable via
MYSQL_MAX_CONNECTIONS, default: 5) - Automatic schema detection
- Streaming batch processing (default: 1000 records/batch)
Sync Service Architecture
The Sequential Appender Service (src/services/sequentialAppenderService.ts) provides:
ACID Transactions
All-or-nothing writes with automatic rollback on failures
Watermark Tracking
Efficient incremental sync using timestamp/ID-based watermarks
Schema Evolution
Zero-downtime schema updates with automatic column detection
Streaming Batches
Memory-efficient processing of large datasets
Data Flow
- MySQL Source streams data in configurable batches (default: 1,000 records)
- Sequential Appender processes records with ACID guarantees:
- BEGIN TRANSACTION
- Validate and sanitize data
- INSERT/APPEND records sequentially
- COMMIT (atomic) or ROLLBACK (on error)
- DuckDB Storage persists data in compressed columnar format
- API Clients query via REST/WebSocket with 5-10x faster response times
Storage Structure
All DuckDB files use native columnar format with automatic compression. A single
.db file replaces the entire MySQL database with 5-10x better query performance.Performance Characteristics
Query Performance
Columnar Storage
DuckDB’s native columnar format optimized for analytical queries
Column Pruning
Read only the columns needed for each query
Compressed Storage
Built-in compression reduces disk I/O by 60-90%
In-Process
Zero network overhead for query execution
Sync Performance
| Operation | Speed | Method |
|---|---|---|
| Full Sync | 60,000+ rows/sec | Appender API (6x faster) |
| Incremental Sync | 10,000 rows/sec | INSERT OR REPLACE |
| CDC (Real-time) | Sub-second latency | Binlog streaming |
The Appender API provides 6x faster bulk loading (60K+ rows/sec vs 10K rows/sec with traditional INSERT). This enables syncing 60M records in minutes instead of hours.
Multi-Database Support
Duckling supports multiple isolated database replicas on a single server instance:- Isolated Replicas: Each MySQL source gets its own DuckDB file
- Separate Connection Pools: No cross-contamination between databases
- Unified Monitoring: All databases visible in one dashboard
- Database Selector: Frontend UI dropdown to switch contexts
Multi-Instance Pattern
?db={database_id} query parameter:
ACID Guarantees
Transaction Isolation
Every sync operation wraps database writes in explicit transactions:Benefits
Atomicity
All records in a batch succeed or fail together
Consistency
Database never in a partial state
Durability
CHECKPOINT ensures data persists to disk
No Duplicates
Primary key constraints enforced
Error Handling & Recovery
Automatic Recovery
- Health Monitoring: Checks connections every 60 seconds
- Auto-Reconnect: Exponential backoff retry (up to 3 attempts)
- Transaction Rollback: Automatic on any error
- Graceful Degradation: Continues syncing other tables on single-table failures
Sync Locks
Per-table sync locks prevent concurrent modifications:Monitoring & Observability
Sync Logs
Every sync operation is logged tosync_log table:
Real-Time Progress
Sync progress API provides live updates:Next Steps
Sequential Appender
Deep dive into the 60K+ rows/sec Appender API
Sync Modes
Full sync, incremental sync, and CDC modes
Multi-Database
Managing multiple isolated replicas