
Overview

Duckling is a high-performance DuckDB server that replicates MySQL data using a Sequential Appender architecture with ACID transactions. It provides a scalable analytical layer over operational MySQL data, delivering 5-10x faster analytical queries than the row-oriented MySQL source.

Architecture Diagram

MySQL (Source) → Sequential Appender → DuckDB Native Storage → API Clients
                        ↓                      ↓
                  BEGIN TRANSACTION      Columnar Format
                  INSERT sequentially     (Compressed)
                  COMMIT / ROLLBACK      Watermark Tracking

                 Atomic & ACID
                 No Duplicates
                 Data Integrity

Core Components

Server Layer

Express.js application providing RESTful API endpoints for:
  • Data synchronization (full, incremental, CDC)
  • Query execution with WebSocket support
  • Health monitoring and metrics
  • Multi-database management

Database Connections

DuckDB Connection (src/database/duckdb.ts)

  • Native columnar storage for analytical workloads
  • Connection pooling (configurable via DUCKDB_MAX_CONNECTIONS, default: 10)
  • Unified @duckdb/node-api architecture
  • High-performance Appender API for bulk loading
  • ACID transaction support

MySQL Connection (src/database/mysql.ts)

  • Source database streaming operations
  • Connection pooling (configurable via MYSQL_MAX_CONNECTIONS, default: 5)
  • Automatic schema detection
  • Streaming batch processing (default: 1000 records/batch)
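
The pool sizes and batch size above map naturally to environment-driven configuration. A minimal sketch of such a loader (the intFromEnv helper is illustrative, not the codebase's actual config code; only DUCKDB_MAX_CONNECTIONS and MYSQL_MAX_CONNECTIONS are documented variable names):

```typescript
// Illustrative config loader using the documented defaults.
// Falls back to the default when the variable is unset or not a number.
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

const config = {
  duckdbMaxConnections: intFromEnv("DUCKDB_MAX_CONNECTIONS", 10),
  mysqlMaxConnections: intFromEnv("MYSQL_MAX_CONNECTIONS", 5),
  batchSize: 1000, // default streaming batch size (records per batch)
};
```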

Sync Service Architecture

The Sequential Appender Service (src/services/sequentialAppenderService.ts) provides:

ACID Transactions

All-or-nothing writes with automatic rollback on failures

Watermark Tracking

Efficient incremental sync using timestamp/ID-based watermarks

Schema Evolution

Zero-downtime schema updates with automatic column detection

Streaming Batches

Memory-efficient processing of large datasets
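
To make the watermark idea concrete, here is a small sketch of building an incremental-sync query from a stored watermark (the Watermark shape and incrementalQuery helper are hypothetical; the real logic lives in src/services/sequentialAppenderService.ts):

```typescript
// A watermark records the highest timestamp/ID value seen in the
// previous successful sync, so the next run only fetches newer rows.
interface Watermark {
  column: string;             // e.g. "updated_at", or an auto-increment "id"
  lastValue: string | number; // last value committed to DuckDB
}

// Note: values here come from trusted sync state, not user input.
function incrementalQuery(table: string, wm: Watermark, batchSize: number): string {
  return (
    `SELECT * FROM ${table} ` +
    `WHERE ${wm.column} > '${wm.lastValue}' ` +
    `ORDER BY ${wm.column} LIMIT ${batchSize}`
  );
}
```

Because the watermark only advances after a batch commits, a crash never re-applies committed rows and never skips rows from a rolled-back batch.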

Data Flow

  1. MySQL Source streams data in configurable batches (default: 1,000 records)
  2. Sequential Appender processes records with ACID guarantees:
    • BEGIN TRANSACTION
    • Validate and sanitize data
    • INSERT/APPEND records sequentially
    • COMMIT (atomic) or ROLLBACK (on error)
  3. DuckDB Storage persists data in compressed columnar format
  4. API Clients query via REST/WebSocket with 5-10x faster response times
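
The transactional step above can be sketched as a single batched write, assuming a minimal Db interface standing in for the real DuckDB connection:

```typescript
// Minimal sketch of the all-or-nothing batch write described above.
interface Db {
  run(sql: string, params?: unknown[]): Promise<void>;
}

async function appendBatch(
  db: Db,
  table: string,
  rows: Record<string, unknown>[]
): Promise<void> {
  await db.run("BEGIN TRANSACTION");
  try {
    for (const row of rows) {
      const cols = Object.keys(row);
      const placeholders = cols.map(() => "?").join(", ");
      await db.run(
        `INSERT INTO ${table} (${cols.join(", ")}) VALUES (${placeholders})`,
        Object.values(row)
      );
    }
    await db.run("COMMIT"); // atomic: all rows become visible together
  } catch (err) {
    await db.run("ROLLBACK"); // on any failure, no partial batch remains
    throw err;
  }
}
```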

Storage Structure

data/
├── databases.json            # Multi-database configuration
├── {database_id}.db          # DuckDB file per database (columnar)
├── lms.db                    # Example: LMS database replica
└── chitti_common.db          # Example: Common database replica

All DuckDB files use native columnar format with automatic compression. A single .db file holds a complete replica of the corresponding MySQL database, with 5-10x better query performance.
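
Given this layout, resolving a database ID from databases.json to its on-disk file is a one-liner (a sketch; the codebase's actual helper may differ):

```typescript
import * as path from "node:path";

// Map a database ID from databases.json to its on-disk DuckDB file,
// e.g. ("data", "lms") -> data/lms.db
function databaseFile(dataDir: string, databaseId: string): string {
  return path.join(dataDir, `${databaseId}.db`);
}
```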

Performance Characteristics

Query Performance

Columnar Storage

DuckDB’s native columnar format optimized for analytical queries

Column Pruning

Read only the columns needed for each query

Compressed Storage

Built-in compression reduces disk I/O by 60-90%

In-Process

Zero network overhead for query execution

Sync Performance

| Operation | Speed | Method |
| --- | --- | --- |
| Full Sync | 60,000+ rows/sec | Appender API (6x faster) |
| Incremental Sync | 10,000 rows/sec | INSERT OR REPLACE |
| CDC (Real-time) | Sub-second latency | Binlog streaming |

The Appender API provides 6x faster bulk loading (60K+ rows/sec vs 10K rows/sec with traditional INSERT). This enables syncing 60M records in minutes instead of hours.
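
That "minutes instead of hours" claim is simple arithmetic on the table's numbers:

```typescript
// Estimate wall-clock sync time from row count and sustained throughput.
function estimateSyncMinutes(rows: number, rowsPerSec: number): number {
  return rows / rowsPerSec / 60;
}

const viaAppender = estimateSyncMinutes(60_000_000, 60_000); // ~16.7 minutes
const viaInsert = estimateSyncMinutes(60_000_000, 10_000);   // 100 minutes
```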

Multi-Database Support

Duckling supports multiple isolated database replicas on a single server instance:
  • Isolated Replicas: Each MySQL source gets its own DuckDB file
  • Separate Connection Pools: No cross-contamination between databases
  • Unified Monitoring: All databases visible in one dashboard
  • Database Selector: Frontend UI dropdown to switch contexts

Multi-Instance Pattern

// One connection and one service instance per database, keyed by database ID
duckdbConnections: Map<string, DuckDBConnection>
mysqlConnections: Map<string, MySQLConnection>
appenderServices: Map<string, SequentialAppenderService>
All endpoints accept a ?db={database_id} query parameter:

curl 'http://localhost:3001/api/query?db=lms' \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT COUNT(*) FROM User"}'

ACID Guarantees

Transaction Isolation

Every sync operation wraps database writes in explicit transactions:
await this.duckdb.run('BEGIN TRANSACTION');
try {
  // Insert/update records
  await this.duckdb.run('COMMIT');
} catch (error) {
  await this.duckdb.run('ROLLBACK');
  throw error;
}

Benefits

Atomicity

All records in a batch succeed or fail together

Consistency

Database never in a partial state

Durability

CHECKPOINT ensures data persists to disk

No Duplicates

Primary key constraints enforced

Error Handling & Recovery

Automatic Recovery

  • Health Monitoring: Checks connections every 60 seconds
  • Auto-Reconnect: Exponential backoff retry (up to 3 attempts)
  • Transaction Rollback: Automatic on any error
  • Graceful Degradation: Continues syncing other tables on single-table failures
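
The auto-reconnect policy above (up to 3 attempts with exponential backoff) can be sketched as a generic retry helper; the name withRetry and the 1-second base delay are illustrative:

```typescript
// Retry an async operation with exponential backoff: 1s, 2s, 4s, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up after the last attempt
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```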

Sync Locks

Per-table sync locks prevent concurrent modifications:
// Acquire lock before sync
if (!this.tryAcquireTableLock(tableName)) {
  throw new SyncAlreadyInProgressError();
}
try {
  await this.syncTable(tableName);
} finally {
  this.releaseTableLock(tableName);
}
Avoid triggering multiple full syncs concurrently: the system queues sync requests and processes them sequentially to preserve data integrity.
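
A minimal in-process version of the lock used in the snippet above could look like this (the service's actual tryAcquireTableLock implementation may differ):

```typescript
// Track which tables currently have a sync in flight.
class TableLocks {
  private locked = new Set<string>();

  tryAcquire(table: string): boolean {
    if (this.locked.has(table)) return false; // a sync is already running
    this.locked.add(table);
    return true;
  }

  release(table: string): void {
    this.locked.delete(table);
  }
}
```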

Monitoring & Observability

Sync Logs

Every sync operation is logged to the sync_log table:
CREATE TABLE sync_log (
  id INTEGER PRIMARY KEY,
  table_name VARCHAR,
  sync_type VARCHAR,           -- 'watermark', 'sequential', 'full'
  records_processed INTEGER,
  duration_ms INTEGER,
  status VARCHAR,              -- 'success', 'error'
  error_message VARCHAR,
  watermark_before VARCHAR,    -- JSON snapshot
  watermark_after VARCHAR,     -- JSON snapshot
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Real-Time Progress

Sync progress API provides live updates:
interface SyncProgressStatus {
  inProgress: boolean;
  type: 'full' | 'incremental' | null;
  tablesCompleted: number;
  tablesTotal: number;
  currentTable: string | null;
  recordsProcessed: number;
  startedAt: string | null;
  lastError: string | null;
}
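
A client consuming this status can derive a display percentage from tablesCompleted and tablesTotal; the progressPercent helper below is illustrative, not part of the API:

```typescript
interface SyncProgressStatus {
  inProgress: boolean;
  type: "full" | "incremental" | null;
  tablesCompleted: number;
  tablesTotal: number;
  currentTable: string | null;
  recordsProcessed: number;
  startedAt: string | null;
  lastError: string | null;
}

// Percentage of tables completed, guarding against division by zero.
function progressPercent(s: SyncProgressStatus): number {
  if (s.tablesTotal === 0) return 0;
  return Math.round((s.tablesCompleted / s.tablesTotal) * 100);
}
```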

Next Steps

Sequential Appender

Deep dive into the 60K+ rows/sec Appender API

Sync Modes

Full sync, incremental sync, and CDC modes

Multi-Database

Managing multiple isolated replicas
