
Overview

Duckling is a high-performance DuckDB server that replicates MySQL data using a Sequential Appender architecture with ACID transactions. It provides a scalable analytical layer over operational MySQL data, delivering 5-10x faster analytical queries than the row-oriented MySQL source.

Architecture Diagram

MySQL (Source) → Sequential Appender → DuckDB Native Storage → API Clients
                        ↓                      ↓
                  BEGIN TRANSACTION      Columnar Format
                  INSERT sequentially     (Compressed)
                  COMMIT / ROLLBACK      Watermark Tracking

                 Atomic & ACID
                 No Duplicates
                 Data Integrity

Core Components

Server Layer

Express.js application providing RESTful API endpoints for:
  • Data synchronization (full, incremental, CDC)
  • Query execution with WebSocket support
  • Health monitoring and metrics
  • Multi-database management

Database Connections

DuckDB Connection (src/database/duckdb.ts)

  • Native columnar storage for analytical workloads
  • Connection pooling (configurable via DUCKDB_MAX_CONNECTIONS, default: 10)
  • Unified @duckdb/node-api architecture
  • High-performance Appender API for bulk loading
  • ACID transaction support

MySQL Connection (src/database/mysql.ts)

  • Source database streaming operations
  • Connection pooling (configurable via MYSQL_MAX_CONNECTIONS, default: 5)
  • Automatic schema detection
  • Streaming batch processing (default: 1000 records/batch)
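
The pool sizes and batch size above map naturally to environment-driven configuration. A minimal sketch of such a loader (the intFromEnv helper is illustrative, not the codebase's actual config code; only DUCKDB_MAX_CONNECTIONS and MYSQL_MAX_CONNECTIONS are documented variable names):

```typescript
// Illustrative config loader using the documented defaults.
// Falls back to the default when the variable is unset or not a number.
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

const config = {
  duckdbMaxConnections: intFromEnv("DUCKDB_MAX_CONNECTIONS", 10),
  mysqlMaxConnections: intFromEnv("MYSQL_MAX_CONNECTIONS", 5),
  batchSize: 1000, // default streaming batch size (records per batch)
};
```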

Sync Service Architecture

The Sequential Appender Service (src/services/sequentialAppenderService.ts) provides:

ACID Transactions

All-or-nothing writes with automatic rollback on failures

Watermark Tracking

Efficient incremental sync using timestamp/ID-based watermarks

Schema Evolution

Zero-downtime schema updates with automatic column detection

Streaming Batches

Memory-efficient processing of large datasets
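
To make the watermark idea concrete, here is a small sketch of building an incremental-sync query from a stored watermark (the Watermark shape and incrementalQuery helper are hypothetical; the real logic lives in src/services/sequentialAppenderService.ts):

```typescript
// A watermark records the highest timestamp/ID value seen in the
// previous successful sync, so the next run only fetches newer rows.
interface Watermark {
  column: string;             // e.g. "updated_at", or an auto-increment "id"
  lastValue: string | number; // last value committed to DuckDB
}

// Note: values here come from trusted sync state, not user input.
function incrementalQuery(table: string, wm: Watermark, batchSize: number): string {
  return (
    `SELECT * FROM ${table} ` +
    `WHERE ${wm.column} > '${wm.lastValue}' ` +
    `ORDER BY ${wm.column} LIMIT ${batchSize}`
  );
}
```

Because the watermark only advances after a batch commits, a crash never re-applies committed rows and never skips rows from a rolled-back batch.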

Data Flow

  1. MySQL Source streams data in configurable batches (default: 1,000 records)
  2. Sequential Appender processes records with ACID guarantees:
    • BEGIN TRANSACTION
    • Validate and sanitize data
    • INSERT/APPEND records sequentially
    • COMMIT (atomic) or ROLLBACK (on error)
  3. DuckDB Storage persists data in compressed columnar format
  4. API Clients query via REST/WebSocket with 5-10x faster response times
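
The transactional step above can be sketched as a single batched write, assuming a minimal Db interface standing in for the real DuckDB connection:

```typescript
// Minimal sketch of the all-or-nothing batch write described above.
interface Db {
  run(sql: string, params?: unknown[]): Promise<void>;
}

async function appendBatch(
  db: Db,
  table: string,
  rows: Record<string, unknown>[]
): Promise<void> {
  await db.run("BEGIN TRANSACTION");
  try {
    for (const row of rows) {
      const cols = Object.keys(row);
      const placeholders = cols.map(() => "?").join(", ");
      await db.run(
        `INSERT INTO ${table} (${cols.join(", ")}) VALUES (${placeholders})`,
        Object.values(row)
      );
    }
    await db.run("COMMIT"); // atomic: all rows become visible together
  } catch (err) {
    await db.run("ROLLBACK"); // on any failure, no partial batch remains
    throw err;
  }
}
```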

Storage Structure

data/
├── databases.json            # Multi-database configuration
├── {database_id}.db          # DuckDB file per database (columnar)
├── lms.db                    # Example: LMS database replica
└── chitti_common.db          # Example: Common database replica

All DuckDB files use native columnar format with automatic compression. A single .db file holds a complete replica of the corresponding MySQL database, with 5-10x better query performance.
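
Given this layout, resolving a database ID from databases.json to its on-disk file is a one-liner (a sketch; the codebase's actual helper may differ):

```typescript
import * as path from "node:path";

// Map a database ID from databases.json to its on-disk DuckDB file,
// e.g. ("data", "lms") -> data/lms.db
function databaseFile(dataDir: string, databaseId: string): string {
  return path.join(dataDir, `${databaseId}.db`);
}
```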

Performance Characteristics

Query Performance

Columnar Storage

DuckDB’s native columnar format optimized for analytical queries

Column Pruning

Read only the columns needed for each query

Compressed Storage

Built-in compression reduces disk I/O by 60-90%

In-Process

Zero network overhead for query execution

Sync Performance

| Operation | Speed | Method |
| --- | --- | --- |
| Full Sync | 60,000+ rows/sec | Appender API (6x faster) |
| Incremental Sync | 10,000 rows/sec | INSERT OR REPLACE |
| CDC (Real-time) | Sub-second latency | Binlog streaming |

The Appender API provides 6x faster bulk loading (60K+ rows/sec vs 10K rows/sec with traditional INSERT). This enables syncing 60M records in minutes instead of hours.
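
That "minutes instead of hours" claim is simple arithmetic on the table's numbers:

```typescript
// Estimate wall-clock sync time from row count and sustained throughput.
function estimateSyncMinutes(rows: number, rowsPerSec: number): number {
  return rows / rowsPerSec / 60;
}

const viaAppender = estimateSyncMinutes(60_000_000, 60_000); // ~16.7 minutes
const viaInsert = estimateSyncMinutes(60_000_000, 10_000);   // 100 minutes
```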

Multi-Database Support

Duckling supports multiple isolated database replicas on a single server instance:
  • Isolated Replicas: Each MySQL source gets its own DuckDB file
  • Separate Connection Pools: No cross-contamination between databases
  • Unified Monitoring: All databases visible in one dashboard
  • Database Selector: Frontend UI dropdown to switch contexts

Multi-Instance Pattern

// One connection and one service instance per database, keyed by database ID
duckdbConnections: Map<string, DuckDBConnection>
mysqlConnections: Map<string, MySQLConnection>
appenderServices: Map<string, SequentialAppenderService>
All endpoints accept a ?db={database_id} query parameter:

curl 'http://localhost:3001/api/query?db=lms' \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT COUNT(*) FROM User"}'

ACID Guarantees

Transaction Isolation

Every sync operation wraps database writes in explicit transactions:
await this.duckdb.run('BEGIN TRANSACTION');
try {
  // Insert/update records
  await this.duckdb.run('COMMIT');
} catch (error) {
  await this.duckdb.run('ROLLBACK');
  throw error;
}

Benefits

Atomicity

All records in a batch succeed or fail together

Consistency

Database never in a partial state

Durability

CHECKPOINT ensures data persists to disk

No Duplicates

Primary key constraints enforced

Error Handling & Recovery

Automatic Recovery

  • Health Monitoring: Checks connections every 60 seconds
  • Auto-Reconnect: Exponential backoff retry (up to 3 attempts)
  • Transaction Rollback: Automatic on any error
  • Graceful Degradation: Continues syncing other tables on single-table failures
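
The auto-reconnect policy above (up to 3 attempts with exponential backoff) can be sketched as a generic retry helper; the name withRetry and the 1-second base delay are illustrative:

```typescript
// Retry an async operation with exponential backoff: 1s, 2s, 4s, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up after the last attempt
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```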

Sync Locks

Per-table sync locks prevent concurrent modifications:
// Acquire lock before sync
if (!this.tryAcquireTableLock(tableName)) {
  throw new SyncAlreadyInProgressError();
}
try {
  await this.syncTable(tableName);
} finally {
  this.releaseTableLock(tableName);
}
Avoid triggering multiple full syncs concurrently: the system queues sync requests and processes them sequentially to preserve data integrity.
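
A minimal in-process version of the lock used in the snippet above could look like this (the service's actual tryAcquireTableLock implementation may differ):

```typescript
// Track which tables currently have a sync in flight.
class TableLocks {
  private locked = new Set<string>();

  tryAcquire(table: string): boolean {
    if (this.locked.has(table)) return false; // a sync is already running
    this.locked.add(table);
    return true;
  }

  release(table: string): void {
    this.locked.delete(table);
  }
}
```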

Monitoring & Observability

Sync Logs

Every sync operation is logged to the sync_log table:
CREATE TABLE sync_log (
  id INTEGER PRIMARY KEY,
  table_name VARCHAR,
  sync_type VARCHAR,           -- 'watermark', 'sequential', 'full'
  records_processed INTEGER,
  duration_ms INTEGER,
  status VARCHAR,              -- 'success', 'error'
  error_message VARCHAR,
  watermark_before VARCHAR,    -- JSON snapshot
  watermark_after VARCHAR,     -- JSON snapshot
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Real-Time Progress

Sync progress API provides live updates:
interface SyncProgressStatus {
  inProgress: boolean;
  type: 'full' | 'incremental' | null;
  tablesCompleted: number;
  tablesTotal: number;
  currentTable: string | null;
  recordsProcessed: number;
  startedAt: string | null;
  lastError: string | null;
}
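
A client consuming this status can derive a display percentage from tablesCompleted and tablesTotal; the progressPercent helper below is illustrative, not part of the API:

```typescript
interface SyncProgressStatus {
  inProgress: boolean;
  type: "full" | "incremental" | null;
  tablesCompleted: number;
  tablesTotal: number;
  currentTable: string | null;
  recordsProcessed: number;
  startedAt: string | null;
  lastError: string | null;
}

// Percentage of tables completed, guarding against division by zero.
function progressPercent(s: SyncProgressStatus): number {
  if (s.tablesTotal === 0) return 0;
  return Math.round((s.tablesCompleted / s.tablesTotal) * 100);
}
```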

Next Steps

Sequential Appender

Deep dive into the 60K+ rows/sec Appender API

Sync Modes

Full sync, incremental sync, and CDC modes

Multi-Database

Managing multiple isolated replicas
