Overview
Duckling uses watermark-based incremental sync to efficiently replicate changes from MySQL to DuckDB. The system automatically syncs data every 15 minutes by default, with zero manual intervention required.
Automatic Synchronization
Default Behavior
Synchronization is enabled by default and starts automatically when the server boots:
- Sync Interval: Every 15 minutes (SYNC_INTERVAL_MINUTES=15)
- Auto-Start: Enabled by default (AUTO_START_SYNC=true)
- Incremental Mode: Enabled by default (ENABLE_INCREMENTAL_SYNC=true)
- Initial Delay: 5 seconds after server start
The first sync runs 5 seconds after server startup, then repeats every 15 minutes automatically.
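The timing described above can be illustrated with a minimal sketch. It assumes the interval is measured from one scheduled start to the next (the source does not say whether a long-running sync shifts later runs), and the function name is hypothetical rather than part of Duckling.

```python
from datetime import datetime, timedelta

INITIAL_DELAY_SECONDS = 5    # documented initial delay after server start
SYNC_INTERVAL_MINUTES = 15   # documented default for SYNC_INTERVAL_MINUTES


def planned_sync_times(server_start, count, interval_minutes=SYNC_INTERVAL_MINUTES):
    """Return the first `count` planned sync start times after `server_start`.

    Assumption: runs are spaced from scheduled start to scheduled start.
    """
    first = server_start + timedelta(seconds=INITIAL_DELAY_SECONDS)
    step = timedelta(minutes=interval_minutes)
    return [first + i * step for i in range(count)]


if __name__ == "__main__":
    boot = datetime(2024, 1, 1, 12, 0, 0)
    for planned in planned_sync_times(boot, 3):
        print(planned.isoformat())
```

For a server that boots at 12:00:00, this yields planned runs at 12:00:05, 12:15:05, and 12:30:05.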
Disabling Automatic Sync
To disable automatic synchronization, set the AUTO_START_SYNC environment variable to false.
Sync Types
Full Sync
Replaces all data in DuckDB with a fresh copy from MySQL.
When to use:
- Initial database setup
- After schema changes
- Recovery from data corruption
- Manual data refresh
Incremental Sync
Syncs only records that changed since the last sync using watermark tracking.
When to use:
- Regular scheduled updates (default)
- Low-latency data replication
- Efficient bandwidth usage
Table-Specific Sync
Sync a single table on demand.
API Endpoint:
Watermark-Based Incremental Sync
How Watermarks Work
Duckling tracks the last processed timestamp for each table:
Detect Timestamp Column
The system automatically detects the appropriate timestamp column in priority order:
- updatedAt / updated_at / modifiedAt / modified_at (highest priority)
- createdAt / created_at (fallback)
- timestamp (final fallback)
Timestamp Detection Priority
| Priority | Column Names | Use Case | Behavior |
|---|---|---|---|
| 1 | updatedAt, updated_at, modifiedAt, modified_at | Tables with updates | Captures inserts + updates |
| 2 | createdAt, created_at | Append-only tables | Captures new records only |
| 3 | timestamp | Legacy systems | Generic timestamp tracking |
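The priority order in the table can be sketched as a small lookup. This is an illustration, not Duckling's implementation: the function name is hypothetical, matching is assumed to be exact and case-sensitive, the order within a priority group is assumed to follow the table, and returning `None` when nothing matches is a placeholder, since the source does not say what happens to tables without a timestamp column.

```python
from typing import Iterable, Optional

# Priority groups copied from the table above; earlier groups win.
TIMESTAMP_PRIORITY = [
    ("updatedAt", "updated_at", "modifiedAt", "modified_at"),  # inserts + updates
    ("createdAt", "created_at"),                               # new records only
    ("timestamp",),                                            # generic fallback
]


def detect_timestamp_column(columns: Iterable[str]) -> Optional[str]:
    """Return the highest-priority timestamp column present, or None."""
    available = set(columns)
    for group in TIMESTAMP_PRIORITY:
        for name in group:
            if name in available:
                return name
    return None  # assumption: the caller decides how to handle this case


if __name__ == "__main__":
    print(detect_timestamp_column(["id", "created_at", "updated_at"]))
    print(detect_timestamp_column(["id", "timestamp"]))
```

A table with both `created_at` and `updated_at` resolves to `updated_at`, so updates are captured rather than only new rows.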
The system uses >= (not >) in watermark queries to prevent data loss at timestamp boundaries. This means records whose timestamp equals the watermark are re-processed on each sync, but INSERT OR REPLACE handles this idempotently.
INSERT OR REPLACE Behavior
Incremental sync uses INSERT OR REPLACE for automatic upsert:
- If primary key exists: REPLACE entire row (update)
- If primary key is new: INSERT new row
- Result: No duplicates, updates propagate automatically
- No duplicates (primary key constraint enforced)
- Updates propagate automatically
- Idempotent (safe to re-process records)
- Works with the >= operator for boundary safety
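The interaction between the >= watermark comparison and INSERT OR REPLACE can be shown end to end with a small, self-contained sketch. As a loud assumption, it uses two in-memory SQLite databases as stand-ins for MySQL and DuckDB (SQLite also accepts `INSERT OR REPLACE`), and the `users` table, its columns, and the `incremental_sync` function are hypothetical.

```python
import sqlite3

# Stand-ins only: SQLite plays both the MySQL source and the DuckDB target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")

source.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ada", "2024-01-01 10:00:00"), (2, "bob", "2024-01-01 10:05:00")],
)


def incremental_sync(watermark):
    """Copy rows with updated_at >= watermark and return the new watermark."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at >= ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert: an existing primary key is replaced, a new one is inserted.
    target.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    target.commit()
    return rows[-1][2] if rows else watermark


# First pass copies both rows and leaves the watermark at 10:05:00.
watermark = incremental_sync("1970-01-01 00:00:00")

# Two changes land in the source with the same timestamp as the watermark.
source.execute("UPDATE users SET name = 'bobby', updated_at = '2024-01-01 10:05:00' WHERE id = 2")
source.execute("INSERT INTO users VALUES (3, 'cy', '2024-01-01 10:05:00')")

# Second pass re-reads the boundary rows; a strict > would have skipped them.
watermark = incremental_sync(watermark)

synced = target.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(synced)
```

After the second pass the target holds three rows with no duplicates: row 2 carries the updated name and row 3 is present, even though both changes share the watermark's timestamp.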
Sync Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| SYNC_INTERVAL_MINUTES | 15 | Minutes between automatic syncs |
| AUTO_START_SYNC | true | Auto-start sync on server boot |
| ENABLE_INCREMENTAL_SYNC | true | Enable incremental mode |
| BATCH_SIZE | 1000 | Records per batch from MySQL |
| INSERT_BATCH_SIZE | 2000 | Records per INSERT batch |
| APPENDER_FLUSH_INTERVAL | 5000 | Appender flush interval (ms) |
| MAX_RETRIES | 3 | Retry attempts for failed operations |
| EXCLUDED_TABLES | "" | Comma-separated list of tables to exclude |
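To make the defaults concrete, here is a hedged sketch of how these variables might be read. The variable names and defaults come from the table; the `SyncConfig` class, the `load_sync_config` function, and the parsing rules (only `true` enables a flag, whitespace around table names is trimmed) are assumptions for illustration, not Duckling's actual loader.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class SyncConfig:
    sync_interval_minutes: int
    auto_start_sync: bool
    enable_incremental_sync: bool
    batch_size: int
    insert_batch_size: int
    appender_flush_interval_ms: int
    max_retries: int
    excluded_tables: List[str]


def _as_bool(value: str) -> bool:
    # Assumption: only "true" (in any letter case) enables a flag.
    return value.strip().lower() == "true"


def load_sync_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Build a config from environment variables, using the documented defaults."""
    excluded = env.get("EXCLUDED_TABLES", "")
    return SyncConfig(
        sync_interval_minutes=int(env.get("SYNC_INTERVAL_MINUTES", "15")),
        auto_start_sync=_as_bool(env.get("AUTO_START_SYNC", "true")),
        enable_incremental_sync=_as_bool(env.get("ENABLE_INCREMENTAL_SYNC", "true")),
        batch_size=int(env.get("BATCH_SIZE", "1000")),
        insert_batch_size=int(env.get("INSERT_BATCH_SIZE", "2000")),
        appender_flush_interval_ms=int(env.get("APPENDER_FLUSH_INTERVAL", "5000")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        # Comma-separated list; empty entries are dropped.
        excluded_tables=[t.strip() for t in excluded.split(",") if t.strip()],
    )


if __name__ == "__main__":
    # "audit_log" and "sessions" are hypothetical table names.
    print(load_sync_config({"EXCLUDED_TABLES": "audit_log, sessions"}))
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without changing the real environment.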
Excluding Tables
To exclude specific tables from synchronization, set EXCLUDED_TABLES to a comma-separated list of table names.
Monitoring Sync Operations
Sync Status Endpoint
Get current sync state and recent activity:
Sync Logs
View detailed sync history:
Sync Log Schema
Each sync operation is logged with:
Validation
Compare record counts between MySQL and DuckDB:
Performance Optimization
See Performance Tuning for detailed optimization strategies including:
- Batch size tuning
- Worker thread configuration
- Network optimization
- Query performance
Troubleshooting
Sync Not Running
Sync Errors
Check sync logs for errors:
- Connection timeout: Check MySQL network connectivity
- Schema mismatch: Run full sync to refresh schema
- Lock timeout: Reduce batch size or sync during off-peak hours
Data Mismatches
If validation shows mismatches:
- Check if tables are excluded: EXCLUDED_TABLES environment variable
- Review sync logs for the affected table
- Check for active transactions in MySQL
- Run full sync for affected tables
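When working through the checklist above, it can help to separate real mismatches from tables that were excluded on purpose. The following is a minimal sketch of that comparison, assuming you have already collected per-table record counts from each side; the function name and the sample table names are hypothetical, and a table missing on one side is treated as having zero rows.

```python
from typing import Iterable, List, Mapping, Tuple


def find_count_mismatches(
    mysql_counts: Mapping[str, int],
    duckdb_counts: Mapping[str, int],
    excluded: Iterable[str] = (),
) -> List[Tuple[str, int, int]]:
    """Return (table, mysql_count, duckdb_count) for every table whose counts differ."""
    skipped = set(excluded)
    mismatches = []
    for table in sorted(set(mysql_counts) | set(duckdb_counts)):
        if table in skipped:
            continue  # excluded tables are expected to differ
        mysql_n = mysql_counts.get(table, 0)    # assumption: missing means 0 rows
        duckdb_n = duckdb_counts.get(table, 0)
        if mysql_n != duckdb_n:
            mismatches.append((table, mysql_n, duckdb_n))
    return mismatches


if __name__ == "__main__":
    mysql = {"users": 120, "orders": 900, "audit_log": 5000}
    duckdb = {"users": 120, "orders": 897}
    print(find_count_mismatches(mysql, duckdb, excluded=["audit_log"]))
```

Tables that remain in the result after exclusions are filtered out are the candidates for a log review or a full sync.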
Multi-Database Support
All sync endpoints support the ?db={database_id} query parameter.
Each database has its own sync schedule and watermarks. Multiple databases sync independently on staggered intervals to prevent resource contention.
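As a small illustration of the `db` parameter, the helper below appends it to any endpoint URL using the standard library. The helper name, the host, the endpoint path, and the database ID in the example are placeholders, not documented Duckling values; substitute the real endpoint from your deployment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_database(url: str, database_id: str) -> str:
    """Return `url` with its `db` query parameter set to `database_id`."""
    parts = urlsplit(url)
    # Keep other query parameters, dropping any existing `db` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "db"
    ]
    query.append(("db", database_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Placeholder host and path; not a documented Duckling endpoint.
    print(with_database("http://duckling.example/some-sync-endpoint", "analytics"))
```

Because the helper replaces any existing `db` value, it is safe to call on a URL that already targets a different database.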
Next Steps
- Monitoring - Set up health checks and metrics
- Backups - Configure automated backups
- Performance Tuning - Optimize sync performance