Overview
Duckling provides comprehensive monitoring capabilities, including health checks, system metrics, query metrics, and automated health monitoring with auto-restart.
Health Checks
Health Endpoint
The health endpoint provides a basic check of database connectivity.
CLI Health Check
Status Endpoint
The status endpoint returns detailed system status, including table counts and metrics.
Automatic Health Monitoring
Auto-Restart Service
Duckling includes automatic health monitoring and recovery:
- Health Check Interval: Every 60 seconds
- Auto-Restart: Enabled by default (AUTO_RESTART=true)
- Max Restart Attempts: 3 attempts (MAX_RESTART_ATTEMPTS=3)
- Recovery Strategy: Exponential backoff
The auto-restart service monitors DuckDB and MySQL connections, automatically recovering from failures without manual intervention.
Health Monitoring Checks
The system performs three health checks every 60 seconds.
Recovery Process
When a health check fails:
- Attempt 1: Test connections, trigger recovery sync
- Wait: Exponential backoff (2s, 4s, 8s, max 60s)
- Attempt 2: Retry connection tests and sync
- Wait: Longer backoff
- Attempt 3: Final retry attempt
- Failure: Log critical error, manual intervention required
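The recovery sequence above can be sketched as a retry loop with exponential backoff. This is a minimal illustration rather than Duckling's actual implementation: `backoff_delay`, `run_recovery`, and the `attempt_recovery` callback are hypothetical names, and the sleep function is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

BASE_DELAY_S = 2           # first wait in the 2s, 4s, 8s sequence
MAX_DELAY_S = 60           # backoff cap stated above
MAX_RESTART_ATTEMPTS = 3   # documented default


def backoff_delay(attempt: int) -> int:
    """Seconds to wait after the given 1-based failed attempt."""
    return min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S)


def run_recovery(
    attempt_recovery: Callable[[], bool],
    max_attempts: int = MAX_RESTART_ATTEMPTS,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt_recovery until it succeeds or attempts run out.

    attempt_recovery is a hypothetical callback standing in for
    "test connections and trigger a recovery sync".
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_recovery():
            return True
        if attempt < max_attempts:
            # Wait before the next retry; no wait after the final attempt.
            sleep(backoff_delay(attempt))
    # Every attempt failed: the caller would log a critical error here.
    return False
```

With the default of three attempts, the loop waits 2 seconds and then 4 seconds between tries; the 60-second cap only matters if `max_attempts` is raised.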
Disabling Auto-Restart
To disable automatic recovery, set AUTO_RESTART=false.
System Metrics
Metrics Endpoint
The metrics endpoint (GET /metrics) returns comprehensive system and query metrics.
System Metrics
System metrics are collected every 30 seconds:
| Metric | Description | Unit |
|---|---|---|
| cpuPercent | Node.js process CPU usage | Percentage |
| rssMB | Resident Set Size (total memory) | MB |
| heapUsedMB | V8 heap memory usage | MB |
| hostFreeMemMB | Available system memory | MB |
| hostTotalMemMB | Total system memory | MB |
| eventLoopLagMs | Event loop delay | Milliseconds |
- Retention: Last 61 samples (30.5 minutes)
- Sample Interval: 30 seconds
- Buffer: Rolling window, oldest samples dropped
The system metrics service starts automatically with the server. Historical data is kept in-memory for lightweight monitoring.
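The rolling in-memory window described above behaves like a fixed-size ring buffer. The sketch below shows that behavior with Python's `collections.deque`; the `MetricsBuffer` class and its method names are hypothetical and are not taken from Duckling's code.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

SAMPLE_INTERVAL_S = 30   # documented sample interval
MAX_SAMPLES = 61         # documented retention

Sample = Dict[str, float]


class MetricsBuffer:
    """Keeps only the most recent samples; the oldest are dropped."""

    def __init__(self, max_samples: int = MAX_SAMPLES) -> None:
        # A deque with maxlen discards from the left once it is full.
        self._samples: Deque[Sample] = deque(maxlen=max_samples)

    def record(self, sample: Sample) -> None:
        self._samples.append(sample)

    def history(self) -> List[Sample]:
        """Samples in order, oldest first."""
        return list(self._samples)

    def latest(self) -> Optional[Sample]:
        return self._samples[-1] if self._samples else None
```

Because the buffer is bounded, memory use stays constant no matter how long the server runs, which is the property that makes in-memory retention lightweight.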
Query Metrics
Query metrics track all DuckDB queries.
Active Queries:
- Currently executing queries
- SQL statement (truncated to 200 chars)
- Start timestamp
- Running duration
- Database ID
Query Patterns:
- Normalized SQL (literals replaced with ?)
- Execution count
- Average/min/max duration
- Last execution timestamp
- Top 100 patterns by frequency
Query patterns use LRU (Least Recently Used) eviction with a maximum of 1,000 patterns tracked. This prevents memory growth on databases with high query variety.
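To make the pattern tracking concrete, here is a rough sketch of literal normalization combined with LRU eviction. It assumes a simplified regex-based normalizer (a real implementation would more likely use a SQL tokenizer), and `normalize_sql` and `PatternTracker` are hypothetical names chosen for illustration.

```python
import re
from collections import OrderedDict
from typing import Dict, List, Tuple

_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize_sql(sql: str) -> str:
    """Replace string and numeric literals with ? and collapse whitespace."""
    sql = _STRING_LITERAL.sub("?", sql)
    sql = _NUMBER_LITERAL.sub("?", sql)
    return " ".join(sql.split())


class PatternTracker:
    """Per-pattern statistics with least-recently-used eviction."""

    def __init__(self, max_patterns: int = 1000) -> None:
        self.max_patterns = max_patterns
        self._stats: "OrderedDict[str, Dict[str, float]]" = OrderedDict()

    def record(self, sql: str, duration_ms: float) -> None:
        key = normalize_sql(sql)
        stats = self._stats.pop(key, None)
        if stats is None:
            stats = {"count": 0, "total_ms": 0.0,
                     "min_ms": duration_ms, "max_ms": duration_ms}
        stats["count"] += 1
        stats["total_ms"] += duration_ms
        stats["min_ms"] = min(stats["min_ms"], duration_ms)
        stats["max_ms"] = max(stats["max_ms"], duration_ms)
        # Re-inserting moves the pattern to the most-recently-used end.
        self._stats[key] = stats
        if len(self._stats) > self.max_patterns:
            # Drop the least recently used pattern.
            self._stats.popitem(last=False)

    def top(self, n: int = 100) -> List[Tuple[str, int, float]]:
        """(pattern, execution count, average ms), most frequent first."""
        ranked = sorted(self._stats.items(),
                        key=lambda item: item[1]["count"], reverse=True)
        return [(sql, int(s["count"]), s["total_ms"] / s["count"])
                for sql, s in ranked[:n]]
```

Normalizing before counting is what lets `WHERE id = 1` and `WHERE id = 2` share one entry, and the size cap bounds memory even when an application generates many distinct query shapes.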
Automation Status
Automation Endpoint
The automation endpoint returns the status of all automation services.
Automation Configuration
| Variable | Default | Description |
|---|---|---|
| AUTO_START_SYNC | true | Enable automatic sync |
| AUTO_CLEANUP | true | Enable automatic cleanup |
| AUTO_BACKUP | true | Enable automatic backups |
| AUTO_RESTART | true | Enable health monitoring and auto-restart |
| MAX_RESTART_ATTEMPTS | 3 | Maximum recovery attempts |
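For tooling that needs to mirror these settings, for example a deployment check, the table can be read as environment variables with defaults. The sketch below is an assumption about parsing rather than Duckling's own logic: `load_automation_config` is a hypothetical helper, and it treats only the literal string `true` (case-insensitive) as enabled.

```python
import os
from typing import Dict, Mapping, Union

ConfigValue = Union[bool, int]

# Defaults copied from the table above.
DEFAULTS: Dict[str, ConfigValue] = {
    "AUTO_START_SYNC": True,
    "AUTO_CLEANUP": True,
    "AUTO_BACKUP": True,
    "AUTO_RESTART": True,
    "MAX_RESTART_ATTEMPTS": 3,
}


def load_automation_config(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, ConfigValue]:
    """Resolve automation settings from env, falling back to defaults."""
    config: Dict[str, ConfigValue] = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        if raw is None:
            config[name] = default
        elif isinstance(default, bool):
            # Assumption: only "true" enables a flag.
            config[name] = raw.strip().lower() == "true"
        else:
            config[name] = int(raw)
    return config
```

Passing a plain dictionary instead of `os.environ` makes it easy to preview the effect of a proposed configuration change before deploying it.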
Log Monitoring
Server Logs
Server logs can be viewed in real time.
Log Levels
Configure log verbosity with LOG_LEVEL.
Structured Logging
Duckling uses Winston for structured logging.
Alerting
Health Check Monitoring
Integrate health checks with monitoring tools such as Prometheus.
Critical Alerts
Monitor these conditions:
- Health endpoint returns unhealthy (5xx status)
- Sync hasn’t run in 30+ minutes (check automation status)
- Restart attempts >= 3 (check automation status)
- Event loop lag > 100ms (check system metrics)
- Memory usage > 80% (check system metrics)
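The conditions above could be evaluated by a small function in an external monitoring script. This is a sketch under stated assumptions: `critical_alerts` and its parameter names are hypothetical, the caller is expected to have already extracted the values from the health, automation status, and metrics responses, and memory usage is interpreted here as host memory in use, derived from `hostFreeMemMB` and `hostTotalMemMB`.

```python
from typing import List


def critical_alerts(
    health_status_code: int,
    minutes_since_last_sync: float,
    restart_attempts: int,
    event_loop_lag_ms: float,
    host_free_mem_mb: float,
    host_total_mem_mb: float,
) -> List[str]:
    """Return a message for every critical condition that currently holds."""
    alerts: List[str] = []
    if 500 <= health_status_code <= 599:
        alerts.append(f"health endpoint unhealthy (HTTP {health_status_code})")
    if minutes_since_last_sync >= 30:
        alerts.append(f"no sync for {minutes_since_last_sync:.0f} minutes")
    if restart_attempts >= 3:
        alerts.append(f"restart attempts at {restart_attempts}")
    if event_loop_lag_ms > 100:
        alerts.append(f"event loop lag {event_loop_lag_ms:.0f} ms")
    # Assumed definition of memory usage: share of host memory not free.
    used_fraction = 1 - host_free_mem_mb / host_total_mem_mb
    if used_fraction > 0.80:
        alerts.append(f"memory usage {used_fraction:.0%}")
    return alerts
```

An empty list means none of the listed conditions is active; any returned message can be forwarded to whatever paging or chat integration is in use.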
Performance Metrics
Sync Performance
Track sync operations from sync logs:
- Records processed per sync
- Duration per table
- Success/error rate
- Watermark progression
Query Performance
Monitor slow queries from query patterns:
- High average duration
- High max duration
- High execution count
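One way to apply these three signals is to flag patterns that exceed a threshold on any of them and rank the flagged patterns by estimated total time. In the sketch below the field names (`sql`, `count`, `avgMs`, `maxMs`) and the default thresholds are assumptions made for illustration; they are not the documented response format.

```python
from typing import Dict, List, Tuple, Union

Pattern = Dict[str, Union[str, int, float]]


def flag_slow_patterns(
    patterns: List[Pattern],
    avg_ms_limit: float = 500.0,
    max_ms_limit: float = 5000.0,
    count_limit: int = 1000,
) -> List[Tuple[str, List[str]]]:
    """Return (sql, reasons) for flagged patterns, costliest first."""
    flagged = []
    for p in patterns:
        reasons: List[str] = []
        if p["avgMs"] > avg_ms_limit:
            reasons.append("high average duration")
        if p["maxMs"] > max_ms_limit:
            reasons.append("high max duration")
        if p["count"] > count_limit:
            reasons.append("high execution count")
        if reasons:
            # Estimated total time spent on this pattern.
            total_ms = p["count"] * p["avgMs"]
            flagged.append((total_ms, p["sql"], reasons))
    flagged.sort(key=lambda item: item[0], reverse=True)
    return [(sql, reasons) for _, sql, reasons in flagged]
```

Ranking by count times average duration surfaces cheap queries that run very often alongside rare but expensive ones, since either kind can dominate overall load.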
Dashboard Integration
Web Dashboard
The built-in web dashboard provides:
- Real-time system metrics graphs
- Active queries view
- Query pattern analysis
- Sync log history
- Automation status
API Integration
Build custom dashboards using the API endpoints.
Multi-Database Monitoring
When monitoring multiple databases, note that each database has independent health status, metrics, and automation services. Monitor them separately for accurate observability.
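Checking each database separately might look like the following sketch. How Duckling scopes its endpoints per database is not specified in this section, so the health URLs are supplied by the caller; `fetch_status_code` and `check_databases` are hypothetical helper names, and the fetch function is injectable so the logic can be tested without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def fetch_status_code(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions by urllib.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return 0


def check_databases(
    health_urls: Dict[str, str],
    fetch: Callable[[str], int] = fetch_status_code,
) -> Dict[str, bool]:
    """Map each database ID to True when its health check returns 2xx."""
    return {
        db_id: 200 <= fetch(url) < 300
        for db_id, url in health_urls.items()
    }
```

Keeping one result per database ID avoids a single aggregate status hiding the fact that only one of several databases is failing.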
Troubleshooting
High CPU Usage
- Check active queries: GET /metrics
- Review query patterns for inefficient queries
- Reduce concurrent query load
- Consider query optimization
High Memory Usage
- Check system metrics: GET /metrics
- Review batch sizes: BATCH_SIZE, INSERT_BATCH_SIZE
- Reduce connection pool sizes
- Monitor for memory leaks in logs
Event Loop Lag
- Check system metrics for eventLoopLagMs
- Reduce concurrent operations
- Increase worker threads: WORKER_THREADS
- Review long-running queries
Next Steps
- Synchronization - Configure sync operations
- Backups - Set up backup automation
- Performance Tuning - Optimize performance