## When to Use DuckDB
- Analytical workloads: Complex aggregations, window functions, joins
- Larger datasets: Use file mode for data that doesn’t fit in memory
- Advanced SQL: Workloads that require DuckDB-specific features
- Single-file storage: Dataset can fit in one DuckDB file (<100GB typical)
## Configuration

### Memory Mode
Data is stored in RAM (volatile).

### File Mode
Data is persisted to disk.

### Default File Location
If `duckdb_file` is not specified, the accelerator uses `{spice_data_dir}/accelerated_duckdb.db`.
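A minimal spicepod sketch of both modes (assuming Spice's standard `acceleration` block; the `from` source and file path are placeholders):

```yaml
datasets:
  - from: postgres:public.events
    name: events_mem
    acceleration:
      enabled: true
      engine: duckdb
      mode: memory              # data held in RAM; lost on restart

  - from: postgres:public.events
    name: events_file
    acceleration:
      enabled: true
      engine: duckdb
      mode: file                # data persisted to disk
      params:
        duckdb_file: /data/events.db   # omit to fall back to {spice_data_dir}/accelerated_duckdb.db
```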
### File Create Mode

Deletes the existing file on startup.

## Refresh Modes
### Full Refresh

Replaces all data on each refresh.

### Append Mode

Appends only new data, based on a time column.

## Snapshots

Bootstrap from S3 snapshots for fast cold starts.

## Performance Features
### Preserve Insertion Order

Maintains insertion order using internal tables.

### On-Refresh Sorting

Sorts data after each refresh for better query performance.

### Recompute Statistics

Recomputes statistics after writes for optimal query plans.

### Index Configuration

Controls when the engine favors index scans via the `index_scan_percentage` and `index_scan_max_count` parameters (see Parameters below).
## Connection Pooling

### Pool Size

Configures the connection pool size via `connection_pool_size`.

### Memory Limit

Sets a per-connection memory limit via `memory_limit`.

## Federation
DuckDB accelerators can federate queries across multiple file-mode DuckDB databases; Spice automatically attaches the other DuckDB databases.

## Retention SQL

Deletes old data automatically on each refresh.

## Primary Keys and Constraints
### Primary Key

### Composite Primary Key

### On Conflict Behavior
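A sketch of all three constraint options in one acceleration block (assuming Spice's `primary_key`/`on_conflict` syntax; column names are illustrative):

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file
  # single-column primary key
  primary_key: id
  # a composite primary key would instead be: primary_key: (id, created_at)
  on_conflict:
    id: upsert   # update the existing row on conflict; `drop` discards the incoming row
```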
## File Extensions

DuckDB files may use any of these extensions:

- `.db`
- `.ddb`
- `.duckdb`
## Performance Characteristics

### Query Performance
| Operation | Performance | Notes |
|---|---|---|
| Full table scan | Excellent | Columnar storage, SIMD |
| Aggregations | Excellent | Optimized group-by |
| Window functions | Excellent | Native support |
| Joins | Excellent | Hash and merge joins |
| Point queries | Good | Better with indexes |
### Storage
| Feature | Details |
|---|---|
| Format | Columnar (compressed) |
| Compression | Automatic (zstd, dictionary, RLE) |
| Ratio | Typically 5-10x compression |
| Max file size | ~100GB typical, larger possible |
## Partitioning

DuckDB supports table partitioning.

## Example Configurations
### Time-Series Analytics
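A sketch of an append-mode time-series setup using the sorting and statistics parameters from the table below; the `retention_sql` field name is hypothetical, since this section does not spell it out:

```yaml
datasets:
  - from: postgres:public.events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append           # only new rows, keyed on time_column
      refresh_check_interval: 10s
      params:
        duckdb_file: /data/events.db
        on_refresh_sort_columns: "created_at DESC"
        on_refresh_recompute_statistics: true
        # retention_sql is a hypothetical name for the retention SQL feature:
        # retention_sql: DELETE FROM events WHERE created_at < now() - INTERVAL 30 DAY
```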
### Large Dataset with Snapshot
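A sketch of a large file-mode dataset that cold-starts from an S3 snapshot; the `snapshots` keys are hypothetical, since this chunk only states that S3 bootstrap exists:

```yaml
datasets:
  - from: s3://my-bucket/events/
    name: events_large
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        duckdb_file: /data/events_large.db
        memory_limit: "8GB"
      # hypothetical snapshot-bootstrap keys:
      # snapshots:
      #   enabled: true
      #   location: s3://my-bucket/snapshots/
```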
### In-Memory Analytics
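A sketch of an in-memory setup tuned with the pooling parameters above (values are illustrative):

```yaml
datasets:
  - from: postgres:public.metrics
    name: metrics
    acceleration:
      enabled: true
      engine: duckdb
      mode: memory
      params:
        connection_pool_size: 20    # default is 10
        memory_limit: "4GB"         # per-connection limit
```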
## Monitoring

## Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| duckdb_file | string | Path to DuckDB file | auto |
| memory_limit | string | Per-connection memory limit (e.g., `4GB`) | - |
| connection_pool_size | integer | Max connections in pool | 10 |
| preserve_insertion_order | boolean | Maintain row insertion order | false |
| on_refresh_sort_columns | string | Sort columns after refresh (e.g., `time DESC`) | - |
| on_refresh_recompute_statistics | boolean | Recompute stats after writes | false |
| index_scan_percentage | float | Selectivity threshold for index use | 0.6 |
| index_scan_max_count | integer | Max rows for index scan | 1000000 |
## Limitations
- Single-file storage (partition for larger datasets)
- File mode requires local disk
- No built-in replication (use snapshots for backup)
## Next Steps
- Cayenne Accelerator - For multi-file append-heavy workloads
- Arrow Accelerator - For in-memory speed
- Snapshots Documentation