Overview
The pipeline includes two optimized configuration templates that represent common deployment scenarios. Templates are JSON files that can be loaded with the --config-template CLI argument.
Edge Template
Optimized for resource-constrained edge devices with limited memory and compute. Location: configs/pipeline.edge.template.json
Edge Template Characteristics
- Chunk size: Small chunks minimize the memory footprint on edge devices.
- Batch size: A conservative batch size avoids memory exhaustion.
- Parallelism: Single-threaded execution reduces overhead on limited cores.
- Memory limit: A strict 256MB cap for edge deployment.
- CPU usage: Throttled to 40% to leave resources for other processes.
- Benchmark iterations: Fewer iterations to reduce processing time.
- Disk spilling: Enabled. Critical for handling datasets larger than available RAM.
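To make "disk spilling" concrete, here is a minimal sketch of the general technique, not the pipeline's implementation: records are buffered in memory until a byte budget is exceeded, then the buffered batch is written to a temporary file and memory is freed. The class name `SpillingBuffer` and its methods are hypothetical, and record size is only estimated from the pickled length.

```python
import os
import pickle
import tempfile
from typing import Iterator, List


class SpillingBuffer:
    """Hypothetical buffer: keep records in memory until a byte budget is
    exceeded, then spill the in-memory batch to a temporary file."""

    def __init__(self, memory_limit_bytes: int) -> None:
        self.memory_limit_bytes = memory_limit_bytes
        self._in_memory: List[object] = []
        self._in_memory_bytes = 0
        self._spill_files: List[str] = []

    def add(self, record: object) -> None:
        self._in_memory.append(record)
        # Rough size estimate via pickled length; real systems track size more carefully.
        self._in_memory_bytes += len(pickle.dumps(record))
        if self._in_memory_bytes > self.memory_limit_bytes:
            self._spill()

    def _spill(self) -> None:
        # Write the current batch to disk and release it from memory.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self._in_memory, f)
        self._spill_files.append(path)
        self._in_memory = []
        self._in_memory_bytes = 0

    def __iter__(self) -> Iterator[object]:
        # Replay spilled batches in the order they were written, then the in-memory tail.
        for path in self._spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self._in_memory

    @property
    def spill_count(self) -> int:
        return len(self._spill_files)

    def cleanup(self) -> None:
        # Delete the temporary spill files.
        for path in self._spill_files:
            os.remove(path)
        self._spill_files = []
```

With an edge-style budget you would construct it as `SpillingBuffer(256 * 1024 * 1024)`; a tiny budget such as 1,000 bytes is handy for seeing spills happen. Records come back in insertion order, so downstream code need not know whether data stayed in RAM or went to disk.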
Use Cases
- Raspberry Pi or similar single-board computers
- IoT devices with limited resources
- Mobile or embedded systems
- Environments where memory is <512MB
Server Template
Optimized for high-performance server environments with ample resources. Location: configs/pipeline.server.template.json
Server Template Characteristics
- Chunk size: Large chunks maximize throughput on powerful hardware.
- Batch size: Large batches leverage vectorization for faster processing.
- Parallelism: Multi-threaded execution for parallel processing.
- Memory limit: A generous 4GB allocation for complex operations.
- CPU usage: Full compute resources available (100%).
- Benchmark iterations: More iterations for statistically robust benchmarks.
- Disk spilling: Disabled. All data stays in memory for maximum performance.
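Why more iterations make benchmarks more robust: the standard error of the mean timing shrinks roughly as stdev / sqrt(n), so doubling precision costs about four times as many runs. The helper below is a hypothetical illustration of that calculation, not the pipeline's benchmark harness, and it does not reflect the actual iteration counts in either template.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], iterations: int) -> Dict[str, float]:
    """Time fn() `iterations` times; report mean and standard error in seconds."""
    if iterations < 2:
        raise ValueError("need at least 2 iterations to estimate spread")
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)
    # Standard error of the mean falls roughly as 1/sqrt(iterations).
    stderr = stdev / (iterations ** 0.5)
    return {"mean_s": mean, "stderr_s": stderr}
```

Comparing `benchmark(work, 5)` with `benchmark(work, 50)` for the same `work` function typically shows a noticeably smaller `stderr_s` for the larger run, which is the trade-off the edge and server templates make in opposite directions.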
Use Cases
- Cloud compute instances (AWS, GCP, Azure)
- On-premise data processing servers
- Development workstations
- Environments with >8GB RAM
Using Templates
Load Template via CLI
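As a sketch of how a command-line entry point can accept the flag, assuming Python's standard argparse module: only --config-template comes from this page; the program name `pipeline-sketch` is hypothetical and is not the pipeline's actual CLI.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical parser; only the --config-template flag is documented.
    parser = argparse.ArgumentParser(prog="pipeline-sketch")
    parser.add_argument(
        "--config-template",
        default=None,
        help="Path to a JSON template, e.g. configs/pipeline.edge.template.json",
    )
    return parser.parse_args(argv)
```

argparse converts the dashes in the flag name to underscores, so the value is available as `args.config_template`; it stays `None` when the flag is omitted.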
Override Template Values
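CLI arguments take precedence over template values. For example, if you load a template and also set chunk_size to 128 and n_jobs to 2 on the command line, the run uses 128 and 2 for those two settings, while every other value still comes from the template.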
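The precedence rule amounts to a shallow dictionary merge in which any value actually given on the command line replaces the template's value. This is a minimal sketch under that assumption; the function name `apply_overrides` and the template values in the usage example are hypothetical, and `None` stands for "flag not passed".

```python
from typing import Any, Dict, Mapping, Optional


def apply_overrides(
    template: Mapping[str, Any],
    cli_values: Mapping[str, Optional[Any]],
) -> Dict[str, Any]:
    """Return a new config in which every CLI value that was given wins."""
    merged = dict(template)  # copy so the loaded template is left untouched
    for key, value in cli_values.items():
        if value is not None:  # None means the flag was not passed
            merged[key] = value
    return merged


# Hypothetical template values; chunk_size=128 and n_jobs=2 mirror the example above.
template = {"chunk_size": 1024, "n_jobs": 8, "batch_size": 256}
config = apply_overrides(template, {"chunk_size": 128, "n_jobs": 2, "batch_size": None})
```

Here `config` ends up with `chunk_size` 128 and `n_jobs` 2 from the CLI, and `batch_size` 256 from the template. A shallow merge is the simplest choice; nested sections would need a recursive merge if a template used them.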
Load Template in Python
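This section does not document the pipeline's own Python loader, so the following is a minimal sketch that uses only the standard json module. Because templates are plain JSON files, it reads one and checks that the top level is an object. The function name `load_template` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_template(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a JSON template file and ensure its top level is an object."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return data


# Example (path from this page; the key looked up is an assumption):
# config = load_template("configs/pipeline.edge.template.json")
# print(config.get("chunk_size"))
```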
Creating Custom Templates
You can create your own templates for specific environments. Save the settings as a JSON file and load it with --config-template path/to/your/template.json.
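One convenient way to build a custom template is to start from a shipped one and change only what differs. This is a hypothetical helper (`write_custom_template` is not part of the pipeline); it does not validate key names, so a misspelled key is written unchanged.

```python
import json
from pathlib import Path
from typing import Any, Dict, Mapping, Union


def write_custom_template(
    base_path: Union[str, Path],
    out_path: Union[str, Path],
    changes: Mapping[str, Any],
) -> Dict[str, Any]:
    """Copy a base template, apply top-level changes, and write it as JSON."""
    with open(base_path, encoding="utf-8") as f:
        template = json.load(f)
    template.update(changes)  # shallow: replaces whole top-level values
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2, sort_keys=True)
        f.write("\n")
    return template


# Example: an edge-based template that allows two workers (n_jobs key assumed).
# write_custom_template("configs/pipeline.edge.template.json",
#                       "path/to/your/template.json", {"n_jobs": 2})
```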
Template Selection Guide
| Criteria | Edge Template | Server Template |
|---|---|---|
| Available RAM | <512MB | >4GB |
| CPU Cores | 1-2 | 4+ |
| Dataset Size | <100MB | Any size |
| Priority | Resource efficiency | Maximum performance |
| Disk Spilling | Enabled | Disabled |
| Processing Time | Slower, conservative | Faster, aggressive |
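The RAM and CPU rows of the guide can be read as a simple decision rule. The sketch below encodes only those two rows. It assumes binary units (1MB = 1024 * 1024 bytes) and returns `None` for machines that fall between the two profiles, where a custom template is likely the better fit. The function name is hypothetical.

```python
from typing import Optional

MB = 1024 * 1024
GB = 1024 * MB


def suggest_template(ram_bytes: int, cpu_cores: int) -> Optional[str]:
    """Map RAM and core count onto the selection guide's thresholds."""
    if ram_bytes < 512 * MB and cpu_cores <= 2:
        return "edge"  # <512MB RAM, 1-2 cores
    if ram_bytes > 4 * GB and cpu_cores >= 4:
        return "server"  # >4GB RAM, 4+ cores
    return None  # in between: consider a custom template


# os.cpu_count() gives the core count; total RAM needs a platform-specific call.
```

Dataset size and priority are left out on purpose, since they depend on the workload rather than the machine.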
Next Steps
- Configuration Overview: Learn about all configuration options.
- CLI Reference: See all command-line arguments.