# Performance Tuning
This guide covers performance optimization strategies for Mimir AIP deployments, including worker scaling, queue management, storage optimization, and resource allocation.

## Worker Scaling
Mimir AIP uses Kubernetes Jobs for dynamic worker scaling. The orchestrator monitors queue depth and spawns workers as needed.

### Configuration
Set worker pool parameters in `helm/mimir-aip/values.yaml:22`:
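The chart's exact schema may differ; a hypothetical fragment, using the `minWorkers`, `maxWorkers`, and `queueThreshold` parameters referenced throughout this guide, might look like:

```yaml
# Hypothetical values.yaml fragment -- the nesting under `orchestrator.workerPool`
# is an assumption; only the parameter names come from this guide.
orchestrator:
  workerPool:
    minWorkers: 1          # baseline worker count
    maxWorkers: 20         # hard cap on concurrent worker Jobs
    queueThreshold: 10     # tasks per worker before scaling out
    pollingIntervalSeconds: 5
```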
### Scaling Logic
The orchestrator implements adaptive scaling:

- **Queue Monitoring** - Checks queue length every polling cycle
- **Worker Demand** - Calculates needed workers as `ceil(queueLength / queueThreshold)`
- **Active Workers** - Counts currently running Kubernetes Jobs
- **Spawn Decision** - Spawns new workers up to the `maxWorkers` limit
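The spawn decision above can be sketched in Python; the function name and signature are illustrative, not the orchestrator's actual API:

```python
import math

def workers_to_spawn(queue_length: int, queue_threshold: int,
                     active_workers: int, max_workers: int) -> int:
    """Compute how many new workers to spawn this polling cycle."""
    if queue_length == 0:
        return 0
    demand = math.ceil(queue_length / queue_threshold)   # needed workers
    target = min(demand, max_workers)                    # respect the cap
    return max(target - active_workers, 0)               # only spawn the gap
```

For example, with 25 queued tasks, a threshold of 10, one active worker, and a cap of 20, the orchestrator would spawn two more workers.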
### Priority Queue
Tasks are prioritized by a score based on creation time and priority level.
### Worker Resource Allocation

Configure resource requests and limits for workers:
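Worker requests and limits follow the standard Kubernetes resource schema; the `worker.resources` path below is an assumed chart layout, and the sizes are only a starting point:

```yaml
worker:
  resources:
    requests:
      cpu: 500m        # guaranteed share for scheduling
      memory: 512Mi
    limits:
      cpu: "2"         # cap for compute-intensive pipeline steps
      memory: 2Gi
```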
### Autoscaling Strategy

**Horizontal Scaling:**

- Workers scale horizontally based on queue depth
- Each worker processes one task at a time
- Scale from `minWorkers` to `maxWorkers` dynamically
**Vertical Scaling:**

- Adjust per-worker CPU/memory for compute-intensive tasks
- Monitor resource utilization with `kubectl top pods -n mimir-aip`
## Queue Management

### In-Memory Priority Queue
Mimir uses a heap-based priority queue for task management:
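A minimal heap-backed sketch, assuming lower numeric priority dequeues first and an enqueue counter stands in for creation time to break ties:

```python
import heapq
import itertools

class TaskQueue:
    """Heap-based priority queue; the scoring tuple is illustrative."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # proxy for creation order

    def push(self, task, priority: int = 5):
        # Lower priority value = more urgent; counter keeps FIFO order on ties.
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        _, _, task = heapq.heappop(self._heap)
        return task
```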
### Task Retry Logic

Failed tasks are automatically retried with exponential backoff:
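A sketch of the retry schedule, assuming base-2 exponential delays, a 60-second cap, and the `max_retries` default of 3 noted later in this guide:

```python
MAX_RETRIES = 3  # default noted in the best-practices summary

def next_retry_delay(attempt: int, base: float = 2.0, cap: float = 60.0):
    """Delay in seconds before retry `attempt` (1-based), or None when exhausted."""
    if attempt > MAX_RETRIES:
        return None                      # give up; surface the failure
    return min(base ** attempt, cap)     # 2s, 4s, 8s, capped at 60s
```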
### Queue Monitoring

Monitor queue metrics via the API.
### Queue Optimization

**Priority Tuning:**

- Group similar tasks into a single pipeline with batch operations
- Use array data in CIR for bulk storage operations
## Pipeline Optimization

### Plugin Compilation Caching
Plugins are compiled once and cached in `/tmp/plugins` on workers.

**Cache Invalidation:** Automatic when the commit hash changes
### Pipeline Context Size Limits
Context size is capped to prevent memory issues. Keep contexts small by:

- Storing large data in external storage (S3, filesystem)
- Passing references instead of raw data
- Cleaning up intermediate results
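One way to keep contexts small is to spill large values to a storage plugin and keep only a reference; `storage.store` is a hypothetical call, shown only to illustrate the pattern:

```python
def offload_large_values(storage, context: dict, key: str, payload,
                         threshold: int = 64_000):
    """Store `payload` externally when it exceeds `threshold` bytes/chars."""
    if len(payload) > threshold:
        ref = storage.store(key, payload)   # hypothetical plugin call
        context[key] = {"ref": ref}         # pass a reference downstream
    else:
        context[key] = payload              # small values stay inline
```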
### Step Execution Efficiency
**Avoid Redundant Lookups:**

- Group actions from the same plugin together
- Reduces plugin lookup overhead
## Storage Performance

### Storage Plugin Selection
**Use Case Matching:**

| Use Case | Recommended Plugin | Rationale |
|---|---|---|
| High write throughput | S3, Cassandra | Distributed, scalable |
| Complex queries | PostgreSQL, Neo4j | Rich query capabilities |
| Real-time analytics | Elasticsearch | Fast search and aggregation |
| Low latency | Redis, In-memory | Millisecond response times |
| Cost-sensitive | Filesystem, MinIO | No cloud costs |
### Connection Pooling
Storage plugins should implement connection pooling:
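A minimal pooling sketch; real storage plugins would usually reuse their driver's built-in pool (e.g. psycopg2's `ThreadedConnectionPool`), and `connect_fn` here stands in for any connect call:

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool: connections are created once and recycled."""

    def __init__(self, connect_fn, size: int = 5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect_fn())

    @contextmanager
    def connection(self):
        conn = self._pool.get()    # blocks until a connection is free
        try:
            yield conn
        finally:
            self._pool.put(conn)   # return it for reuse
```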
### Batch Operations

Store multiple items in a single call:
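A batching sketch; `store_many` is a hypothetical plugin method used to illustrate chunked writes:

```python
def store_batch(plugin, items, chunk_size: int = 100):
    """Write `items` in chunks instead of one call per item."""
    for i in range(0, len(items), chunk_size):
        plugin.store_many(items[i:i + chunk_size])
```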
### Query Optimization

**Index Creation:**

- Create indexes on frequently queried attributes
- Use storage-specific index types (B-tree, hash, GiST)
**Attribute Projection:**

- Only retrieve needed attributes (future enhancement)
- Reduces network transfer and memory usage
### Storage Schema Design
**Denormalization for Read Performance:**

- Partition large tables by date, region, or other keys
- Improves query performance and maintenance
## Resource Optimization

### Orchestrator Tuning
**Persistence Volume:**

- SQLite cache: ~500MB for metadata
- Queue overhead: ~1MB per 1000 tasks
- Plugin cache: Negligible (metadata only)
**CPU Usage:**

- API endpoints: Lightweight, handle 1000s of req/s
- Worker spawning: Occasional spikes during scale-out
### Worker Resource Sizing
**Pipeline Execution:**
### Plugin Cache Sizing

Plugin `.so` files are cached on each worker node in `/tmp/plugins` and `/tmp/storage-plugins`.
**Cleanup:** Automatic on pod restart
## Monitoring and Profiling

### Metrics to Track
**Queue Metrics:**

- Queue length over time
- Task wait time (queued → executing)
- Task execution time
- Retry rate
**Worker Metrics:**

- Active worker count
- Worker spawn rate
- Worker success/failure rate
- Resource utilization per worker
**Storage Metrics:**

- Operation latency (store, retrieve, update, delete)
- Throughput (operations/second)
- Connection pool utilization
- Error rate
### Profiling Tools
**Kubernetes Metrics:**
## Multi-Cluster Scaling

For extreme scale, distribute workers across multiple Kubernetes clusters.

### Configuration
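The cluster list below is a hypothetical shape; only `orchestratorURL`, `workerAuthToken`, and the `site-b`/`site-c` names come from this guide:

```yaml
workerClusters:
  - name: local            # primary; workers spawn here first
    maxWorkers: 20
  - name: site-b           # overflow target
    kubeconfigSecret: site-b-kubeconfig
  - name: site-c           # further overflow
    kubeconfigSecret: site-c-kubeconfig
orchestratorURL: "http://10.0.0.5:8080"   # direct IP for worker callbacks
workerAuthToken: "<token-secret>"
```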
### Worker Distribution
Workers overflow to remote clusters when the local cluster reaches capacity:

- **Local cluster** - Primary; workers spawn here first
- **site-b** - Overflow when the local cluster reaches `maxWorkers`
- **site-c** - Further overflow (if configured)
### Network Optimization
- Use regional storage replicas to reduce cross-cluster latency
- Configure `orchestratorURL` with a direct IP for worker callbacks
- Enable worker authentication with `workerAuthToken`
## Best Practices Summary
- **Worker Scaling**
  - Set `queueThreshold` to 2-5x average task duration (seconds)
  - Configure `maxWorkers` based on cluster capacity
  - Use resource limits to prevent node oversubscription
- **Queue Management**
  - Use priority for time-sensitive tasks
  - Monitor retry rates to detect systemic issues
  - Set a reasonable `max_retries` (default: 3)
- **Pipeline Optimization**
  - Cache plugin compilations (automatic)
  - Minimize context size
  - Group operations by plugin
- **Storage Performance**
  - Choose an appropriate storage plugin for the workload
  - Implement connection pooling
  - Use batch operations
  - Create indexes on query attributes
- **Resource Allocation**
  - Size worker resources for the task type
  - Monitor utilization and adjust
  - Use a PV with a fast storage class for the orchestrator
- **Monitoring**
  - Track queue metrics
  - Monitor worker success rate
  - Profile slow operations
  - Set up alerts for queue depth and error rate