Architecture for Scaling
Temporal consists of four independently scalable services:
- Frontend - API gateway, request routing
- History - Workflow state management, sharded by workflow ID
- Matching - Task queue management, poll handling
- Worker - System workflows (archival, replication, etc.)
Horizontal Scaling
Frontend Service
Scale based on API request rate:
- `service_requests{service_role="frontend"}` rate
- `service_latency{service_role="frontend"}` p99 > target
- `service_pending_requests{service_role="frontend"}` > threshold
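The signals above can be turned into an alert that drives a scaling decision. A hypothetical Prometheus rule is sketched below; the `_bucket` histogram suffix and the 500 ms threshold are assumptions, so adjust both to your Temporal server version and latency target.

```yaml
groups:
  - name: temporal-frontend-scaling
    rules:
      - alert: FrontendLatencyHigh
        expr: |
          histogram_quantile(0.99,
            sum(rate(service_latency_bucket{service_role="frontend"}[5m])) by (le)
          ) > 0.5
        for: 10m
        annotations:
          summary: "Frontend p99 latency above target; consider adding frontend nodes"
```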
History Service
The most resource-intensive service. Scale based on shard count and load:
- Each history node owns a subset of shards
- Shards are distributed via consistent hashing
- Minimum: 1 history node
- Recommended: Number of nodes ≤ numHistoryShards / 4
Scale when:
- `ShardController` latency is elevated
- `UpdateWorkflowExecution` latency is elevated
- CPU utilization > 70%
- Memory utilization > 80%
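Shard-to-node assignment via consistent hashing can be illustrated with a toy ring. This is only a sketch of the idea, not Temporal's actual implementation; note that with `numHistoryShards: 512`, the guidance above suggests running at most 128 history nodes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hash maps a string key to a point on the ring.
func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// ring holds each node's position, sorted clockwise.
type ring struct {
	points []uint32
	nodes  map[uint32]string
}

func newRing(nodes []string) *ring {
	r := &ring{nodes: map[uint32]string{}}
	for _, n := range nodes {
		p := hash(n)
		r.points = append(r.points, p)
		r.nodes[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// owner returns the first node clockwise from the shard's hash point.
func (r *ring) owner(shard int) string {
	h := hash(fmt.Sprintf("shard-%d", shard))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.nodes[r.points[i]]
}

func main() {
	r := newRing([]string{"history-1", "history-2", "history-3"})
	counts := map[string]int{}
	for shard := 0; shard < 512; shard++ {
		counts[r.owner(shard)]++
	}
	fmt.Println(counts) // each node owns a contiguous arc of the 512 shards
}
```

Because ownership depends only on the hash, adding or removing one node reassigns only the shards on the affected arc, which is why shard ownership changes are a scaling signal in the matrix below.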
Matching Service
Scale based on task queue throughput:
- `PollWorkflowTaskQueue` latency
- `PollActivityTaskQueue` latency
- Task dispatch latency
- Number of unique task queues
Worker Service
Scale based on system workflow load:
- Archival backlog
- Replication lag (multi-cluster)
- System workflow queue size
Vertical Scaling
CPU Resources
History Service:
- Minimum: 4 cores
- Recommended: 8-16 cores
- High throughput: 32+ cores
Other services (Frontend, Matching, Worker):
- Minimum: 2 cores
- Recommended: 4-8 cores
Memory Resources
History Service:
- Minimum: 2 GB
- Recommended: 4-8 GB
Matching Service:
- Minimum: 2 GB
- Recommended: 4-8 GB
- Add 10 MB per 1,000 active task queues (e.g., 50,000 active queues ≈ 500 MB extra)
Persistence Layer Scaling
Cassandra
Recommended Configuration:
- 3-5 node minimum for production
- Replication factor: 3
- Add nodes when CPU > 70% or disk > 70%
- Use separate clusters for default and visibility stores
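The replication guidance above translates into the keyspace definition. The sketch below uses hypothetical keyspace and datacenter names; if you follow the recommendation of separate clusters, the visibility keyspace would live on its own cluster.

```sql
-- Replication factor 3 per datacenter, as recommended above.
-- Keyspace and datacenter names are examples; match them to your setup.
CREATE KEYSPACE IF NOT EXISTS temporal
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
CREATE KEYSPACE IF NOT EXISTS temporal_visibility
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
```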
PostgreSQL/MySQL
Connection Pool Configuration: size each server node's pool so the aggregate across all nodes stays within the database's connection limit.
Scaling options:
- Vertical scaling (increase instance size)
- Read replicas (not recommended for Temporal, which expects strongly consistent reads)
- Sharding (Vitess for MySQL)
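A sketch of the SQL datastore's connection-pool settings follows. The key names follow Temporal's server configuration, but the plugin name and values are examples; verify both against your server version and database.

```yaml
persistence:
  datastores:
    default:
      sql:
        pluginName: "postgres12"   # plugin name depends on your database and server version
        databaseName: "temporal"
        connectAddr: "db.example.internal:5432"
        maxConns: 20               # per-node pool; keep (maxConns × node count) under the DB limit
        maxIdleConns: 20
        maxConnLifetime: "1h"
```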
Elasticsearch (Visibility)
Recommended Configuration:
- Start with 5 shards
- Increase to 10-20 for > 100M workflows
- 1 replica minimum for production
Scale when:
- Query latency > 1 s p99
- Index rate > 10,000 docs/sec
- Disk usage > 85%
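The shard and replica counts above are set when the visibility index is created. The request below uses the standard Elasticsearch index-settings API; the index name is an example, so match it to the one your Temporal deployment was provisioned with.

```
PUT /temporal_visibility_v1
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
```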
Dynamic Configuration
Tune performance without a server restart.
Dynamic Config Polling
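A sketch of how polling is wired up: the `dynamicConfigClient` block lives in the server's static configuration and points at a file that is re-read on the poll interval. The sample entries shown as comments are illustrative; dynamic-config key names vary across server versions, so verify them against your release.

```yaml
# In the server's static configuration (e.g. config/production.yaml):
dynamicConfigClient:
  filepath: "/etc/temporal/dynamicconfig.yaml"
  pollInterval: "10s"   # the file is re-read at this interval

# Sample contents of /etc/temporal/dynamicconfig.yaml (verify key names
# for your version):
# history.cacheMaxSize:
#   - value: 512
# limit.historySize.error:
#   - value: 52428800   # 50 MB
```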
Caching Configuration
History Cache
Cache workflow execution state to reduce repeated persistence reads.
Events Cache
Cache workflow history events to speed up replay.
Workflow Limits
Enforce limits to prevent resource exhaustion (for example, `limit.historySize.error` and `limit.historyCount.error` in dynamic config).
Network Optimization
gRPC Keep-Alive
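Keep-alive behavior is tunable per service. The sketch below uses the frontend keep-alive keys from Temporal's dynamic configuration with hypothetical values; key names and defaults vary by server version, so verify them before applying.

```yaml
# Dynamic config sketch; verify key names against your server release.
frontend.keepAliveTime:
  - value: "1m"
frontend.keepAliveTimeout:
  - value: "10s"
frontend.keepAliveMaxConnectionIdle:
  - value: "2m"
frontend.keepAliveMaxConnectionAge:
  - value: "5m"
```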
Connection Limits
Monitoring for Scaling
Key Metrics
Throughput: workflow starts/sec, task polls/sec, and concurrent workflow executions.
Scaling Decision Matrix
| Symptom | Scale | Configuration |
|---|---|---|
| High frontend latency | Frontend nodes | Add instances |
| High persistence latency | Database | Vertical scale or add nodes |
| Shard ownership changes | History nodes | Add instances |
| Task dispatch delays | Matching nodes | Add instances |
| CPU > 80% | Service nodes | Add instances or increase CPU |
| Memory > 85% | Service nodes | Add instances or increase memory |
| High cache misses | History cache | Increase cacheMaxSize |
| DLQ messages | Worker nodes | Add instances |
Performance Best Practices
1. Shard Count Planning
Choose the shard count (`numHistoryShards` in the static configuration) at deployment time; it cannot be changed after the cluster is created.
2. Task Queue Design
- Use fewer, busier task queues vs. many idle queues
- Limit to < 10,000 unique task queues per cluster
- Use task queue routing for versioning
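For a single very busy task queue, throughput can be raised by increasing its partition count rather than adding queues. The sketch below uses the matching partition keys from Temporal's dynamic configuration; the queue name and values are examples, and key names should be verified against your server version.

```yaml
# Dynamic config sketch; verify key names for your release.
matching.numTaskqueueReadPartitions:
  - value: 8
    constraints:
      taskQueueName: "high-volume-queue"
matching.numTaskqueueWritePartitions:
  - value: 8
    constraints:
      taskQueueName: "high-volume-queue"
```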
3. History Growth
- Use Continue-As-New for long-running workflows
- Keep workflow histories < 50KB when possible
- Monitor `history_size` and `history_count` metrics
4. Namespace Organization
- Separate high and low priority workloads
- Use namespaces for isolation and rate limiting
- Monitor per-namespace metrics
5. Batch Operations
- Use batch API operations where available
- Reduce individual RPC calls
- Bundle signal sends when possible
6. Client Connection Pooling
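The reuse pattern can be sketched with a placeholder `Client` type (hypothetical; in practice this would be your SDK's client handle, created once from its connection options):

```go
package main

import (
	"fmt"
	"sync"
)

// Client stands in for an SDK client handle; the real type comes from
// your SDK — this placeholder only illustrates the reuse pattern.
type Client struct{ hostPort string }

var (
	once   sync.Once
	shared *Client
)

// getClient lazily creates one client and returns the same instance to
// every caller, so all goroutines share a single connection pool.
func getClient() *Client {
	once.Do(func() {
		shared = &Client{hostPort: "temporal-frontend:7233"}
	})
	return shared
}

func main() {
	a, b := getClient(), getClient()
	fmt.Println(a == b) // prints "true": the same instance is reused
}
```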
In SDK clients, create one client instance at startup and share it across the application.
Load Testing
Benchmarking Setup
- Start with baseline load
- Gradually increase by 20% every 30 minutes
- Monitor all metrics
- Identify bottlenecks
- Scale and repeat
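The ramp described above (start at a baseline, grow 20% per 30-minute step) can be precomputed so each load level is known in advance. A minimal sketch, using integer math and a hypothetical baseline of 100 requests/sec:

```go
package main

import "fmt"

// rampSchedule returns the load levels for a ramp test that starts at
// base requests/sec and grows 20% per step (integer math, truncated).
func rampSchedule(base, steps int) []int {
	out := make([]int, 0, steps)
	rate := base
	for i := 0; i < steps; i++ {
		out = append(out, rate)
		rate = rate * 12 / 10 // +20% per step
	}
	return out
}

func main() {
	// One step every 30 minutes, per the procedure above.
	fmt.Println(rampSchedule(100, 5)) // prints "[100 120 144 172 206]"
}
```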
Test Scenarios
Scenario 1: High Throughput
- Metric: Workflow starts/sec
- Target: Saturate frontend/history
Scenario 2: High Concurrency
- Metric: Concurrent workflow executions
- Target: Saturate history service memory
Scenario 3: High Poll Rate
- Metric: Task polls/sec
- Target: Saturate matching service
Scenario 4: Large Histories
- Metric: History size
- Target: Test persistence limits
Troubleshooting Performance
High Latency
- Check persistence latency
- Review cache hit rates
- Verify network connectivity
- Check for lock contention
Memory Issues
- Reduce cache sizes
- Decrease workflow retention
- Add more history nodes
- Review workflow history sizes
CPU Saturation
- Add service instances
- Optimize workflow code
- Reduce task processing frequency
- Check for inefficient queries
See Also
- Monitoring - Metrics for scaling decisions
- Persistence - Database tuning
- Metrics Reference - Complete metric list