## Overview

Cadence is designed for horizontal scalability across all service tiers. This guide covers scaling strategies, capacity planning, and performance tuning for production clusters.

## Scaling Architecture

### Service Tiers

Cadence consists of four independently scalable services:

- Frontend - API gateway (stateless)
- History - Workflow state management (sharded)
- Matching - Task list management (sharded)
- Worker - Internal system workflows (lightweight)
## History Service Scaling

### History Shards

The History service is sharded to distribute workflow ownership across hosts.

### Shard Configuration
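The shard count is set once in the static service configuration and is permanent after the cluster first writes data, so size it for peak scale up front. A minimal sketch (placement of the key follows the usual Cadence YAML layout; the value is illustrative):

```yaml
persistence:
  numHistoryShards: 4096   # immutable after first start; plan for peak scale
```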
### Choosing the Number of Shards

The number of shards determines maximum horizontal scalability:

- Minimum cluster size: 1 host
- Maximum cluster size: N hosts (where N = numHistoryShards)
- Recommended starting point: 1024-16384 shards

With N shards, you can scale from 1 to N history hosts. Beyond N hosts, additional hosts add no capacity: each active host must own at least one shard, so at most N hosts can do work.
### Shard Sizing Guidelines
| Cluster Scale | Shards | Notes |
|---|---|---|
| Development | 16-64 | Single host |
| Small Production | 256-1024 | Up to 50 hosts |
| Medium Production | 1024-4096 | 50-200 hosts |
| Large Production | 4096-16384 | 200+ hosts |
### Shard Distribution

Shards are distributed using consistent hashing:

- New hosts join the ring
- Shard ownership is rebalanced automatically
- Shard ownership is rebalanced automatically
- In-flight workflows migrate to new owners
- No downtime required
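The rebalancing behavior can be illustrated with a toy ring. This is a sketch of the general consistent-hashing technique, not Cadence's actual membership code; host names and the vnode count are made up:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit value so the ring layout is reproducible across runs.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ShardRing:
    """Toy consistent-hash ring mapping history shard IDs to hosts."""

    def __init__(self, hosts, vnodes=50):
        self.vnodes = vnodes
        self.ring = []  # sorted (point, host) pairs
        for host in hosts:
            self.add_host(host)

    def add_host(self, host):
        # Each host owns many virtual points so load spreads evenly.
        for v in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{host}#{v}"), host))

    def owner(self, shard_id: int) -> str:
        # A shard belongs to the first vnode clockwise from its hash point.
        point = _hash(f"shard-{shard_id}")
        i = bisect.bisect(self.ring, (point, ""))
        return self.ring[i % len(self.ring)][1]

ring = ShardRing(["host-a", "host-b"])
before = {s: ring.owner(s) for s in range(1024)}

ring.add_host("host-c")  # a new host joins the ring
after = {s: ring.owner(s) for s in range(1024)}

# Only shards whose hash points fall in host-c's new ranges change owner;
# everything else stays put -- there is no global reshuffle.
moved = [s for s in range(1024) if before[s] != after[s]]
```

The key property: every shard that moves lands on the newly added host, which is what keeps rebalancing cheap as the cluster grows.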
### History Service Capacity

Each History host can typically handle:

- 50-100 shards per host (depending on workflow complexity)
- 5,000-10,000 workflow executions per second per host
- 100,000-500,000 active workflow executions per host
## Matching Service Scaling

### Task List Sharding

The Matching service shards by task list. Each task list is owned by a single host by default.

### Scalable Task Lists
For high-throughput task lists, enable partitioning.

### CLI-Based Partition Configuration
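The exact CLI invocation varies by Cadence version; under the hood, partition counts are ordinary dynamic-config values. A sketch of the equivalent dynamic-config entries (key names assumed from Cadence's dynamic-config registry — verify against your version; the domain and task list names are hypothetical):

```yaml
matching.numTasklistWritePartitions:
- value: 8
  constraints:
    domainName: "orders"
    taskListName: "order-processing"
matching.numTasklistReadPartitions:
- value: 8
  constraints:
    domainName: "orders"
    taskListName: "order-processing"
```

Keep read partitions greater than or equal to write partitions so every partition that receives tasks also gets polled and can drain.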
Update partitions via the CLI.

### Partition Selection Algorithms
Three algorithms are available:

#### 1. Random Selection (Default)
- Stateless
- Uniform distribution via random selection
- Good for most workloads
#### 2. Round-Robin Selection
- Soft state (cached partition counter)
- Ensures fairness even with low request volumes
- Slightly better distribution than random
#### 3. Weighted Selection
- Selects partitions based on backlog size
- Falls back to round-robin when backlogs are small
- Maximizes poller utilization
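A toy model of the weighted-with-fallback behavior described above. This sketches only the selection idea; the threshold value and the mechanics of the real algorithm are assumptions:

```python
import random
from itertools import count

class PartitionSelector:
    """Toy partition picker: weighted by backlog when backlogs are
    meaningful, round-robin otherwise."""

    def __init__(self, num_partitions, backlog_threshold=100):
        self.n = num_partitions
        self.threshold = backlog_threshold
        self._rr = count()  # soft state: a cached round-robin counter

    def pick(self, backlogs):
        total = sum(backlogs)
        if total < self.threshold:
            # Backlogs too small to carry signal: fall back to fair
            # round-robin so low-volume task lists still spread evenly.
            return next(self._rr) % self.n
        # Weighted: probability proportional to each partition's backlog,
        # steering pollers toward the partitions that need draining.
        r = random.uniform(0, total)
        acc = 0.0
        for i, backlog in enumerate(backlogs):
            acc += backlog
            if backlog and r <= acc:
                return i
        return self.n - 1
```

With all backlogs near zero the picker cycles partitions fairly; once one partition accumulates a large backlog, nearly all picks land there until it drains.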
### Partition Forwarding

Partitions are organized in a tree structure:

- Tasks can be forwarded up the tree to parent partitions
- Idle pollers in one partition can service tasks from siblings
- Improves poller utilization
### Forwarding Configuration
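Forwarding limits live in dynamic configuration. An illustrative sketch (key names assumed from Cadence's dynamic-config registry — confirm against your version; values are arbitrary):

```yaml
matching.forwarderMaxOutstandingPolls:
- value: 1
matching.forwarderMaxOutstandingTasks:
- value: 1
matching.forwarderMaxRatePerSecond:
- value: 10
matching.forwarderMaxChildrenPerNode:
- value: 20
```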
### Matching Service Capacity

Each Matching host can handle:

- 100-500 task lists per host
- 10,000-50,000 tasks/second per host
- 1,000-5,000 concurrent pollers per host
## Frontend Service Scaling

### Stateless Design

Frontend services are completely stateless and can scale linearly.

### Load Balancing

Use any standard load balancer:

- DNS round-robin (simplest)
- Layer 4 load balancer (AWS NLB, GCP Network LB)
- Layer 7 load balancer (AWS ALB, Envoy, Nginx)
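As one concrete option, a minimal Nginx Layer 4 (`stream`) proxy in front of the frontends. Host names are placeholders, and 7933 is the conventional Cadence frontend port — substitute whatever your deployment exposes:

```nginx
stream {
    upstream cadence_frontend {
        server cadence-frontend-1.internal:7933;
        server cadence-frontend-2.internal:7933;
        server cadence-frontend-3.internal:7933;
    }
    server {
        listen 7933;
        proxy_pass cadence_frontend;
    }
}
```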
### Frontend Capacity

Each Frontend host can handle:

- 5,000-20,000 requests/second
- 10,000-50,000 concurrent connections
- Limited by network bandwidth and CPU
### Rate Limiting
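Rate limits are dynamic-config values. A sketch (key names assumed from Cadence's dynamic-config registry — verify against your version; the limits and domain name are illustrative):

```yaml
frontend.rps:
- value: 1200              # aggregate limit per frontend host
frontend.domainrps:
- value: 300
  constraints:
    domainName: "busy-domain"
```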
Configure per-domain rate limits.

## Worker Service Scaling
The Worker service runs internal system workflows:

- Archival workflows (moving old workflows to long-term storage)
- Domain replication (cross-cluster domain sync)
- System maintenance tasks
### Worker Capacity

Typically 2-5 Worker hosts are sufficient for most clusters.

## Database Scaling
### Cassandra Scaling

Cassandra scales horizontally by adding nodes.

#### Cassandra Best Practices
- Use `LOCAL_QUORUM` consistency for multi-DC deployments
- Provision 3+ nodes per datacenter
- Monitor partition key distribution (workflow_id based)
- Size nodes for 50-70% disk utilization
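The node-addition flow can be sketched with standard Cassandra tooling (the exact bootstrap checklist depends on your Cassandra version):

```shell
# On the new node: point cluster_name and seed_provider in cassandra.yaml
# at the existing ring, then start Cassandra and let it bootstrap.
sudo systemctl start cassandra

# From any existing node: wait until the newcomer reports UN (Up/Normal).
nodetool status

# Then reclaim the key ranges the new node took over.
# Run on each pre-existing node, one at a time:
nodetool cleanup
```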
### MySQL/PostgreSQL Scaling

#### Single Database
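A sketch of a single-database SQL setup (field names follow Cadence's SQL persistence configuration, but verify against your version; the host, credentials, and pool sizes are placeholders):

```yaml
persistence:
  numHistoryShards: 1024
  defaultStore: default
  datastores:
    default:
      sql:
        pluginName: "mysql"
        databaseName: "cadence"
        connectAddr: "mysql.internal:3306"
        connectProtocol: "tcp"
        user: "cadence"
        password: "changeme"
        maxConns: 20
        maxIdleConns: 20
```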
#### Multiple Database Sharding
For large-scale SQL deployments, shard across multiple databases.

#### Sharding Strategy

- Workflow execution: `dbShardID = historyShardID % numDBShards`
- History events: `dbShardID = hash(treeID) % numDBShards`
- Task lists: `dbShardID = hash(domainID + tasklistName) % numDBShards`
- Visibility: `dbShardID = hash(domainID) % numDBShards`
- Domain metadata: always shard 0 (low write volume)
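The routing rules above can be sketched as pure functions. The actual hash function Cadence's SQL layer uses is not specified in this guide, so `stable_hash` below is a stand-in, and the names are illustrative:

```python
import zlib

NUM_DB_SHARDS = 4

def stable_hash(s: str) -> int:
    # Stand-in for whatever stable string hash the real implementation uses.
    return zlib.crc32(s.encode())

def db_shard_for_execution(history_shard_id: int) -> int:
    # Workflow execution rows follow their history shard.
    return history_shard_id % NUM_DB_SHARDS

def db_shard_for_history(tree_id: str) -> int:
    # History events hash by tree ID.
    return stable_hash(tree_id) % NUM_DB_SHARDS

def db_shard_for_task_list(domain_id: str, task_list: str) -> int:
    # Task lists hash by domain + task list name.
    return stable_hash(domain_id + task_list) % NUM_DB_SHARDS

def db_shard_for_visibility(domain_id: str) -> int:
    # Visibility records hash by domain, keeping a domain's records together.
    return stable_hash(domain_id) % NUM_DB_SHARDS

# Domain metadata is low write volume, so it is pinned to one shard.
DOMAIN_METADATA_SHARD = 0
```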
## Performance Tuning

### History Service
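History tuning is done through dynamic configuration; two commonly tuned knobs, as a sketch (key names assumed from Cadence's dynamic-config registry — confirm against your version; values are illustrative):

```yaml
history.cacheMaxSize:
- value: 512        # mutable-state cache entries per host
history.cacheTTL:
- value: "1h"
```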
### Matching Service
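Matching tuning likewise lives in dynamic configuration; an illustrative sketch (key names assumed — verify against your version):

```yaml
matching.getTasksBatchSize:
- value: 1000
matching.longPollExpirationInterval:
- value: "1m"
```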
### Frontend Service
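Frontend tuning follows the same pattern; a sketch (key names assumed from Cadence's dynamic-config registry; values illustrative):

```yaml
frontend.historyMaxPageSize:
- value: 1000
frontend.visibilityMaxPageSize:
- value: 1000
```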
## Capacity Planning

### Resource Estimation

#### Per Active Workflow
- Memory: 1-10 KB (History service)
- Database storage: 10-100 KB (depending on history length)
- CPU: Minimal when idle
#### Per Workflow Execution/Second
- History service CPU: 0.01-0.1 cores
- Database IOPS: 10-50 reads, 5-10 writes
- Network: 10-100 KB/s
### Example Sizing

For 1M active workflows with 1,000 executions/second:

History Service:

- Shards: 4096
- Hosts: 40-80 (50-100 shards per host)
- Memory: 64-128 GB per host
- CPU: 8-16 cores per host

Matching Service:

- Hosts: 10-20
- Memory: 32-64 GB per host
- CPU: 8-16 cores per host

Frontend Service:

- Hosts: 5-10
- Memory: 8-16 GB per host
- CPU: 4-8 cores per host

Cassandra:

- Nodes: 9-15 (3 per DC, 3 DCs)
- Storage: 500 GB - 2 TB per node
- Memory: 64-128 GB per node
- CPU: 16-32 cores per node
## Auto-Scaling

### Metrics-Based Scaling

Scale based on per-service load metrics (CPU utilization, request rates, and task latencies).

### Kubernetes HPA Example
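A minimal HPA for the stateless frontend tier (the deployment name and targets are placeholders; the sharded History service needs more care when autoscaling, since shard ownership moves on every membership change):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-frontend
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```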
## Testing Scalability

### Load Testing

Use the Cadence bench tool.

### Monitoring During Scale Events

Monitor these during scaling:

- Shard transfer latency (History)
- Task list rebalancing (Matching)
- Database connection pool utilization
- Request error rates
- Task processing latency
## See Also
- Monitoring Guide - Metrics and observability
- Configuration Reference - Dynamic configuration
- Persistence Guide - Database setup