Introduction
Cadence is a distributed, scalable, durable, and highly available orchestration engine designed to execute asynchronous, long-running business logic in a resilient way. The system is built on a microservices architecture with a clear separation of concerns.

High-Level Architecture
Cadence consists of multiple stateless service components, a persistence layer, and optional components for advanced visibility.

Service Topology
Cadence employs a horizontally scalable architecture where each service can be scaled independently.

Service Distribution
- Frontend Service: Serves as the API gateway, handling all client requests
- History Service: Manages workflow execution state and makes decisions
- Matching Service: Routes workflow tasks and activity tasks to workers
- Worker Service: Handles background operations like replication, indexing, and archival
Sharding Strategy
Cadence uses consistent hashing for distributing workload:

- History Shards: Workflow executions are distributed across history shards using hash(workflowID) % numHistoryShards
- Shard Ownership: Each history service instance owns a subset of shards
- Dynamic Rebalancing: Shard ownership automatically rebalances when instances join or leave
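The shard-routing rule above can be sketched as follows. This is illustrative only: Cadence's internal hash function differs, and MD5 is used here merely as a stable, deterministic stand-in.

```python
import hashlib

def shard_for(workflow_id: str, num_history_shards: int) -> int:
    """Map a workflow ID to a history shard: hash(workflowID) % numHistoryShards.
    (Illustrative; MD5 stands in for Cadence's actual hash function.)"""
    digest = hashlib.md5(workflow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_history_shards

# The same workflow always routes to the same shard, which is why the
# shard count cannot change after provisioning: a different modulus would
# remap existing workflows to different shards.
print(shard_for("order-12345", 1024))
```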
The number of history shards (numHistoryShards) is set at cluster provisioning time and cannot be changed afterward. Choose this value carefully based on expected scale.

Data Flow and Communication Patterns
Workflow Execution Flow
Inter-Service Communication
- Synchronous RPC: Services communicate via gRPC/TChannel
  - Frontend → History: Workflow operations
  - Frontend → Matching: Task list operations
  - History → Matching: Task creation
- Task-Based Asynchronous: History generates tasks that are processed asynchronously
  - Transfer Tasks: Immediate execution (e.g., decision tasks, activity tasks)
  - Timer Tasks: Delayed execution (e.g., workflow timeouts, retries)
  - Replication Tasks: Cross-datacenter replication (if enabled)
- Long Polling: Workers use long polling to receive tasks
  - The Matching service holds poll requests until tasks are available
  - Polls have a configurable timeout with automatic retry
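The long-polling pattern can be sketched with an in-process queue standing in for the Matching service. The function names and timings below are hypothetical, not Cadence APIs: the point is that an empty poll blocks up to a timeout and is simply retried.

```python
import queue
import threading

def long_poll(task_queue: queue.Queue, timeout_s: float):
    """Block until a task arrives or the poll times out (a stand-in for a
    worker's poll call against the Matching service)."""
    try:
        return task_queue.get(timeout=timeout_s)
    except queue.Empty:
        return None  # empty poll: the caller just issues a new poll

def worker_loop(task_queue, handled, max_polls, timeout_s):
    """Poll repeatedly; empty polls are retried until max_polls is reached."""
    for _ in range(max_polls):
        task = long_poll(task_queue, timeout_s)
        if task is not None:
            handled.append(task)

tasks = queue.Queue()
handled = []
# Deliver a task shortly after the worker starts polling; the in-flight
# poll is released as soon as the task becomes available.
threading.Timer(0.05, tasks.put, args=("activity-task-1",)).start()
worker_loop(tasks, handled, max_polls=5, timeout_s=0.1)
print(handled)
```

Holding the poll open server-side keeps task-dispatch latency low without workers busy-spinning on an empty task list.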
Key Design Principles
1. Stateless Services
All Cadence services are stateless, storing no local state:

- State is persisted to the database layer
- Services can be stopped/started without data loss
- Horizontal scaling is straightforward
2. Event Sourcing
Workflow state is derived from an immutable event history:

- Every workflow execution generates a sequence of history events
- Mutable state is reconstructed from event history
- Enables time travel debugging and replay
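The replay idea above can be shown in a simplified sketch. The event type names mirror Cadence history events, but the state shape and the `replay` function are illustrative only; real mutable state is far richer.

```python
def replay(events):
    """Rebuild mutable workflow state by folding over the immutable
    event history (simplified illustration)."""
    state = {"status": "UNKNOWN", "pending_activities": set()}
    for event in events:
        kind = event["type"]
        if kind == "WorkflowExecutionStarted":
            state["status"] = "RUNNING"
        elif kind == "ActivityTaskScheduled":
            state["pending_activities"].add(event["activity_id"])
        elif kind == "ActivityTaskCompleted":
            state["pending_activities"].discard(event["activity_id"])
        elif kind == "WorkflowExecutionCompleted":
            state["status"] = "COMPLETED"
    return state

history = [
    {"type": "WorkflowExecutionStarted"},
    {"type": "ActivityTaskScheduled", "activity_id": "a1"},
    {"type": "ActivityTaskCompleted", "activity_id": "a1"},
    {"type": "WorkflowExecutionCompleted"},
]
print(replay(history))
```

Because replay is deterministic, truncating the history to any prefix reproduces the state at that point in time, which is what enables time-travel debugging.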
3. Shard-Based Partitioning
Workload is distributed using a sharding mechanism:

- Each shard is an independent unit of processing
- Shards are owned by history service instances
- Ownership changes trigger graceful shard transfer
4. Multi-Tenancy
Cadence supports multi-tenancy through domains:

- Each domain has isolated configuration
- Rate limiting is applied per domain
- Cross-domain workflows are not supported
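Per-domain rate limiting can be sketched as a token bucket keyed by domain name. This is not Cadence's implementation; the class, parameters, and defaults below are hypothetical.

```python
import time

class DomainRateLimiter:
    """Token-bucket rate limiter keyed by domain (illustrative sketch)."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = {}  # domain -> (tokens, last_refill_time)

    def allow(self, domain: str, now: float = None) -> bool:
        """Consume one token for `domain` if available; refill over time."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(domain, (float(self.burst), now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[domain] = (tokens - 1, now)
            return True
        self.buckets[domain] = (tokens, now)  # over limit: reject, keep refilling
        return False
```

Keying the buckets by domain means one noisy tenant exhausts only its own budget, leaving other domains unaffected.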
Scalability Characteristics
Vertical Scalability
Horizontal Scalability
| Component | Scaling Limit | Bottleneck |
|---|---|---|
| Frontend | Unlimited | Network bandwidth |
| History | numHistoryShards | Shard count |
| Matching | Unlimited | Task list throughput |
| Worker | Unlimited | Processing capacity |
Performance Considerations
- Throughput: Scales with number of shards and service instances
- Latency: Typically less than 100ms for workflow operations
- Durability: Every state change is persisted before acknowledgment
Deployment Patterns
Single Cluster Deployment
Multi-Region Deployment
See Cross-DC Replication for multi-region setup details.

Component Dependencies
Required Components
- Database: Cassandra, MySQL, or PostgreSQL for core persistence
- Ringpop/Membership: For service discovery and shard ownership
Optional Components
- Kafka: For replication tasks and async workflow queues
- Elasticsearch/Pinot: For advanced visibility features
- Archival Storage: For long-term workflow history storage
Configuration Example
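A minimal sketch of a static server configuration, assuming a single Cassandra datastore; exact key names may vary by Cadence version, so treat this as illustrative rather than authoritative:

```yaml
persistence:
  numHistoryShards: 1024   # fixed at provisioning time; cannot be changed later
  defaultStore: default
  datastores:
    default:
      nosql:
        pluginName: "cassandra"
        hosts: "127.0.0.1"
        keyspace: "cadence"
```

The shard count here is the same numHistoryShards discussed under Sharding Strategy: size it for expected peak scale before the cluster is provisioned.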
Next Steps
- Services: Detailed breakdown of each Cadence service
- Persistence: Database layer design and configuration
- Cross-DC Replication: Multi-region active-active setup