Introduction

Cadence is a distributed, scalable, durable, and highly available orchestration engine designed to execute asynchronous long-running business logic in a resilient way. The system is built on a microservices architecture with clear separation of concerns.

High-Level Architecture

Cadence consists of multiple stateless service components, a persistence layer, and optional components for advanced visibility.

Service Topology

Cadence employs a horizontally scalable architecture where each service can be scaled independently:

Service Distribution

  • Frontend Service: Serves as the API gateway, handling all client requests
  • History Service: Owns workflow execution state and the event history, and generates tasks for downstream processing
  • Matching Service: Routes workflow tasks and activity tasks to workers
  • Worker Service: Handles background operations like replication, indexing, and archival

Sharding Strategy

Cadence uses consistent hashing to distribute workload:

  • History Shards: Workflow executions are distributed across history shards using hash(workflowID) % numHistoryShards
  • Shard Ownership: Each history service instance owns a subset of shards
  • Dynamic Rebalancing: Shard ownership automatically rebalances when instances join or leave

```yaml
persistence:
  numHistoryShards: 1024  # Number of history shards
```

The number of history shards (numHistoryShards) is set at cluster provisioning time and cannot be changed afterward. Choose this value carefully based on expected scale.
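The routing rule above can be sketched in Go. This is only an illustration: the function name and the use of FNV-32a are stand-ins, not Cadence's actual hash implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// historyShard maps a workflow ID to a shard in [0, numShards).
// FNV-32a is used here as a stand-in hash function; the hash
// Cadence actually uses may differ.
func historyShard(workflowID string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(workflowID))
	return int(h.Sum32()) % numShards
}

func main() {
	// The same workflow ID always maps to the same shard, so a
	// single history instance owns all of its state transitions.
	fmt.Println(historyShard("order-12345", 1024))
}
```

Because the mapping is deterministic, all operations for one workflow land on the same shard, which is what lets the owning history instance serialize its state transitions.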

Data Flow and Communication Patterns

Workflow Execution Flow

A client request enters through the Frontend service, the History service persists the resulting state change and generates tasks, the Matching service queues those tasks, and workers pick them up via long polling.
Inter-Service Communication

  1. Synchronous RPC: Services communicate via gRPC/TChannel
    • Frontend → History: Workflow operations
    • Frontend → Matching: Task list operations
    • History → Matching: Task creation
  2. Asynchronous Tasks: The History service generates tasks that are processed asynchronously
    • Transfer Tasks: Immediate execution (e.g., decision tasks, activity tasks)
    • Timer Tasks: Delayed execution (e.g., workflow timeouts, retries)
    • Replication Tasks: Cross-datacenter replication (if enabled)
  3. Long Polling: Workers use long polling to receive tasks
    • Matching service holds poll requests until tasks are available
    • Configurable timeout with automatic retry

Key Design Principles

1. Stateless Services

All Cadence services are stateless; no durable state lives on the service hosts:
  • State is persisted to the database layer
  • Services can be stopped/started without data loss
  • Horizontal scaling is straightforward

2. Event Sourcing

Workflow state is derived from an immutable event history:
  • Every workflow execution generates a sequence of history events
  • Mutable state is reconstructed from event history
  • Enables time travel debugging and replay
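The replay idea can be sketched with a toy reducer. The event and state types here are simplified stand-ins; real Cadence history events carry many more attributes:

```go
package main

import "fmt"

// Event is a simplified history event.
type Event struct {
	Type string // e.g. "WorkflowStarted", "ActivityCompleted"
}

// WorkflowState is a toy mutable state rebuilt from history.
type WorkflowState struct {
	Started             bool
	CompletedActivities int
}

// replay folds the immutable event history into mutable state.
// Because history is append-only, replaying the same events
// always reconstructs the same state.
func replay(history []Event) WorkflowState {
	var s WorkflowState
	for _, e := range history {
		switch e.Type {
		case "WorkflowStarted":
			s.Started = true
		case "ActivityCompleted":
			s.CompletedActivities++
		}
	}
	return s
}

func main() {
	history := []Event{
		{Type: "WorkflowStarted"},
		{Type: "ActivityCompleted"},
		{Type: "ActivityCompleted"},
	}
	fmt.Printf("%+v\n", replay(history))
}
```

This determinism is what makes replay-based debugging possible: given the recorded history, state at any point in the execution can be reconstructed after the fact.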

3. Shard-Based Partitioning

Workload is distributed using a sharding mechanism:
  • Each shard is an independent unit of processing
  • Shards are owned by history service instances
  • Ownership changes trigger graceful shard transfer

4. Multi-Tenancy

Cadence supports multi-tenancy through domains:
  • Each domain has isolated configuration
  • Rate limiting per domain
  • Cross-domain workflows not supported
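Per-domain rate limiting can be sketched with a simple counter keyed by domain. This fixed-window counter is only an illustration; Cadence's actual limiter is more sophisticated (e.g. refillable buckets driven by dynamic config):

```go
package main

import "fmt"

// domainLimiter is a toy fixed-window request counter per domain.
type domainLimiter struct {
	limit  int
	counts map[string]int
}

func newDomainLimiter(limit int) *domainLimiter {
	return &domainLimiter{limit: limit, counts: map[string]int{}}
}

// Allow reports whether a request for the domain fits in the
// current window; each domain is counted independently, so a
// noisy tenant cannot consume another tenant's quota.
func (l *domainLimiter) Allow(domain string) bool {
	if l.counts[domain] >= l.limit {
		return false
	}
	l.counts[domain]++
	return true
}

func main() {
	l := newDomainLimiter(2)
	fmt.Println(l.Allow("payments"), l.Allow("payments"), l.Allow("payments"))
	fmt.Println(l.Allow("billing")) // independent domain is unaffected
}
```

Keying the limiter by domain is the isolation mechanism: each tenant's burst is contained to its own counter.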

Scalability Characteristics

Vertical Scalability

Individual service instances can be tuned for larger workloads; for example, the maximum gRPC message size can be raised per service:

```yaml
services:
  history:
    rpc:
      grpcMaxMsgSize: 33554432  # 32 MB
  frontend:
    rpc:
      grpcMaxMsgSize: 33554432  # 32 MB
```

Horizontal Scalability

| Component | Scaling Limit      | Bottleneck           |
|-----------|--------------------|----------------------|
| Frontend  | Unlimited          | Network bandwidth    |
| History   | numHistoryShards   | Shard count          |
| Matching  | Unlimited          | Task list throughput |
| Worker    | Unlimited          | Processing capacity  |

Performance Considerations

  • Throughput: Scales with number of shards and service instances
  • Latency: Typically less than 100ms for workflow operations
  • Durability: Every state change is persisted before acknowledgment

Deployment Patterns

Single Cluster Deployment

```yaml
services:
  frontend:
    rpc:
      port: 7933
      grpcPort: 7833
  matching:
    rpc:
      port: 7935
      grpcPort: 7835
  history:
    rpc:
      port: 7934
      grpcPort: 7834
  worker:
    rpc:
      port: 7939
```

Multi-Region Deployment

See Cross-DC Replication for multi-region setup details.

Component Dependencies

Required Components

  • Database: Cassandra, MySQL, or PostgreSQL for core persistence
  • Ringpop/Membership: For service discovery and shard ownership

Optional Components

  • Kafka: For replication tasks and async workflow queues
  • Elasticsearch/Pinot: For advanced visibility features
  • Archival Storage: For long-term workflow history storage

Configuration Example

```yaml
persistence:
  defaultStore: cass-default
  visibilityStore: cass-visibility
  numHistoryShards: 1024
  datastores:
    cass-default:
      nosql:
        pluginName: "cassandra"
        hosts: "127.0.0.1"
        keyspace: "cadence"
        consistency: LOCAL_QUORUM

ringpop:
  name: cadence
  bootstrapMode: hosts
  bootstrapHosts: ["127.0.0.1:7933", "127.0.0.1:7934", "127.0.0.1:7935"]
  maxJoinDuration: 30s

clusterGroupMetadata:
  failoverVersionIncrement: 10
  primaryClusterName: "cluster0"
  currentClusterName: "cluster0"
  clusterGroup:
    cluster0:
      enabled: true
      initialFailoverVersion: 0
      rpcAddress: "localhost:7833"
      rpcTransport: "grpc"
```

Next Steps

  • Services: Detailed breakdown of each Cadence service
  • Persistence: Database layer design and configuration
  • Cross-DC Replication: Multi-region active-active setup
