Overview

Cadence is designed for horizontal scalability across all service tiers. This guide covers scaling strategies, capacity planning, and performance tuning for production clusters.

Scaling Architecture

Service Tiers

Cadence consists of four independently scalable services:
  1. Frontend - API gateway (stateless)
  2. History - Workflow state management (sharded)
  3. Matching - Task list management (sharded)
  4. Worker - Internal system workflows (lightweight)

History Service Scaling

History Shards

The History service is sharded to distribute workflow ownership across hosts.
The number of history shards (numHistoryShards) is set at cluster provisioning and cannot be changed after initialization.

Shard Configuration

persistence:
  numHistoryShards: 16384  # Choose wisely - cannot be changed later
  defaultStore: "default"
  visibilityStore: "visibility"
  datastores:
    default:
      cassandra:
        hosts: "127.0.0.1"
        keyspace: "cadence"

Choosing Number of Shards

The number of shards determines maximum horizontal scalability:
  • Minimum cluster size: 1 host
  • Maximum cluster size: N hosts (where N = numHistoryShards)
  • Recommended starting point: 1024-16384 shards
With N shards, you can scale from 1 to N history hosts. Beyond N hosts, additional capacity is wasted as each host requires at least one shard.

Shard Sizing Guidelines

Cluster Scale       Shards       Notes
Development         16-64        Single host
Small Production    256-1024     Up to 50 hosts
Medium Production   1024-4096    50-200 hosts
Large Production    4096-16384   200+ hosts

Shard Distribution

Workflows are assigned to shards by hashing the workflow ID:
historyShardID = hash(workflowID) % numHistoryShards
Shards, in turn, are distributed across History hosts using a consistent hash ring.
Each History host owns a subset of shards. When scaling:
  1. New hosts join the ring
  2. Shard ownership is rebalanced automatically
  3. In-flight workflows migrate to new owners
  4. No downtime required
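The mapping above can be sketched as follows. This is an illustrative model only: MD5 stands in for Cadence's internal hash functions, and the ring logic is a simplified version of the membership ring Cadence actually uses.

```python
# Illustrative sketch of shard assignment and shard-to-host ownership.
# MD5 is a stand-in hash; Cadence's internal hash functions differ.
import hashlib

NUM_HISTORY_SHARDS = 16384  # fixed at cluster provisioning


def history_shard_id(workflow_id: str) -> int:
    """Map a workflow to a shard: hash(workflowID) % numHistoryShards."""
    digest = hashlib.md5(workflow_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_HISTORY_SHARDS


def shard_owner(shard_id: int, hosts: list) -> str:
    """Simplified consistent-hash ring: the shard is owned by the host whose
    ring position is the first one at or after the shard's position."""
    def ring_pos(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    shard_pos = ring_pos("shard-%d" % shard_id)
    candidates = sorted(hosts, key=ring_pos)
    for host in candidates:
        if ring_pos(host) >= shard_pos:
            return host
    return candidates[0]  # wrap around the ring


hosts = ["history-1", "history-2", "history-3"]
shard = history_shard_id("order-12345")
print(shard, shard_owner(shard, hosts))
```

When a host joins or leaves, only the shards whose ring segment it covers change owners, which is why rebalancing does not require downtime.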

History Service Capacity

Each History host can typically handle:
  • 50-100 shards per host (depending on workflow complexity)
  • 5,000-10,000 workflow executions per second per host
  • 100,000-500,000 active workflow executions per host
Monitor shard-level metrics to identify hot shards. Consider domain-based isolation for high-throughput workflows.

Matching Service Scaling

Task List Sharding

The Matching service shards by task list: by default, each task list is owned by a single Matching host.

Scalable Task Lists

For high-throughput task lists, enable partitioning:
# Dynamic config
matching.numTasklistWritePartitions:
  - value: 10
    constraints:
      domainName: "high-throughput-domain"
      taskListName: "my-tasklist"

matching.numTasklistReadPartitions:
  - value: 10
    constraints:
      domainName: "high-throughput-domain"
      taskListName: "my-tasklist"

CLI-Based Partition Configuration

Update partitions via CLI:
# Update task list partitions
cadence admin tasklist update-partition \
  --domain my-domain \
  --tasklist my-tasklist \
  --tasklist_type decision \
  --partitions 10

# Describe current configuration
cadence admin tasklist describe \
  --domain my-domain \
  --tasklist my-tasklist \
  --tasklist_type decision

Partition Selection Algorithms

Three algorithms are available:

1. Random Selection (Default)

  • Stateless
  • Uniform distribution via random selection
  • Good for most workloads

2. Round-Robin Selection

  • Soft state (cached partition counter)
  • Ensures fairness even with low request volumes
  • Slightly better distribution than random

3. Weighted Selection

  • Selects partitions based on backlog size
  • Falls back to round-robin when backlogs are small
  • Maximizes poller utilization
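The first two strategies can be sketched in a few lines. This is an illustrative model, not Cadence's implementation; the weighted (backlog-aware) strategy is omitted because it depends on runtime backlog state.

```python
# Sketch of random vs. round-robin partition selection (illustrative only).
import itertools
import random

NUM_WRITE_PARTITIONS = 10


def pick_random(num_partitions: int) -> int:
    """Random selection: stateless, uniform in expectation."""
    return random.randrange(num_partitions)


class RoundRobinPicker:
    """Round-robin selection: a cached counter guarantees fairness
    even when the request volume is too low for randomness to even out."""

    def __init__(self, num_partitions: int):
        self._counter = itertools.count()
        self._n = num_partitions

    def pick(self) -> int:
        return next(self._counter) % self._n


rr = RoundRobinPicker(NUM_WRITE_PARTITIONS)
picks = [rr.pick() for _ in range(20)]
print(picks[:10])  # 0 through 9, then the counter wraps
```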

Partition Forwarding

Partitions are organized in a tree structure:
                [Root]
               /      \
           [P1]      [P2]
          /    \    /    \
        [P3]  [P4][P5]  [P6]
  • Tasks can be forwarded up the tree to parent partitions
  • Idle pollers in one partition can service tasks from siblings
  • Improves poller utilization

Forwarding Configuration

# Dynamic config
matching.forwarderMaxChildrenPerNode: 2      # Binary tree
matching.forwarderMaxOutstandingPolls: 100   # Max forwarded polls
matching.forwarderMaxOutstandingTasks: 100   # Max forwarded tasks
matching.forwarderMaxRatePerSecond: 1000     # Rate limit
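The tree diagram above can be modeled with a hypothetical breadth-first numbering (root = 0, P1 = 1, ..., P6 = 6): with forwarderMaxChildrenPerNode = k, the parent of partition p is (p - 1) // k. Cadence's actual partition naming scheme differs; this only illustrates the forwarding path.

```python
# Hypothetical k-ary tree numbering for the forwarding hierarchy above.
# Root is partition 0; this is not Cadence's actual partition naming.
def parent_partition(p: int, max_children: int = 2) -> int:
    """Parent of partition p in a k-ary tree (k = forwarderMaxChildrenPerNode)."""
    if p == 0:
        raise ValueError("root has no parent")
    return (p - 1) // max_children


# Walk P6 (partition 6) up to the root in the binary tree shown above.
path = []
p = 6
while p != 0:
    path.append(p)
    p = parent_partition(p)
path.append(0)
print(path)  # [6, 2, 0]: a task in P6 can be forwarded to P2, then the root
```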

Matching Service Capacity

Each Matching host can handle:
  • 100-500 task lists per host
  • 10,000-50,000 tasks/second per host
  • 1,000-5,000 concurrent pollers per host

Frontend Service Scaling

Stateless Design

Frontend services are completely stateless and can scale linearly.

Load Balancing

Use any standard load balancer:
  • DNS round-robin (simplest)
  • Layer 4 load balancer (AWS NLB, GCP Network LB)
  • Layer 7 load balancer (AWS ALB, Envoy, Nginx)

Frontend Capacity

Each Frontend host can handle:
  • 5,000-20,000 requests/second
  • 10,000-50,000 concurrent connections
  • Limited by network bandwidth and CPU

Rate Limiting

Configure per-domain rate limits:
frontend.rps:
  - value: 10000
    constraints:
      domainName: "my-domain"

Worker Service Scaling

The Worker service runs internal system workflows:
  • Archival workflows (moving old workflows to long-term storage)
  • Domain replication (cross-cluster domain sync)
  • System maintenance tasks

Worker Capacity

Typically 2-5 Worker hosts are sufficient for most clusters.

Database Scaling

Cassandra Scaling

Cassandra scales horizontally by adding nodes:
persistence:
  datastores:
    default:
      cassandra:
        hosts: "host1,host2,host3,host4"
        keyspace: "cadence"
        maxConns: 20
        consistency: "local_quorum"

Cassandra Best Practices

  • Use LOCAL_QUORUM consistency for multi-DC deployments
  • Provision 3+ nodes per datacenter
  • Monitor partition key distribution (workflow_id based)
  • Size nodes for 50-70% disk utilization

MySQL/PostgreSQL Scaling

Single Database

persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql"
        databaseName: "cadence"
        connectAddr: "mysql-host:3306"
        connectProtocol: "tcp"
        maxConns: 100
        maxIdleConns: 20

Multiple Database Sharding

For large-scale SQL deployments, shard across multiple databases:
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql"
        connectProtocol: "tcp"
        useMultipleDatabases: true
        nShards: 4
        multipleDatabasesConfig:
          - databaseName: "cadence0"
            connectAddr: "mysql1:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"
          - databaseName: "cadence1"
            connectAddr: "mysql2:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"
          - databaseName: "cadence2"
            connectAddr: "mysql3:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"
          - databaseName: "cadence3"
            connectAddr: "mysql4:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"

Sharding Strategy

  • Workflow execution: dbShardID = historyShardID % numDBShards
  • History events: dbShardID = hash(treeID) % numDBShards
  • Task lists: dbShardID = hash(domainID + tasklistName) % numDBShards
  • Visibility: dbShardID = hash(domainID) % numDBShards
  • Domain metadata: Always shard 0 (low write volume)
Advanced visibility (Elasticsearch/Pinot) is required when using multiple SQL databases due to visibility sharding limitations.
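The routing rules above can be written out directly. This sketch uses MD5 as a stand-in hash (Cadence's actual hash function differs); the shard formulas mirror the list above.

```python
# Sketch of the SQL database-shard routing rules listed above.
# MD5 is an illustrative stand-in for Cadence's hash function.
import hashlib

NUM_DB_SHARDS = 4
DOMAIN_METADATA_SHARD = 0  # domain metadata always lives on shard 0


def _hash(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


def db_shard_for_execution(history_shard_id: int) -> int:
    return history_shard_id % NUM_DB_SHARDS


def db_shard_for_history(tree_id: str) -> int:
    return _hash(tree_id) % NUM_DB_SHARDS


def db_shard_for_tasklist(domain_id: str, tasklist_name: str) -> int:
    return _hash(domain_id + tasklist_name) % NUM_DB_SHARDS


def db_shard_for_visibility(domain_id: str) -> int:
    return _hash(domain_id) % NUM_DB_SHARDS


print(db_shard_for_execution(16383))  # 16383 % 4 = 3
```

Because executions route by history shard ID while history events route by tree ID, a single workflow's rows can span databases; each access path is consistent within its own category.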

Performance Tuning

History Service

history:
  # Cache tuning
  historyCache:
    initialSize: 128
    maxSize: 512
    ttl: "1h"
  
  # Execution cache sizing
  enableSizeBasedHistoryExecutionCache: true
  executionCacheMaxByteSize: 536870912  # 512MB
  
  # Events cache
  eventsCache:
    initialCount: 128
    maxCount: 512
    maxSize: 67108864  # 64MB
    ttl: "1h"
  
  # Task processing
  taskProcessRPS: 10000
  taskSchedulerWorkerCount: 512

Matching Service

matching:
  # Task buffer
  longPollExpirationInterval: "60s"
  
  # Sync matching
  enableSyncMatch: true
  
  # Rate limiting
  maxTaskListPollerRate: 1000

Frontend Service

frontend:
  # Rate limiting
  rps: 20000
  
  # Persistence rate limiting
  persistenceMaxQPS: 10000
  persistenceGlobalMaxQPS: 100000
  
  # History service max QPS
  historyMaxQPS: 50000

Capacity Planning

Resource Estimation

Per Active Workflow

  • Memory: 1-10 KB (History service)
  • Database storage: 10-100 KB (depending on history length)
  • CPU: Minimal when idle

Per Workflow Execution/Second

  • History service CPU: 0.01-0.1 cores
  • Database IOPS: 10-50 reads, 5-10 writes
  • Network: 10-100 KB/s

Example Sizing

For 1M active workflows with 1,000 executions/second:
History Service:
  • Shards: 4096
  • Hosts: 40-80 (50-100 shards per host)
  • Memory: 64-128 GB per host
  • CPU: 8-16 cores per host
Matching Service:
  • Hosts: 10-20
  • Memory: 32-64 GB per host
  • CPU: 8-16 cores per host
Frontend Service:
  • Hosts: 5-10
  • Memory: 8-16 GB per host
  • CPU: 4-8 cores per host
Database (Cassandra):
  • Nodes: 9-15 (3 per DC, 3 DCs)
  • Storage: 500 GB - 2 TB per node
  • Memory: 64-128 GB per node
  • CPU: 16-32 cores per node
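The History host count above follows from the per-host capacity figures earlier in this guide. A back-of-the-envelope check, using the midpoints of those ranges (rough estimates, not guarantees):

```python
# Back-of-the-envelope History sizing for the example: 1M active workflows
# at 1,000 executions/second, 4096 shards. Midpoints of the per-host ranges
# from this guide are rough assumptions, not measured capacity.
import math

active_workflows = 1_000_000
executions_per_sec = 1_000
shards = 4096

shards_per_host = 75       # midpoint of 50-100 shards per host
execs_per_host = 7_500     # midpoint of 5,000-10,000 executions/s per host
active_per_host = 300_000  # midpoint of 100k-500k active workflows per host

# The binding constraint determines the host count.
history_hosts = max(
    math.ceil(shards / shards_per_host),
    math.ceil(executions_per_sec / execs_per_host),
    math.ceil(active_workflows / active_per_host),
)
print(history_hosts)  # 55 hosts, within the 40-80 range above
```

Here shard placement, not throughput, is the binding constraint, which is why the recommended host count scales with the shard budget rather than the request rate.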

Auto-Scaling

Metrics-Based Scaling

Scale based on these metrics:
# History service - scale on CPU or active workflows
rate(cadence_workflow_started[5m]) > 1000
avg(cadence_memory_heapinuse) > 8e9

# Matching service - scale on task list backlog
cadence_matching_tasklist_backlog > 1000

# Frontend service - scale on request rate
rate(cadence_frontend_client_requests[5m]) > 10000

Kubernetes HPA Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-history
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-history
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: cadence_workflow_started
      target:
        type: AverageValue
        averageValue: "1000"

Testing Scalability

Load Testing

Use the Cadence bench tool:
cadence-bench -mode workflow \
  -domain my-domain \
  -tasklist my-tasklist \
  -concurrency 1000 \
  -rate 100 \
  -duration 1h

Monitoring During Scale Events

Monitor these during scaling:
  1. Shard transfer latency (History)
  2. Task list rebalancing (Matching)
  3. Database connection pool utilization
  4. Request error rates
  5. Task processing latency
