Overview

Cadence is designed for horizontal scalability across all service tiers. This guide covers scaling strategies, capacity planning, and performance tuning for production clusters.

Scaling Architecture

Service Tiers

Cadence consists of four independently scalable services:
  1. Frontend - API gateway (stateless)
  2. History - Workflow state management (sharded)
  3. Matching - Task list management (sharded)
  4. Worker - Internal system workflows (lightweight)

History Service Scaling

History Shards

The History service is sharded to distribute workflow ownership across hosts.
The number of history shards (numHistoryShards) is set at cluster provisioning and cannot be changed after initialization.

Shard Configuration

persistence:
  numHistoryShards: 16384  # Choose wisely - cannot be changed later
  defaultStore: "default"
  visibilityStore: "visibility"
  datastores:
    default:
      cassandra:
        hosts: "127.0.0.1"
        keyspace: "cadence"

Choosing Number of Shards

The number of shards determines maximum horizontal scalability:
  • Minimum cluster size: 1 host
  • Maximum cluster size: N hosts (where N = numHistoryShards)
  • Recommended starting point: 1024-16384 shards
With N shards, you can scale from 1 to N history hosts. Beyond N hosts, additional capacity is wasted as each host requires at least one shard.

Shard Sizing Guidelines

Cluster Scale       Shards       Notes
Development         16-64        Single host
Small Production    256-1024     Up to 50 hosts
Medium Production   1024-4096    50-200 hosts
Large Production    4096-16384   200+ hosts

Shard Distribution

Workflows are assigned to shards by hashing the workflow ID:
historyShardID = hash(workflowID) % numHistoryShards
Shards, in turn, are distributed across History hosts using a consistent hash ring.
Each History host owns a subset of shards. When scaling:
  1. New hosts join the ring
  2. Shard ownership is rebalanced automatically
  3. In-flight workflows migrate to new owners
  4. No downtime required
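The mapping above can be sketched as follows. This is an illustrative model only: MD5 stands in for Cadence's internal hash functions, and the ring logic is a simplified version of the membership ring Cadence actually uses.

```python
# Illustrative sketch of shard assignment and shard-to-host ownership.
# MD5 is a stand-in hash; Cadence's internal hash functions differ.
import hashlib

NUM_HISTORY_SHARDS = 16384  # fixed at cluster provisioning


def history_shard_id(workflow_id: str) -> int:
    """Map a workflow to a shard: hash(workflowID) % numHistoryShards."""
    digest = hashlib.md5(workflow_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_HISTORY_SHARDS


def shard_owner(shard_id: int, hosts: list) -> str:
    """Simplified consistent-hash ring: the shard is owned by the host whose
    ring position is the first one at or after the shard's position."""
    def ring_pos(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    shard_pos = ring_pos("shard-%d" % shard_id)
    candidates = sorted(hosts, key=ring_pos)
    for host in candidates:
        if ring_pos(host) >= shard_pos:
            return host
    return candidates[0]  # wrap around the ring


hosts = ["history-1", "history-2", "history-3"]
shard = history_shard_id("order-12345")
print(shard, shard_owner(shard, hosts))
```

When a host joins or leaves, only the shards whose ring segment it covers change owners, which is why rebalancing does not require downtime.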

History Service Capacity

Each History host can typically handle:
  • 50-100 shards per host (depending on workflow complexity)
  • 5,000-10,000 workflow executions per second per host
  • 100,000-500,000 active workflow executions per host
Monitor shard-level metrics to identify hot shards. Consider domain-based isolation for high-throughput workflows.

Matching Service Scaling

Task List Sharding

The Matching service shards by task list: by default, each task list is owned by a single Matching host.

Scalable Task Lists

For high-throughput task lists, enable partitioning:
# Dynamic config
matching.numTasklistWritePartitions:
  - value: 10
    constraints:
      domainName: "high-throughput-domain"
      taskListName: "my-tasklist"

matching.numTasklistReadPartitions:
  - value: 10
    constraints:
      domainName: "high-throughput-domain"
      taskListName: "my-tasklist"

CLI-Based Partition Configuration

Update partitions via CLI:
# Update task list partitions
cadence admin tasklist update-partition \
  --domain my-domain \
  --tasklist my-tasklist \
  --tasklist_type decision \
  --partitions 10

# Describe current configuration
cadence admin tasklist describe \
  --domain my-domain \
  --tasklist my-tasklist \
  --tasklist_type decision

Partition Selection Algorithms

Three algorithms are available:

1. Random Selection (Default)

  • Stateless
  • Uniform distribution via random selection
  • Good for most workloads

2. Round-Robin Selection

  • Soft state (cached partition counter)
  • Ensures fairness even with low request volumes
  • Slightly better distribution than random

3. Weighted Selection

  • Selects partitions based on backlog size
  • Falls back to round-robin when backlogs are small
  • Maximizes poller utilization
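The first two strategies can be sketched in a few lines. This is an illustrative model, not Cadence's implementation; the weighted (backlog-aware) strategy is omitted because it depends on runtime backlog state.

```python
# Sketch of random vs. round-robin partition selection (illustrative only).
import itertools
import random

NUM_WRITE_PARTITIONS = 10


def pick_random(num_partitions: int) -> int:
    """Random selection: stateless, uniform in expectation."""
    return random.randrange(num_partitions)


class RoundRobinPicker:
    """Round-robin selection: a cached counter guarantees fairness
    even when the request volume is too low for randomness to even out."""

    def __init__(self, num_partitions: int):
        self._counter = itertools.count()
        self._n = num_partitions

    def pick(self) -> int:
        return next(self._counter) % self._n


rr = RoundRobinPicker(NUM_WRITE_PARTITIONS)
picks = [rr.pick() for _ in range(20)]
print(picks[:10])  # 0 through 9, then the counter wraps
```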

Partition Forwarding

Partitions are organized in a tree structure:
                [Root]
               /      \
           [P1]      [P2]
          /    \    /    \
        [P3]  [P4][P5]  [P6]
  • Tasks can be forwarded up the tree to parent partitions
  • Idle pollers in one partition can service tasks from siblings
  • Improves poller utilization

Forwarding Configuration

# Dynamic config
matching.forwarderMaxChildrenPerNode: 2      # Binary tree
matching.forwarderMaxOutstandingPolls: 100   # Max forwarded polls
matching.forwarderMaxOutstandingTasks: 100   # Max forwarded tasks
matching.forwarderMaxRatePerSecond: 1000     # Rate limit
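The tree diagram above can be modeled with a hypothetical breadth-first numbering (root = 0, P1 = 1, ..., P6 = 6): with forwarderMaxChildrenPerNode = k, the parent of partition p is (p - 1) // k. Cadence's actual partition naming scheme differs; this only illustrates the forwarding path.

```python
# Hypothetical k-ary tree numbering for the forwarding hierarchy above.
# Root is partition 0; this is not Cadence's actual partition naming.
def parent_partition(p: int, max_children: int = 2) -> int:
    """Parent of partition p in a k-ary tree (k = forwarderMaxChildrenPerNode)."""
    if p == 0:
        raise ValueError("root has no parent")
    return (p - 1) // max_children


# Walk P6 (partition 6) up to the root in the binary tree shown above.
path = []
p = 6
while p != 0:
    path.append(p)
    p = parent_partition(p)
path.append(0)
print(path)  # [6, 2, 0]: a task in P6 can be forwarded to P2, then the root
```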

Matching Service Capacity

Each Matching host can handle:
  • 100-500 task lists per host
  • 10,000-50,000 tasks/second per host
  • 1,000-5,000 concurrent pollers per host

Frontend Service Scaling

Stateless Design

Frontend services are completely stateless and can scale linearly.

Load Balancing

Use any standard load balancer:
  • DNS round-robin (simplest)
  • Layer 4 load balancer (AWS NLB, GCP Network LB)
  • Layer 7 load balancer (AWS ALB, Envoy, Nginx)

Frontend Capacity

Each Frontend host can handle:
  • 5,000-20,000 requests/second
  • 10,000-50,000 concurrent connections
  • Limited by network bandwidth and CPU

Rate Limiting

Configure per-domain rate limits:
frontend.rps:
  - value: 10000
    constraints:
      domainName: "my-domain"

Worker Service Scaling

The Worker service runs internal system workflows:
  • Archival workflows (moving old workflows to long-term storage)
  • Domain replication (cross-cluster domain sync)
  • System maintenance tasks

Worker Capacity

Typically 2-5 Worker hosts are sufficient for most clusters.

Database Scaling

Cassandra Scaling

Cassandra scales horizontally by adding nodes:
persistence:
  datastores:
    default:
      cassandra:
        hosts: "host1,host2,host3,host4"
        keyspace: "cadence"
        maxConns: 20
        consistency: "local_quorum"

Cassandra Best Practices

  • Use LOCAL_QUORUM consistency for multi-DC deployments
  • Provision 3+ nodes per datacenter
  • Monitor partition key distribution (workflow_id based)
  • Size nodes for 50-70% disk utilization

MySQL/PostgreSQL Scaling

Single Database

persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql"
        databaseName: "cadence"
        connectAddr: "mysql-host:3306"
        connectProtocol: "tcp"
        maxConns: 100
        maxIdleConns: 20

Multiple Database Sharding

For large-scale SQL deployments, shard across multiple databases:
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql"
        connectProtocol: "tcp"
        useMultipleDatabases: true
        nShards: 4
        multipleDatabasesConfig:
          - databaseName: "cadence0"
            connectAddr: "mysql1:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"
          - databaseName: "cadence1"
            connectAddr: "mysql2:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"
          - databaseName: "cadence2"
            connectAddr: "mysql3:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"
          - databaseName: "cadence3"
            connectAddr: "mysql4:3306"
            user: "cadence"
            password: "${MYSQL_PASSWORD}"

Sharding Strategy

  • Workflow execution: dbShardID = historyShardID % numDBShards
  • History events: dbShardID = hash(treeID) % numDBShards
  • Task lists: dbShardID = hash(domainID + tasklistName) % numDBShards
  • Visibility: dbShardID = hash(domainID) % numDBShards
  • Domain metadata: Always shard 0 (low write volume)
Advanced visibility (Elasticsearch/Pinot) is required when using multiple SQL databases due to visibility sharding limitations.
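The routing rules above can be written out directly. This sketch uses MD5 as a stand-in hash (Cadence's actual hash function differs); the shard formulas mirror the list above.

```python
# Sketch of the SQL database-shard routing rules listed above.
# MD5 is an illustrative stand-in for Cadence's hash function.
import hashlib

NUM_DB_SHARDS = 4
DOMAIN_METADATA_SHARD = 0  # domain metadata always lives on shard 0


def _hash(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


def db_shard_for_execution(history_shard_id: int) -> int:
    return history_shard_id % NUM_DB_SHARDS


def db_shard_for_history(tree_id: str) -> int:
    return _hash(tree_id) % NUM_DB_SHARDS


def db_shard_for_tasklist(domain_id: str, tasklist_name: str) -> int:
    return _hash(domain_id + tasklist_name) % NUM_DB_SHARDS


def db_shard_for_visibility(domain_id: str) -> int:
    return _hash(domain_id) % NUM_DB_SHARDS


print(db_shard_for_execution(16383))  # 16383 % 4 = 3
```

Because executions route by history shard ID while history events route by tree ID, a single workflow's rows can span databases; each access path is consistent within its own category.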

Performance Tuning

History Service

history:
  # Cache tuning
  historyCache:
    initialSize: 128
    maxSize: 512
    ttl: "1h"
  
  # Execution cache sizing
  enableSizeBasedHistoryExecutionCache: true
  executionCacheMaxByteSize: 536870912  # 512MB
  
  # Events cache
  eventsCache:
    initialCount: 128
    maxCount: 512
    maxSize: 67108864  # 64MB
    ttl: "1h"
  
  # Task processing
  taskProcessRPS: 10000
  taskSchedulerWorkerCount: 512

Matching Service

matching:
  # Task buffer
  longPollExpirationInterval: "60s"
  
  # Sync matching
  enableSyncMatch: true
  
  # Rate limiting
  maxTaskListPollerRate: 1000

Frontend Service

frontend:
  # Rate limiting
  rps: 20000
  
  # Persistence rate limiting
  persistenceMaxQPS: 10000
  persistenceGlobalMaxQPS: 100000
  
  # History service max QPS
  historyMaxQPS: 50000

Capacity Planning

Resource Estimation

Per Active Workflow

  • Memory: 1-10 KB (History service)
  • Database storage: 10-100 KB (depending on history length)
  • CPU: Minimal when idle

Per Workflow Execution/Second

  • History service CPU: 0.01-0.1 cores
  • Database IOPS: 10-50 reads, 5-10 writes
  • Network: 10-100 KB/s

Example Sizing

For 1M active workflows with 1,000 executions/second:
History Service:
  • Shards: 4096
  • Hosts: 40-80 (50-100 shards per host)
  • Memory: 64-128 GB per host
  • CPU: 8-16 cores per host
Matching Service:
  • Hosts: 10-20
  • Memory: 32-64 GB per host
  • CPU: 8-16 cores per host
Frontend Service:
  • Hosts: 5-10
  • Memory: 8-16 GB per host
  • CPU: 4-8 cores per host
Database (Cassandra):
  • Nodes: 9-15 (3 per DC, 3 DCs)
  • Storage: 500 GB - 2 TB per node
  • Memory: 64-128 GB per node
  • CPU: 16-32 cores per node
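The History host count above follows from the per-host capacity figures earlier in this guide. A back-of-the-envelope check, using the midpoints of those ranges (rough estimates, not guarantees):

```python
# Back-of-the-envelope History sizing for the example: 1M active workflows
# at 1,000 executions/second, 4096 shards. Midpoints of the per-host ranges
# from this guide are rough assumptions, not measured capacity.
import math

active_workflows = 1_000_000
executions_per_sec = 1_000
shards = 4096

shards_per_host = 75       # midpoint of 50-100 shards per host
execs_per_host = 7_500     # midpoint of 5,000-10,000 executions/s per host
active_per_host = 300_000  # midpoint of 100k-500k active workflows per host

# The binding constraint determines the host count.
history_hosts = max(
    math.ceil(shards / shards_per_host),
    math.ceil(executions_per_sec / execs_per_host),
    math.ceil(active_workflows / active_per_host),
)
print(history_hosts)  # 55 hosts, within the 40-80 range above
```

Here shard placement, not throughput, is the binding constraint, which is why the recommended host count scales with the shard budget rather than the request rate.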

Auto-Scaling

Metrics-Based Scaling

Scale based on these metrics:
# History service - scale on CPU or active workflows
rate(cadence_workflow_started[5m]) > 1000
avg(cadence_memory_heapinuse) > 8e9

# Matching service - scale on task list backlog
cadence_matching_tasklist_backlog > 1000

# Frontend service - scale on request rate
rate(cadence_frontend_client_requests[5m]) > 10000

Kubernetes HPA Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-history
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-history
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: cadence_workflow_started
      target:
        type: AverageValue
        averageValue: "1000"

Testing Scalability

Load Testing

Use the Cadence bench tool:
cadence-bench -mode workflow \
  -domain my-domain \
  -tasklist my-tasklist \
  -concurrency 1000 \
  -rate 100 \
  -duration 1h

Monitoring During Scale Events

Monitor these during scaling:
  1. Shard transfer latency (History)
  2. Task list rebalancing (Matching)
  3. Database connection pool utilization
  4. Request error rates
  5. Task processing latency
