## Overview
The bench suite provides:

- Load Testing: Generate high volumes of workflow and activity executions
- Stress Testing: Test system limits and failure handling
- Performance Validation: Measure throughput, latency, and resource utilization
- Correctness Testing: Verify workflow execution under load
- Automated Test Suites: Run multiple test types in sequence or parallel
## Setup

### Prerequisites
Bench requires:

- A running Cadence cluster
- Advanced Visibility (Elasticsearch) for most tests (except Basic test with basic visibility mode)
- Sufficient cluster resources for target load
### Running Bench Workers

#### Option 1: Docker Image
Use the pre-built bench worker image.

#### Option 2: Build from Source
Build and run the binary. By default, the worker reads its configuration from `config/bench/development.yaml`.
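For example (the repository URL is real, formerly `github.com/uber/cadence`; the Makefile target and binary name are assumptions, so check the repository's Makefile before relying on them):

```shell
# Clone the Cadence server repository, which ships the bench suite.
git clone https://github.com/cadence-workflow/cadence.git
cd cadence

# Build the bench worker (target name is an assumption; see the Makefile).
make cadence-bench

# Start the worker; it reads config/bench/development.yaml by default.
./cadence-bench start
```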
### Worker Configuration

Edit `config/bench/development.yaml`:

- Workers poll from task lists named `cadence-bench-tl-*`
- Domains are auto-registered as local domains without archival
- To test global domains or archival, register the domains manually first
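A minimal sketch of what `config/bench/development.yaml` might contain (the field names and values here are illustrative assumptions; treat the sample file shipped in the repository as the source of truth):

```yaml
bench:
  name: cadence-bench
  # Each domain gets its own cadence-bench-tl-* task lists.
  domains: ["cadence-bench"]
  numTaskLists: 3
cadence:
  service: cadence-frontend
  host: 127.0.0.1:7833   # frontend address; port depends on your deployment
```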
## Test Types

### Basic Load Test

Tests fundamental workflow and activity execution. Features:

- Launches multiple stress workflows concurrently
- Each stress workflow executes activities sequentially or in parallel
- Validates successful completion within timeout
- Supports panic mode to test failure handling
Configuration parameters (`config/bench/basic.json`):
- `totalLaunchCount`: Total stress workflows to start
- `routineCount`: Parallel launcher activities
- `chainSequence`: Number of sequential steps per workflow
- `concurrentCount`: Parallel activities per step
- `payloadSizeBytes`: Activity payload size
- `failureThreshold`: Acceptable failure rate (e.g., 0.01 = 1%)
- `useBasicVisibilityValidation`: Use database visibility (no Elasticsearch required)
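A hypothetical `basic.json` using the parameters above (the keys come from the list; the values are illustrative, not recommendations):

```json
{
  "totalLaunchCount": 1000,
  "routineCount": 10,
  "chainSequence": 5,
  "concurrentCount": 10,
  "payloadSizeBytes": 1024,
  "failureThreshold": 0.01,
  "useBasicVisibilityValidation": false
}
```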
### Cancellation Test

Tests workflow cancellation at scale. Features:

- Starts workflows and immediately cancels them
- Validates cancellation propagation
- Measures cancellation latency
- Ensures no workflow leaks
Configured via `config/bench/cancellation.json`.
### Signal Test

Tests signal delivery and Signal-with-Start. Features:

- Tests the `SignalWorkflowExecution` API
- Tests the `SignalWithStartWorkflowExecution` API
- Measures signal latency
- Validates signal ordering
Configured via `config/bench/signal.json`.
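The two APIs above map to the Go client's `SignalWorkflow` and `SignalWithStartWorkflow` methods. A sketch of Signal-with-Start using `go.uber.org/cadence/client` (the workflow ID, task list, and signal name are placeholders; client construction is omitted):

```go
package main

import (
	"context"
	"time"

	"go.uber.org/cadence/client"
)

// signalWithStart delivers a signal to a workflow, starting it first if it
// is not already running. wf is the workflow function to start in that case.
func signalWithStart(ctx context.Context, cadenceClient client.Client, wf interface{}, payload []byte) error {
	opts := client.StartWorkflowOptions{
		ID:                           "signal-load-wf-001", // placeholder
		TaskList:                     "cadence-bench-tl-0", // placeholder
		ExecutionStartToCloseTimeout: 10 * time.Minute,
	}
	_, err := cadenceClient.SignalWithStartWorkflow(
		ctx,
		opts.ID,       // workflow ID
		"load-signal", // signal name (placeholder)
		payload,       // signal argument
		opts,
		wf, // workflow function to start if not yet running
	)
	return err
}
```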
### Concurrent Execution Test

Tests task throttling when workflows schedule many tasks. Purpose: validate that a workflow scheduling many activities or child workflows does not affect other domains. Features:

- Schedules hundreds of activities in a single decision
- Tests domain isolation
- Validates throttling configuration
- Measures scheduling latency
Configured via `config/bench/concurrent_execution.json`.
### Timer Test

Tests timer firing at scale. Features:

- Creates many timers in a short period
- Tests timer service throttling
- Validates timer accuracy
- Measures timer latency
Configured via `config/bench/timer.json`.
### Cron Test Suite

Runs multiple test suites on a schedule. Prerequisite: add a `Passed` search attribute (boolean type).
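The search attribute can be added cluster-wide with the Cadence CLI. The numeric type index below is an assumption (type indices have varied across versions), so confirm the boolean mapping with `--help` on your CLI:

```shell
# Add the "Passed" search attribute (boolean type). The type index is an
# assumption; run `cadence admin cluster add-search-attr --help` to confirm.
cadence admin cluster add-search-attr \
  --search_attr_key Passed \
  --search_attr_type 4
```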
Features:

- Runs tests in parallel or sequentially
- Multiple test suites for multi-tenant testing
- Sets the `Passed` search attribute with results
- Automatic retry and reporting
Configured via `config/bench/cron.json`.
## Metrics Collection

Bench emits metrics for monitoring.

### Key Metrics
Workflow metrics:

- `workflow_start_latency`: Time to start a workflow
- `workflow_end_to_end_latency`: Total workflow duration
- `workflow_failed`: Failed workflow count
- `workflow_timeout`: Timed-out workflow count
Activity metrics:

- `activity_schedule_to_start`: Time from schedule to start
- `activity_execution_latency`: Activity execution time
- `activity_failed`: Failed activity count
Test metrics:

- `test_passed`: Test pass/fail status
- `test_duration`: Total test duration
- `stress_workflow_count`: Number of stress workflows
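Once scraped into Prometheus, these metrics can drive latency queries. For example, assuming `workflow_end_to_end_latency` is exported as a histogram (the exact exported name and metric type depend on the metrics reporter in use):

```promql
# p99 end-to-end workflow latency over the last 5 minutes
histogram_quantile(
  0.99,
  sum(rate(workflow_end_to_end_latency_bucket[5m])) by (le)
)
```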
### Prometheus Configuration

Bench workers expose metrics at `http://localhost:9090/metrics`.
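A minimal Prometheus scrape configuration pointing at that endpoint (the job name is arbitrary; `metrics_path` defaults to `/metrics`):

```yaml
scrape_configs:
  - job_name: cadence-bench
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9090"]  # bench worker metrics endpoint
```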
## Best Practices

### Test Planning
- Start Small: Begin with low load and increase gradually
- Isolate Tests: Use separate domains for different test types
- Monitor Resources: Watch CPU, memory, and disk I/O
- Baseline First: Establish baseline performance before optimization
### Configuration Tuning
- Task Lists: Use multiple task lists for parallelism
- Timeouts: Set realistic timeouts for test duration
- Failure Threshold: Allow small failure rate for realistic testing
- Payload Size: Match production payload sizes
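To make the failure-threshold bullet concrete: a run is acceptable when `failed / total <= threshold`. A quick illustration of that arithmetic (this is not the bench suite's actual code):

```python
def run_passes(failed: int, total: int, threshold: float) -> bool:
    """Return True if the observed failure rate is within the threshold.

    Mirrors the semantics of failureThreshold (0.01 = 1%); illustrative only.
    """
    return failed / total <= threshold

# 8 failures out of 1000 launches is 0.8%, within a 1% threshold.
print(run_passes(8, 1000, 0.01))   # True
# 25 failures is 2.5%, exceeding a 1% threshold.
print(run_passes(25, 1000, 0.01))  # False
```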
### Production Testing
- Use Separate Cluster: Don’t test on production clusters
- Match Configuration: Mirror production settings
- Real Data: Use production-like workflow patterns
- Sustained Load: Run tests for extended periods (hours/days)
## Troubleshooting

### Workers Not Picking Up Tasks

Problem: Bench workers are not executing workflows.

Solution: verify the workers are running and can reach the Cadence frontend, and confirm that the task lists (`cadence-bench-tl-*`) and bench domains match the worker configuration.

### Tests Failing

Problem: Tests are reporting failures.

Solution:

- Check the failure threshold in the configuration
- Review workflow failure reasons
- Verify cluster has sufficient resources
- Check for timeout configuration issues
- Review Elasticsearch if using advanced visibility
### High Latency

Problem: Slow workflow execution.

Solution:

- Increase worker count
- Add more task lists
- Scale up server resources
- Optimize database performance
- Review throttling configuration
## Next Steps
- Learn about Canary Testing for health monitoring
- Configure Dynamic Config for throttling
- Set up Isolation Groups for multi-tenancy
- Monitor with Web UI