## Overview
The bench suite provides:

- Load Testing: Generate high volumes of workflow and activity executions
- Stress Testing: Test system limits and failure handling
- Performance Validation: Measure throughput, latency, and resource utilization
- Correctness Testing: Verify workflow execution under load
- Automated Test Suites: Run multiple test types in sequence or parallel
## Setup

### Prerequisites
Bench requires:

- A running Cadence cluster
- Advanced Visibility (Elasticsearch) for most tests (except Basic test with basic visibility mode)
- Sufficient cluster resources for target load
### Running Bench Workers

#### Option 1: Docker Image
Use the pre-built bench worker image.

#### Option 2: Build from Source
Build and run the binary. By default, the worker reads its configuration from `config/bench/development.yaml`.
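For example (the repository URL is real, formerly `github.com/uber/cadence`; the Makefile target and binary name are assumptions, so check the repository's Makefile before relying on them):

```shell
# Clone the Cadence server repository, which ships the bench suite.
git clone https://github.com/cadence-workflow/cadence.git
cd cadence

# Build the bench worker (target name is an assumption; see the Makefile).
make cadence-bench

# Start the worker; it reads config/bench/development.yaml by default.
./cadence-bench start
```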
### Worker Configuration

Edit `config/bench/development.yaml`:

- Workers poll from task lists named `cadence-bench-tl-*`
- Domains are auto-registered as local domains without archival
- To test global domains or archival, register the domains manually first
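A minimal sketch of what `config/bench/development.yaml` might contain (the field names and values here are illustrative assumptions; treat the sample file shipped in the repository as the source of truth):

```yaml
bench:
  name: cadence-bench
  # Each domain gets its own cadence-bench-tl-* task lists.
  domains: ["cadence-bench"]
  numTaskLists: 3
cadence:
  service: cadence-frontend
  host: 127.0.0.1:7833   # frontend address; port depends on your deployment
```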
## Test Types

### Basic Load Test

Tests fundamental workflow and activity execution. Features:

- Launches multiple stress workflows concurrently
- Each stress workflow executes activities sequentially or in parallel
- Validates successful completion within timeout
- Supports panic mode to test failure handling
Configuration parameters (`config/bench/basic.json`):
- `totalLaunchCount`: Total stress workflows to start
- `routineCount`: Parallel launcher activities
- `chainSequence`: Number of sequential steps per workflow
- `concurrentCount`: Parallel activities per step
- `payloadSizeBytes`: Activity payload size
- `failureThreshold`: Acceptable failure rate (e.g., 0.01 = 1%)
- `useBasicVisibilityValidation`: Use database visibility (no Elasticsearch required)
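A hypothetical `basic.json` using the parameters above (the keys come from the list; the values are illustrative, not recommendations):

```json
{
  "totalLaunchCount": 1000,
  "routineCount": 10,
  "chainSequence": 5,
  "concurrentCount": 10,
  "payloadSizeBytes": 1024,
  "failureThreshold": 0.01,
  "useBasicVisibilityValidation": false
}
```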
### Cancellation Test

Tests workflow cancellation at scale. Features:

- Starts workflows and immediately cancels them
- Validates cancellation propagation
- Measures cancellation latency
- Ensures no workflow leaks
Configured via `config/bench/cancellation.json`.
### Signal Test

Tests signal delivery and Signal-with-Start. Features:

- Tests the `SignalWorkflowExecution` API
- Tests the `SignalWithStartWorkflowExecution` API
- Measures signal latency
- Validates signal ordering
Configured via `config/bench/signal.json`.
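The two APIs above map to the Go client's `SignalWorkflow` and `SignalWithStartWorkflow` methods. A sketch of Signal-with-Start using `go.uber.org/cadence/client` (the workflow ID, task list, and signal name are placeholders; client construction is omitted):

```go
package main

import (
	"context"
	"time"

	"go.uber.org/cadence/client"
)

// signalWithStart delivers a signal to a workflow, starting it first if it
// is not already running. wf is the workflow function to start in that case.
func signalWithStart(ctx context.Context, cadenceClient client.Client, wf interface{}, payload []byte) error {
	opts := client.StartWorkflowOptions{
		ID:                           "signal-load-wf-001", // placeholder
		TaskList:                     "cadence-bench-tl-0", // placeholder
		ExecutionStartToCloseTimeout: 10 * time.Minute,
	}
	_, err := cadenceClient.SignalWithStartWorkflow(
		ctx,
		opts.ID,       // workflow ID
		"load-signal", // signal name (placeholder)
		payload,       // signal argument
		opts,
		wf, // workflow function to start if not yet running
	)
	return err
}
```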
### Concurrent Execution Test

Tests task throttling when workflows schedule many tasks. Purpose: validate that a workflow scheduling many activities or child workflows does not affect other domains. Features:

- Schedules hundreds of activities in a single decision
- Tests domain isolation
- Validates throttling configuration
- Measures scheduling latency
Configured via `config/bench/concurrent_execution.json`.
### Timer Test

Tests timer firing at scale. Features:

- Creates many timers in a short period
- Tests timer service throttling
- Validates timer accuracy
- Measures timer latency
Configured via `config/bench/timer.json`.
### Cron Test Suite

Runs multiple test suites on a schedule. Prerequisite: add a `Passed` search attribute (boolean type).
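The search attribute can be added cluster-wide with the Cadence CLI. The numeric type index below is an assumption (type indices have varied across versions), so confirm the boolean mapping with `--help` on your CLI:

```shell
# Add the "Passed" search attribute (boolean type). The type index is an
# assumption; run `cadence admin cluster add-search-attr --help` to confirm.
cadence admin cluster add-search-attr \
  --search_attr_key Passed \
  --search_attr_type 4
```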
Features:

- Runs tests in parallel or sequentially
- Multiple test suites for multi-tenant testing
- Sets the `Passed` search attribute with results
- Automatic retry and reporting
Configured via `config/bench/cron.json`.
## Metrics Collection

Bench emits metrics for monitoring.

### Key Metrics
Workflow metrics:

- `workflow_start_latency`: Time to start a workflow
- `workflow_end_to_end_latency`: Total workflow duration
- `workflow_failed`: Failed workflow count
- `workflow_timeout`: Timed-out workflow count
Activity metrics:

- `activity_schedule_to_start`: Time from schedule to start
- `activity_execution_latency`: Activity execution time
- `activity_failed`: Failed activity count
Test metrics:

- `test_passed`: Test pass/fail status
- `test_duration`: Total test duration
- `stress_workflow_count`: Number of stress workflows
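Once scraped into Prometheus, these metrics can drive latency queries. For example, assuming `workflow_end_to_end_latency` is exported as a histogram (the exact exported name and metric type depend on the metrics reporter in use):

```promql
# p99 end-to-end workflow latency over the last 5 minutes
histogram_quantile(
  0.99,
  sum(rate(workflow_end_to_end_latency_bucket[5m])) by (le)
)
```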
### Prometheus Configuration

Bench workers expose metrics at `http://localhost:9090/metrics`.
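A minimal Prometheus scrape configuration pointing at that endpoint (the job name is arbitrary; `metrics_path` defaults to `/metrics`):

```yaml
scrape_configs:
  - job_name: cadence-bench
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9090"]  # bench worker metrics endpoint
```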
## Best Practices

### Test Planning
- Start Small: Begin with low load and increase gradually
- Isolate Tests: Use separate domains for different test types
- Monitor Resources: Watch CPU, memory, and disk I/O
- Baseline First: Establish baseline performance before optimization
### Configuration Tuning
- Task Lists: Use multiple task lists for parallelism
- Timeouts: Set realistic timeouts for test duration
- Failure Threshold: Allow small failure rate for realistic testing
- Payload Size: Match production payload sizes
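To make the failure-threshold bullet concrete: a run is acceptable when `failed / total <= threshold`. A quick illustration of that arithmetic (this is not the bench suite's actual code):

```python
def run_passes(failed: int, total: int, threshold: float) -> bool:
    """Return True if the observed failure rate is within the threshold.

    Mirrors the semantics of failureThreshold (0.01 = 1%); illustrative only.
    """
    return failed / total <= threshold

# 8 failures out of 1000 launches is 0.8%, within a 1% threshold.
print(run_passes(8, 1000, 0.01))   # True
# 25 failures is 2.5%, exceeding a 1% threshold.
print(run_passes(25, 1000, 0.01))  # False
```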
### Production Testing
- Use Separate Cluster: Don’t test on production clusters
- Match Configuration: Mirror production settings
- Real Data: Use production-like workflow patterns
- Sustained Load: Run tests for extended periods (hours/days)
## Troubleshooting

### Workers Not Picking Up Tasks

Problem: Bench workers are not executing workflows.

Solution: verify the workers are running and can reach the Cadence frontend, and confirm that the task lists (`cadence-bench-tl-*`) and bench domains match the worker configuration.

### Tests Failing

Problem: Tests are reporting failures.

Solution:

- Check the failure threshold in the configuration
- Review workflow failure reasons
- Verify cluster has sufficient resources
- Check for timeout configuration issues
- Review Elasticsearch if using advanced visibility
### High Latency

Problem: Slow workflow execution.

Solution:

- Increase worker count
- Add more task lists
- Scale up server resources
- Optimize database performance
- Review throttling configuration
## Next Steps
- Learn about Canary Testing for health monitoring
- Configure Dynamic Config for throttling
- Set up Isolation Groups for multi-tenancy
- Monitor with Web UI