Cadence Canary is a suite of automated workflows that continuously validate the health and correctness of your Cadence cluster. It tests core features and advanced capabilities to detect issues before they impact production workloads.

Overview

The canary suite provides:
  • Continuous Monitoring: Periodic execution of validation workflows
  • Feature Coverage: Tests for core and advanced Cadence features
  • Early Detection: Identifies issues before user impact
  • Operational Confidence: Validates deployments and upgrades
  • Automated Alerting: Integrates with monitoring systems

Setup

Prerequisites

  • A running Cadence cluster
  • Advanced visibility (Elasticsearch) for the search attributes and batch tests
  • History archival for the history archival test
  • Visibility archival for the visibility archival test

Running Canary

Option 1: Docker Compose

The easiest setup for local development:
cd docker/
docker compose -f docker-compose-canary.yml up
This starts both the canary worker and the cron scheduler.

Option 2: Docker Image

For production deployments:
# Start canary (worker + cron)
docker run -d \
  --name cadence-canary \
  -v "$(pwd)/config:/etc/cadence/config" \
  ubercadence/cadence-canary:master start

# Worker only (for manual testing)
docker run -d \
  --name cadence-canary-worker \
  -v "$(pwd)/config:/etc/cadence/config" \
  ubercadence/cadence-canary:master start -mode worker

Option 3: Build from Source

# Build
make cadence-canary

# Start worker + cron
./cadence-canary start

# Or start worker only
./cadence-canary start -mode worker

Configuration

Edit config/canary/development.yaml (the file the canary loads by default):
canary:
  domains: ["cadence-canary"]
  excludes: 
    - "workflow.searchAttributes"      # Exclude if no advanced visibility
    - "workflow.batch"                  # Exclude if no advanced visibility
    - "workflow.archival.visibility"    # Exclude if no visibility archival
    - "workflow.archival.history"       # Exclude if no history archival
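  cron:
    cronSchedule: "@every 30s"        # How often the sanity suite runs (default)
    cronExecutionTimeout: "18m"       # Timeout of each cron run (default)
    startJobTimeout: "9m"             # Timeout of each sanity suite run (default)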

cadence:
  service: "cadence-frontend"
  address: "127.0.0.1:7833"           # gRPC address
  # host: "127.0.0.1:7933"            # Thrift (legacy)
  # tlsCaFile: "/path/to/ca.pem"     # TLS configuration

metrics:
  prometheus:
    timerType: "histogram"
    listenAddress: "0.0.0.0:9090"

log:
  stdout: true
  level: "info"
Important:
  • Archival tests always run in the canary-archival-domain domain, regardless of the domains setting
  • Exclude tests for features not enabled on your cluster; otherwise those tests fail
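The mapping between cluster features and excludes entries can be scripted. Below is a minimal POSIX sh sketch, not part of the canary itself: it prints an excludes block for the features you mark as missing. The HAS_ADVANCED_VISIBILITY, HAS_HISTORY_ARCHIVAL and HAS_VISIBILITY_ARCHIVAL variables and the canary_excludes function are hypothetical names chosen for this example.

```shell
# Hypothetical helper: print the `excludes` section of config/canary/development.yaml
# for features that are NOT enabled. Set each HAS_* variable to "true" for
# features your cluster supports; any other value is treated as disabled.
canary_excludes() {
  excluded=""
  if [ "${HAS_ADVANCED_VISIBILITY:-false}" != "true" ]; then
    # Both tests depend on Elasticsearch-backed advanced visibility.
    excluded="$excluded workflow.searchAttributes workflow.batch"
  fi
  if [ "${HAS_VISIBILITY_ARCHIVAL:-false}" != "true" ]; then
    excluded="$excluded workflow.archival.visibility"
  fi
  if [ "${HAS_HISTORY_ARCHIVAL:-false}" != "true" ]; then
    excluded="$excluded workflow.archival.history"
  fi

  if [ -z "$excluded" ]; then
    # Every optional feature is available: run all test cases.
    echo "  excludes: []"
    return 0
  fi
  echo "  excludes:"
  for t in $excluded; do
    printf '    - "%s"\n' "$t"
  done
}
```

For example, HAS_ADVANCED_VISIBILITY=true canary_excludes prints only the two archival entries, which can be pasted under the canary section.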

Canary Test Cases

Sanity Suite (Starter)

Main test suite: it launches all the other test cases as child workflows, and an error result means at least one test case failed.
Purpose: One-stop validation of all Cadence features.
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 1200 \
  --wt workflow.sanity \
  -i 0
Observe progress:
cadence --do cadence-canary workflow ob -w <workflow-id>
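Monitor for alerting: Use the workflow_success metric with workflowType = "workflow.sanity" and alert when no run has succeeded within a window spanning several cron intervals (or vector(0) keeps the alert firing when no success has been recorded at all):
(sum(increase(workflow_success{workflowType="workflow.sanity"}[15m])) or vector(0)) < 1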

Cron Canary

Periodically runs the Sanity suite. Features:
  • Continuous validation
  • Fixed workflow ID: cadence.canary.cron
  • Configurable schedule
  • Automatic failure detection
Start cron:
# Worker and cron together (mode all)
./cadence-canary start -m all

# Cron canary only
./cadence-canary start -mode cronCanary
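Update schedule: A running cron workflow keeps the schedule it was started with. To change it, terminate the existing cron workflow (ID cadence.canary.cron) and restart the canary with the new config.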
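Because the cron workflow ID is fixed, a schedule change can be scripted as terminate-then-restart. A minimal sketch, assuming the cadence CLI is on PATH, the canary domain is cadence-canary, and the canary binary was built from source; restart_cron_canary and CANARY_BIN are hypothetical names. For Docker deployments, replace the last step with a container restart.

```shell
# Hypothetical helper: apply a new cronSchedule from the canary config.
# The running cron workflow keeps its original schedule, so it is terminated
# before the canary starts a new one.
restart_cron_canary() {
  domain="${1:-cadence-canary}"
  canary_bin="${CANARY_BIN:-./cadence-canary}"

  # The cron canary always uses the workflow ID cadence.canary.cron.
  if ! cadence --do "$domain" workflow terminate \
      -w cadence.canary.cron \
      --reason "apply new canary cron schedule"; then
    echo "terminate failed; cron schedule not changed" >&2
    return 1
  fi

  # Start the cron canary again; it reads cronSchedule from the config file.
  "$canary_bin" start -mode cronCanary
}
```

If the terminate call fails, the function stops before starting anything, so the old cron workflow is never left running alongside a second one.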

Echo Test

Tests basic workflow functionality. What it tests:
  • Workflow execution
  • Activity execution
  • Result passing
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.echo \
  -i 0

Signal Test

Tests signal delivery. What it tests:
  • SignalWorkflowExecution API
  • Signal reception and handling
  • Signal buffering
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.signal \
  -i 0

Visibility Test

Tests basic visibility features. What it tests:
  • Workflow listing
  • Status filtering
  • Time range queries
  • Basic visibility (no Elasticsearch required)
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.visibility \
  -i 0

Search Attributes Test

Tests advanced visibility. What it tests:
  • Custom search attributes
  • Complex queries
  • Elasticsearch integration
Requirements: Advanced visibility enabled.
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.searchAttributes \
  -i 0

Concurrent Execution Test

Tests parallel activity execution. What it tests:
  • Concurrent activities
  • Activity results aggregation
  • Parallel execution limits
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.concurrent-execution \
  -i 0

Query Test

Tests workflow query feature. What it tests:
  • Query registration
  • Query execution
  • Query consistency
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.query \
  -i 0

Timeout Test

Tests activity timeout enforcement. What it tests:
  • Activity timeout configuration
  • Timeout enforcement
  • Timeout handling
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.timeout \
  -i 0

Local Activity Test

Tests local activity execution. What it tests:
  • Local activity scheduling
  • Fast execution path
  • No activity task events in history (local activity results are recorded as markers)
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.localactivity \
  -i 0

Cancellation Test

Tests workflow cancellation. What it tests:
  • Cancellation requests
  • Cancellation propagation
  • Activity cancellation
  • Child workflow cancellation
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.cancellation \
  -i 0

Retry Test

Tests activity retry policies. What it tests:
  • Retry policy configuration
  • Automatic retries
  • Exponential backoff
  • Maximum attempts
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.retry \
  -i 0

Reset Test

Tests workflow reset feature. What it tests:
  • Resetting a workflow to an earlier decision task
  • Reset validation
  • History replay after reset
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.reset \
  -i 0

History Archival Test

Tests history archival. What it tests:
  • History archival to storage
  • Archived history retrieval
  • Archival URI validation
Requirements: History archival enabled.
Domain: Always uses canary-archival-domain.
Run manually:
cadence --do canary-archival-domain workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.archival.history \
  -i 0

Visibility Archival Test

Tests visibility archival. What it tests:
  • Visibility record archival
  • Archived visibility queries
  • Archival URI validation
Requirements: Visibility archival enabled.
Domain: Always uses canary-archival-domain.
Run manually:
cadence --do canary-archival-domain workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.archival.visibility \
  -i 0

Batch Test

Tests batch operations. What it tests:
  • Batch workflow termination
  • Batch workflow signaling
  • Query-based batch operations
Requirements: Advanced visibility enabled.
Run manually:
cadence --do cadence-canary workflow start \
  --tl canary-task-queue \
  --et 10 \
  --wt workflow.batch \
  -i 0
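All of the manual commands above follow one pattern: the workflow type selects the test, archival tests switch to canary-archival-domain, and only the sanity suite uses a long execution timeout. A minimal POSIX sh sketch of that pattern, assuming the default domain and task list from this page; run_canary_case, run_core_cases, CORE_CASES and CANARY_DOMAIN are hypothetical names. The core list deliberately leaves out the tests that need advanced visibility or archival.

```shell
# Hypothetical helper: start one canary test case by workflow type, following
# the conventions on this page (task list canary-task-queue, input 0).
run_canary_case() {
  wt="$1"

  # Archival tests always run in canary-archival-domain.
  case "$wt" in
    workflow.archival.*) domain="canary-archival-domain" ;;
    *) domain="${CANARY_DOMAIN:-cadence-canary}" ;;
  esac

  # The sanity suite runs every test, so it gets a longer execution timeout.
  case "$wt" in
    workflow.sanity) et=1200 ;;
    *) et=10 ;;
  esac

  cadence --do "$domain" workflow start \
    --tl canary-task-queue \
    --et "$et" \
    --wt "$wt" \
    -i 0
}

# Test cases on this page that need neither advanced visibility nor archival.
CORE_CASES="workflow.echo workflow.signal workflow.visibility \
workflow.concurrent-execution workflow.query workflow.timeout \
workflow.localactivity workflow.cancellation workflow.retry workflow.reset"

# Start each core case in turn, reporting (but not stopping on) failures.
run_core_cases() {
  for case_name in $CORE_CASES; do
    run_canary_case "$case_name" || echo "failed to start $case_name" >&2
  done
}
```

For example, run_canary_case workflow.archival.history starts the history archival test in canary-archival-domain, and run_core_cases starts the ten core cases one after another.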

Monitoring and Alerting

Key Metrics

Monitor these metrics for canary health.
Workflow Success Rate:
sum(rate(workflow_success{workflowType="workflow.sanity"}[5m]))
Test Execution Time:
histogram_quantile(0.95,
  sum by (le) (rate(workflow_endtoend_latency_bucket{workflowType="workflow.sanity"}[5m]))
)
Failed Tests:
sum(increase(workflow_failed{workflowType="workflow.sanity"}[5m])) > 0
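When choosing thresholds, keep in mind that rate() returns a per-second value: with the default @every 30s schedule, the sanity suite succeeds about once every 30 seconds, so a success-based alert is easier to reason about as a count of runs per window. A small sketch of that arithmetic; expected_runs and expected_rate_per_s are hypothetical helper names.

```shell
# Hypothetical helpers for sizing canary alert thresholds.

# Sanity runs expected in an alert window: window_s / interval_s (both in seconds).
expected_runs() {
  window_s="$1"
  interval_s="$2"
  echo $(( window_s / interval_s ))
}

# Per-second success rate that rate() reports when every run succeeds.
expected_rate_per_s() {
  awk -v i="$1" 'BEGIN { printf "%.3f\n", 1 / i }'
}
```

With the default schedule, expected_runs 300 30 prints 10 and expected_rate_per_s 30 prints 0.033, so a per-second rate never gets close to 1. With a 5-minute schedule, expected_runs 300 300 prints 1: a 5-minute window holds a single run, so a wider window such as 15 minutes (3 runs) is less prone to flapping.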

Alerting Rules

Prometheus Alert Example:
groups:
  - name: cadence_canary
    rules:
      - alert: CanaryTestFailed
        expr: (sum(increase(workflow_success{workflowType="workflow.sanity"}[15m])) or vector(0)) < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Cadence canary test failing"
          description: "No successful canary sanity run in the last 15 minutes"
      
      - alert: CanaryHighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(workflow_endtoend_latency_bucket{workflowType="workflow.sanity"}[5m]))
          ) > 300
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Canary test latency high"
          description: "P95 latency above 5 minutes"
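Before loading the rules into Prometheus, they can be syntax-checked with promtool, the rule checker that ships with Prometheus. A minimal sketch, assuming the group above is saved as canary-alerts.yml (a hypothetical file name) and promtool is on PATH; validate_canary_rules is a hypothetical helper name.

```shell
# Hypothetical helper: check canary alert rules before deploying them.
validate_canary_rules() {
  file="${1:-canary-alerts.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 2
  fi
  # Exits non-zero if the file has YAML or PromQL errors.
  promtool check rules "$file"
}
```

Running this in CI keeps a typo in an expr field from silently disabling the canary alerts.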

Production Monitoring

Recommended Setup:
  1. Deploy canary in production cluster
  2. Configure the cron schedule (e.g., "@every 5m")
  3. Monitor workflow_success metric
  4. Alert on failures or high latency
  5. Exclude tests for disabled features
  6. Use separate alerting for archival tests

Best Practices

Configuration

  • Exclude Unavailable Features: Don’t run tests for features that are not enabled on the cluster
  • Appropriate Frequency: Balance coverage vs. load (5-30 minutes typical)
  • Realistic Timeouts: Set timeouts to match expected execution time
  • Resource Allocation: Ensure the canary doesn’t impact production

Monitoring

  • Alert on Failures: Set up immediate alerts for test failures
  • Track Latency Trends: Monitor for performance degradation
  • Dashboard: Create dedicated canary dashboard
  • Test-Specific Metrics: Monitor individual test types

Operations

  • Post-Deployment: Run canary immediately after deployments
  • Pre-Upgrade: Verify canary passes before upgrades
  • Incident Response: Check canary status during incidents
  • Capacity Planning: Use canary metrics for baseline performance

Troubleshooting

Canary Not Running

Problem: No canary executions
Solution:
# Check that workers are polling the canary task list
cadence --do cadence-canary tasklist describe --tl canary-task-queue

# Verify the cron workflow (fixed ID; works without advanced visibility)
cadence --do cadence-canary workflow describe -w cadence.canary.cron

# Check worker logs
docker logs cadence-canary
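The three checks can be run in one pass. A minimal sketch, assuming the cadence CLI and docker are on PATH, the domain is cadence-canary, and the container is named cadence-canary as in the Docker example; check_canary is a hypothetical name. It uses workflow describe, which looks up the fixed cron workflow ID directly rather than relying on a visibility query.

```shell
# Hypothetical helper: run the "canary not running" checks in order and stop
# at the first one that fails.
check_canary() {
  domain="${1:-cadence-canary}"

  echo "== pollers on canary-task-queue =="
  cadence --do "$domain" tasklist describe --tl canary-task-queue || return 1

  echo "== cron workflow cadence.canary.cron =="
  cadence --do "$domain" workflow describe -w cadence.canary.cron || return 1

  echo "== last 50 lines of worker logs =="
  docker logs --tail 50 cadence-canary
}
```

If the task list shows no pollers, the worker is not running or is configured with a different domain or task list.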

Test Failures

Problem: Specific test consistently failing
Solution:
  1. Run test manually for debugging
  2. Check if feature is properly configured
  3. Verify required dependencies (e.g., Elasticsearch)
  4. Review server logs for errors
  5. Check domain configuration
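For step 3, tests that need advanced visibility fail when Elasticsearch is unreachable or unhealthy. A minimal sketch using the Elasticsearch cluster health API (_cluster/health); ES_URL and check_es_health are hypothetical names, and a yellow status (common on single-node clusters) is treated as usable.

```shell
# Hypothetical helper: verify the Elasticsearch cluster behind advanced
# visibility answers and reports a green or yellow status.
check_es_health() {
  url="${ES_URL:-http://localhost:9200}"
  # -f makes curl fail on HTTP errors; -s suppresses progress output.
  body="$(curl -sf "$url/_cluster/health")" || {
    echo "elasticsearch unreachable at $url" >&2
    return 1
  }
  case "$body" in
    *'"status":"green"'*|*'"status":"yellow"'*)
      echo "elasticsearch ok"
      ;;
    *)
      echo "elasticsearch unhealthy: $body" >&2
      return 1
      ;;
  esac
}
```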

Archival Tests Failing

Problem: Archival tests fail but others pass
Solution:
  • Verify archival is enabled in server config
  • Check canary-archival-domain exists and has archival enabled
  • Verify archival storage (S3/GCS/filestore) is accessible
  • Review archival worker logs
  • Test archival manually with CLI
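The domain checks above can be scripted with the CLI's domain describe command. A minimal sketch; it assumes the output contains HistoryArchivalStatus and VisibilityArchivalStatus lines, so verify those field names against your CLI version. check_archival_domain is a hypothetical name.

```shell
# Hypothetical helper: confirm canary-archival-domain exists and has both
# kinds of archival enabled. The *ArchivalStatus field names are an
# assumption about the `domain describe` output format.
check_archival_domain() {
  if ! desc="$(cadence --do canary-archival-domain domain describe)"; then
    echo "canary-archival-domain not found (or CLI error)" >&2
    return 1
  fi

  rc=0
  for kind in History Visibility; do
    if printf '%s\n' "$desc" | grep -q "${kind}ArchivalStatus: *ENABLED"; then
      echo "$kind archival: enabled"
    else
      echo "$kind archival: NOT enabled"
      rc=1
    fi
  done
  return "$rc"
}
```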
