Overview
The canary suite provides:
- Continuous Monitoring: Periodic execution of validation workflows
- Feature Coverage: Tests for core and advanced Cadence features
- Early Detection: Identifies issues before user impact
- Operational Confidence: Validates deployments and upgrades
- Automated Alerting: Integrates with monitoring systems
Setup
Prerequisites
- Running Cadence cluster
- Advanced Visibility (Elasticsearch) for some tests
- History Archival for archival tests
- Visibility Archival for visibility archival tests
Running Canary
Option 1: Docker Compose
Easiest setup for local development:
Option 2: Docker Image
For production deployments:
Option 3: Build from Source
Configuration
Edit config/canary/development.yaml:
- Archival tests always use the canary-archival-domain domain
- Exclude tests for features not enabled on your cluster
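The exclusion rule can be derived mechanically from the features a cluster has enabled. The following is a minimal Python sketch, not part of the canary itself. The feature keys and the test names in the mapping are illustrative assumptions; check them against the names your canary version actually registers before copying them into the config file.

```python
# Hypothetical mapping from an optional cluster feature to the canary tests
# that depend on it. Feature keys and test names are assumptions made for
# illustration; they are not taken from the canary source.
FEATURE_TESTS = {
    "advanced_visibility": ["workflow.searchAttributes"],
    "history_archival": ["workflow.archival.history"],
    "visibility_archival": ["workflow.archival.visibility"],
}


def build_excludes(enabled_features):
    """Return the sorted test names to exclude, given the enabled features."""
    excludes = []
    for feature, tests in FEATURE_TESTS.items():
        if feature not in enabled_features:
            excludes.extend(tests)
    return sorted(excludes)


if __name__ == "__main__":
    # A cluster with Elasticsearch but no archival configured.
    print(build_excludes({"advanced_visibility"}))
```

Keeping the mapping in one place makes it harder for the exclude list to drift from what the cluster actually supports.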
Canary Test Cases
Sanity Suite (Starter)
Main test suite that launches all test cases.
Purpose: One-stop validation of all Cadence features
Run manually:
Success is reported through the workflow_success metric with workflowType = "workflow.sanity".
Cron Canary
Periodically runs the Sanity suite.
Features:
- Continuous validation
- Fixed workflow ID: cadence.canary.cron
- Configurable schedule
- Automatic failure detection
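Because the cron canary runs on a known schedule, a missing run is itself a signal. The sketch below shows one way to flag a stale canary, assuming you can obtain the time of the last completed run from your metrics or visibility store (fetching it is not shown). The function name and the grace factor are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_canary_stale(last_run, interval, now=None, grace=2.0):
    """Return True when no run has completed within `grace` intervals.

    `last_run` and `now` must be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_run > interval * grace


if __name__ == "__main__":
    last = datetime.now(timezone.utc) - timedelta(minutes=12)
    # With a 5-minute schedule and a 2x grace factor, 12 minutes is stale.
    print(is_canary_stale(last, timedelta(minutes=5)))
```

A grace factor above 1 avoids alerting on a single slow or slightly delayed run.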
Echo Test
Tests basic workflow functionality.
What it tests:
- Workflow execution
- Activity execution
- Result passing
Signal Test
Tests signal delivery.
What it tests:
- SignalWorkflowExecution API
- Signal reception and handling
- Signal buffering
Visibility Test
Tests basic visibility features.
What it tests:
- Workflow listing
- Status filtering
- Time range queries
- Basic visibility (no Elasticsearch required)
Search Attributes Test
Tests advanced visibility.
What it tests:
- Custom search attributes
- Complex queries
- Elasticsearch integration
Concurrent Execution Test
Tests parallel activity execution.
What it tests:
- Concurrent activities
- Activity results aggregation
- Parallel execution limits
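The pattern this test exercises (fan out, bound the parallelism, aggregate the results) can be illustrated outside Cadence. The sketch below uses Python's concurrent.futures as a stand-in. It is an analogy only, not the canary's implementation, and fake_activity is a hypothetical placeholder for a real activity.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_activity(n):
    """Stand-in for an activity; returns a value derived from its input."""
    return n * n


def run_concurrently(inputs, max_parallel=4):
    """Run fake_activity over inputs with at most max_parallel in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in input order, regardless of completion order.
        results = list(pool.map(fake_activity, inputs))
    # Aggregate the individual results into a single value.
    return sum(results)


if __name__ == "__main__":
    print(run_concurrently(range(5)))
```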
Query Test
Tests workflow query feature.
What it tests:
- Query registration
- Query execution
- Query consistency
Timeout Test
Tests activity timeout enforcement.
What it tests:
- Activity timeout configuration
- Timeout enforcement
- Timeout handling
Local Activity Test
Tests local activity execution.
What it tests:
- Local activity scheduling
- Fast execution path
- No scheduling events in workflow history for local activities (only a marker with the result is recorded)
Cancellation Test
Tests workflow cancellation.
What it tests:
- Cancellation requests
- Cancellation propagation
- Activity cancellation
- Child workflow cancellation
Retry Test
Tests activity retry policies.
What it tests:
- Retry policy configuration
- Automatic retries
- Exponential backoff
- Maximum attempts
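To reason about what the retry test should observe, it helps to compute the wait before each retry. The sketch below assumes the usual exponential backoff rule, where each wait is the initial interval multiplied by the backoff coefficient raised to the retry index and capped at a maximum interval. Parameter names loosely mirror common retry policy options; an overall expiration interval and any server-side jitter are not modeled.

```python
def retry_delays(initial_interval, backoff_coefficient,
                 maximum_interval, maximum_attempts):
    """Return the wait in seconds before each retry.

    The first attempt runs immediately, so a policy with N maximum
    attempts produces N - 1 waits.
    """
    delays = []
    for retry in range(maximum_attempts - 1):
        delay = initial_interval * (backoff_coefficient ** retry)
        # Cap the wait so it never grows past the configured maximum.
        delays.append(min(delay, maximum_interval))
    return delays


if __name__ == "__main__":
    # 1s initial interval, doubling, capped at 10s, 6 attempts in total.
    print(retry_delays(1, 2, 10, 6))
```

Summing the returned list gives a lower bound on how long a fully failing activity takes, which is useful when choosing the test's timeout.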
Reset Test
Tests workflow reset feature.
What it tests:
- Workflow reset to decision
- Reset validation
- History replay after reset
History Archival Test
Tests history archival.
What it tests:
- History archival to storage
- Archived history retrieval
- Archival URI validation
Domain: canary-archival-domain
Run manually:
Visibility Archival Test
Tests visibility archival.
What it tests:
- Visibility record archival
- Archived visibility queries
- Archival URI validation
Domain: canary-archival-domain
Run manually:
Batch Test
Tests batch operations.
What it tests:
- Batch workflow termination
- Batch workflow signaling
- Query-based batch operations
Monitoring and Alerting
Key Metrics
Monitor these metrics for canary health:
Workflow Success Rate:
Alerting Rules
Prometheus Alert Example:
Production Monitoring
Recommended Setup:
- Deploy canary in production cluster
- Configure cron schedule (e.g., every 5 minutes)
- Monitor the workflow_success metric
- Alert on failures or high latency
- Exclude tests for disabled features
- Use separate alerting for archival tests
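The alerting decision in the setup above reduces to two checks: the success rate over a window and the observed latency. The sketch below assumes you have already pulled success and failure counts and a p99 latency from your metrics backend. The thresholds and function names are hypothetical and should be tuned per cluster.

```python
def success_rate(successes, failures):
    """Return successes / total, or None when no runs were observed."""
    total = successes + failures
    if total == 0:
        return None
    return successes / total


def should_alert(successes, failures, p99_latency_s,
                 min_rate=0.99, max_latency_s=60.0):
    """Return True when the canary looks unhealthy for this window."""
    rate = success_rate(successes, failures)
    if rate is None:
        # No runs at all usually means the canary itself is down.
        return True
    return rate < min_rate or p99_latency_s > max_latency_s


if __name__ == "__main__":
    print(should_alert(successes=98, failures=2, p99_latency_s=12.0))
```

Treating an empty window as an alert condition covers the case where the canary has stopped running and therefore reports no failures.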
Best Practices
Configuration
- Exclude Unavailable Features: Don’t test features that are not enabled on the cluster
- Appropriate Frequency: Balance coverage vs. load (5-30 minutes typical)
- Realistic Timeouts: Set timeouts that match the expected execution time of each test
- Resource Allocation: Ensure canary doesn’t impact production
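The frequency trade-off can be put in numbers. The sketch below computes how many suite runs a schedule produces per day and, roughly, how long a regression can go unnoticed in the worst case (one full interval plus the duration of the run that catches it). The function names and the 2-minute run duration are illustrative assumptions.

```python
def runs_per_day(interval_minutes):
    """Number of full canary runs started in 24 hours."""
    return (24 * 60) // interval_minutes


def worst_case_detection_minutes(interval_minutes, run_duration_minutes):
    """Rough upper bound on the time until a new failure is reported."""
    return interval_minutes + run_duration_minutes


if __name__ == "__main__":
    for interval in (5, 30):
        print(interval, runs_per_day(interval),
              worst_case_detection_minutes(interval, 2))
```

A 5-minute schedule produces six times as many runs as a 30-minute one, so the added load should be weighed against the shorter detection delay.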
Monitoring
- Alert on Failures: Set up immediate alerts for test failures
- Track Latency Trends: Monitor for performance degradation
- Dashboard: Create dedicated canary dashboard
- Test-Specific Metrics: Monitor individual test types
Operations
- Post-Deployment: Run canary immediately after deployments
- Pre-Upgrade: Verify canary passes before upgrades
- Incident Response: Check canary status during incidents
- Capacity Planning: Use canary metrics for baseline performance
Troubleshooting
Canary Not Running
Problem: No canary executions
Solution:
Test Failures
Problem: Specific test consistently failing
Solution:
- Run test manually for debugging
- Check if feature is properly configured
- Verify required dependencies (e.g., Elasticsearch)
- Review server logs for errors
- Check domain configuration
Archival Tests Failing
Problem: Archival tests fail but others pass
Solution:
- Verify archival is enabled in server config
- Check that canary-archival-domain exists and has archival enabled
- Verify archival storage (S3/GCS/filestore) is accessible
- Review archival worker logs
- Test archival manually with CLI
Next Steps
- Learn about Benchmarking for load testing
- Configure Archival for tested features
- Set up Dynamic Config for tuning
- Monitor with Web UI