Overview
Async Workflow Queues provide:

- Rate Limiting: Control workflow start rate per domain
- Buffering: Queue workflow starts during high load
- Decoupling: Separate request acceptance from execution
- Backpressure: Prevent overload from workflow start storms
- Multi-Tenancy: Per-domain queue configuration
Architecture
- Client submits workflow start request
- Frontend validates and publishes to Kafka
- Request acknowledged immediately
- Async worker consumes from Kafka
- Worker starts workflow via normal path
- Rate limiting and retries handled by worker
Setup
Prerequisites
- Kafka cluster (version 2.0+)
- Cadence server with async workflow feature enabled
- Worker service for consuming queue
Kafka Configuration
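The queue reads from a pre-created topic. Creation might look like the following sketch; the topic name, partition count, and replication factor are illustrative:

```shell
# Illustrative values -- size partitions and replication for your load.
kafka-topics.sh --bootstrap-server kafka-broker1:9092 --create \
  --topic cadence-async-wf \
  --partitions 8 \
  --replication-factor 3
```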
Create Kafka topics for the async workflow queue before enabling it.

Server Configuration
Enable async workflow queues in config.yaml:
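A sketch of what the queue definition might look like. The exact keys vary by Cadence version, so treat every name below as an assumption and check the reference config shipped with your release:

```yaml
# Illustrative only -- key names depend on your Cadence version.
asyncWorkflowQueues:
  default-queue:
    type: kafka
    config:
      brokers:
        - "kafka-broker1:9092"
      topic: "cadence-async-wf"
```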
Domain Configuration
Enable the async workflow queue per domain:

- enabled: Enable/disable the async queue for the domain
- predefinedQueueName: Kafka topic name
- queueType: Queue backend (currently only "kafka")
- queueConfig: Queue-specific configuration
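Using the fields above, a per-domain queue configuration might look like this (values are illustrative):

```yaml
enabled: true
predefinedQueueName: "cadence-async-wf"  # Kafka topic backing the queue
queueType: "kafka"
```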
Worker Service Setup
Async workers are part of the Cadence worker service.

Usage
Starting Async Workflows
Use the Go SDK to start workflows asynchronously.

Checking Workflow Status
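A hedged Go sketch of an async start followed by a status check. StartWorkflowAsync and DescribeWorkflowExecution are assumptions about the Go client's API surface; verify them against your client version:

```go
package asyncstart

import (
	"context"
	"time"

	"go.uber.org/cadence/client"
)

// startAndCheck enqueues a workflow start, then checks whether the
// workflow is visible yet. An async start returns once the request is
// queued, not once the workflow is running.
func startAndCheck(c client.Client) error {
	ctx := context.Background()
	opts := client.StartWorkflowOptions{
		ID:                           "order-12345",
		TaskList:                     "order-tasklist",
		ExecutionStartToCloseTimeout: time.Hour,
	}
	// Assumed API: StartWorkflowAsync mirrors StartWorkflow but only
	// enqueues the request.
	if _, err := c.StartWorkflowAsync(ctx, opts, "OrderWorkflow", "order-12345"); err != nil {
		return err
	}
	// The start may still be sitting in the queue; describing the
	// execution tells you whether it has actually begun.
	_, err := c.DescribeWorkflowExecution(ctx, "order-12345", "")
	return err
}
```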
The workflow may not start immediately, so verify that it is running before depending on it.

Rate Limiting
Consumer Rate Limiting
Control the workflow start rate to protect downstream services.

Backpressure Handling
Kafka provides natural backpressure:

- Queue Full: Kafka rejects new messages if retention is exceeded
- Slow Consumption: Messages accumulate in Kafka
- Consumer Lag: Monitor the consumer_lag metric
Monitoring
Key Metrics
Queue Depth: track Kafka consumer lag on the queue topic.

CLI Monitoring
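One way to inspect lag from the command line is Kafka's own tooling; the consumer group name here is an assumption about your deployment:

```shell
# The LAG column shows the per-partition backlog of queued starts.
kafka-consumer-groups.sh --bootstrap-server kafka-broker1:9092 \
  --describe --group cadence-async-wf-consumer
```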
Alerting
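A Prometheus-style alerting rule is one way to express a lag alert; the metric name, group label, and threshold below are assumptions about your monitoring setup:

```yaml
# Illustrative rule -- adjust metric name and threshold to your exporter.
groups:
  - name: cadence-async-queue
    rules:
      - alert: AsyncQueueConsumerLagHigh
        expr: kafka_consumergroup_lag{group="cadence-async-wf-consumer"} > 10000
        for: 10m
        labels:
          severity: page
```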
High Consumer Lag: alert when lag stays above your threshold for a sustained period.

Best Practices
Queue Configuration
- Partition Count: Start with num_workers * concurrency / 1000 partitions
- Retention: Set based on acceptable delay (24h typical)
- Replication: Use RF=3 for durability
- Max Message Size: Match Cadence payload limits
Consumer Tuning
- Concurrency: Balance throughput vs. resource usage
- Rate Limiting: Prevent overload of downstream services
- Batch Size: Tune the Kafka consumer fetch.min.bytes
- Commit Interval: Balance consistency vs. throughput
Operational
- Monitor Lag: Alert if lag exceeds threshold
- DLQ: Configure dead-letter queue for failed starts
- Capacity Planning: Size workers for peak load + headroom
- Testing: Test queue behavior under load
Use Cases
Rate-Limited API
Protect the backend from workflow start storms.

Batch Processing
Queue large batches of workflow starts.

Decoupled Services
Decouple the request service from workflow execution.

Troubleshooting
High Consumer Lag
Problem: Kafka consumer lag increasing

Solution:

- Scale up worker instances
- Increase consumer concurrency
- Check for slow workflow starts
- Review rate limiting settings
- Verify workers are healthy
Workflows Not Starting
Problem: Workflows queued but not executing

Solution:

- Verify the async worker service is running and consuming from the queue topic
- Confirm the domain's async queue configuration is enabled and points at the correct topic
- Check worker logs for Kafka connectivity errors

Failed Workflow Starts
Problem: High failure rate for async workflow starts

Solution:

- Check DLQ for failed messages
- Review error logs in worker
- Verify workflow registration
- Check task list worker availability
- Validate workflow input payloads
Advanced Topics
Dead Letter Queue
Handle failed workflow starts with a dead-letter queue.

Custom Queue Implementation
Implement a custom queue backend.

Next Steps
- Configure Isolation Groups for zone awareness
- Set up Dynamic Config for queue tuning
- Monitor with Web UI
- Test with Bench for load validation