What to Monitor
Key metrics and aspects to monitor in your Infinitic deployment:
Workflow Metrics
- Completion rate - Percentage of workflows completing successfully
- Failure rate - Percentage of workflows failing
- Duration - Time from start to completion
- Active workflows - Number of currently running workflows
- Queued workflows - Workflows waiting to start
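The workflow metrics above can be derived from a plain list of workflow records. The following is a minimal sketch: `WorkflowRecord` and its fields are hypothetical names chosen for illustration, not Infinitic types, and the rates are computed over finished workflows only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowRecord:
    """Hypothetical record of one workflow run; not an Infinitic type."""
    workflow_id: str
    status: str                          # "queued", "running", "completed" or "failed"
    started_at: Optional[float] = None   # epoch seconds, None while queued
    finished_at: Optional[float] = None  # epoch seconds, None until terminal


def workflow_metrics(records):
    """Summarise a batch of workflow records into the metrics listed above."""
    finished = [r for r in records if r.status in ("completed", "failed")]
    completed = [r for r in finished if r.status == "completed"]
    durations = [r.finished_at - r.started_at for r in completed]
    n = len(finished)
    return {
        "completion_rate": len(completed) / n if n else 0.0,
        "failure_rate": (n - len(completed)) / n if n else 0.0,
        "mean_duration_s": sum(durations) / len(durations) if durations else 0.0,
        "active": sum(r.status == "running" for r in records),
        "queued": sum(r.status == "queued" for r in records),
    }


records = [
    WorkflowRecord("w1", "completed", 0.0, 4.0),
    WorkflowRecord("w2", "completed", 1.0, 3.0),
    WorkflowRecord("w3", "failed", 2.0, 5.0),
    WorkflowRecord("w4", "running", 6.0),
    WorkflowRecord("w5", "queued"),
]
summary = workflow_metrics(records)
```

Dividing by finished workflows rather than by all workflows keeps long-running instances from diluting the rates; whether that is the right denominator depends on how your dashboards define "rate".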
Task Metrics
- Execution time - How long tasks take to complete
- Retry rate - Frequency of task retries
- Timeout rate - Tasks exceeding timeout limits
- Failure rate - Task failure percentage
- Queue depth - Number of pending tasks
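Task-level rates and latency percentiles can be computed the same way from per-attempt records. This sketch assumes a hypothetical `TaskAttempt` record (not an Infinitic type) and uses the nearest-rank method for the 95th percentile; queue depth is left out because it has to be read from the message broker.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    """Hypothetical record of one task attempt; not an Infinitic type."""
    task_id: str
    attempt: int        # 1 for the first try, 2 or more for retries
    outcome: str        # "completed", "failed" or "timed_out"
    duration_s: float


def percentile(values, pct):
    """Nearest-rank percentile; returns 0.0 for an empty input."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def task_metrics(attempts):
    """Compute retry, timeout and failure rates plus p95 execution time."""
    total = len(attempts)
    if total == 0:
        return {"retry_rate": 0.0, "timeout_rate": 0.0,
                "failure_rate": 0.0, "p95_duration_s": 0.0}
    return {
        "retry_rate": sum(a.attempt > 1 for a in attempts) / total,
        "timeout_rate": sum(a.outcome == "timed_out" for a in attempts) / total,
        "failure_rate": sum(a.outcome == "failed" for a in attempts) / total,
        "p95_duration_s": percentile([a.duration_s for a in attempts], 95),
    }


attempts = [
    TaskAttempt("t1", 1, "completed", 0.2),
    TaskAttempt("t2", 1, "timed_out", 5.0),
    TaskAttempt("t2", 2, "completed", 0.4),
    TaskAttempt("t3", 1, "failed", 0.1),
]
stats = task_metrics(attempts)
```

A percentile is usually more informative than a mean for execution time, because a single timed-out attempt (5.0 s here) dominates the tail without moving the typical case.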
System Metrics
- Worker health - Status of worker instances
- Message throughput - Messages processed per second
- Resource utilization - CPU, memory, network usage
- Storage usage - Workflow state storage consumption
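Message throughput is typically reported over a sliding window. The sketch below is one simple way to do that in-process; `ThroughputMeter` is a hypothetical helper, and in production the broker's own statistics are usually a better source.

```python
from collections import deque


class ThroughputMeter:
    """Counts messages seen during the last `window_s` seconds."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self._stamps = deque()

    def record(self, now):
        """Register one processed message at time `now` (seconds)."""
        self._stamps.append(now)

    def per_second(self, now):
        """Average messages per second over the window ending at `now`."""
        # Drop timestamps that have fallen out of the window.
        while self._stamps and self._stamps[0] <= now - self.window_s:
            self._stamps.popleft()
        return len(self._stamps) / self.window_s


meter = ThroughputMeter(window_s=10.0)
for t in (0.0, 1.0, 2.0, 9.0, 11.0):
    meter.record(t)
rate = meter.per_second(now=12.0)  # only the messages at 9.0 and 11.0 remain
```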
Using CloudEvents for Monitoring
Infinitic emits CloudEvents for all lifecycle events, making them the primary mechanism for monitoring.
Checking Workflow Status
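One way to track status from outside the workflow engine is to fold the CloudEvents stream into a status table. The sketch below only checks the attributes that CloudEvents 1.0 requires (`specversion`, `id`, `source`, `type`). The event type names and the use of `subject` to carry the workflow id are assumptions made for illustration; substitute whatever your Infinitic version actually emits.

```python
import json

# Hypothetical event types; replace with the types your deployment emits.
STATUS_BY_TYPE = {
    "example.workflow.started": "running",
    "example.workflow.completed": "completed",
    "example.workflow.failed": "failed",
}
REQUIRED = ("specversion", "id", "source", "type")  # required by CloudEvents 1.0


def apply_event(statuses, raw):
    """Update `statuses` (workflow id -> status) from one JSON-encoded CloudEvent."""
    event = json.loads(raw)
    missing = [key for key in REQUIRED if key not in event]
    if missing:
        raise ValueError(f"not a CloudEvent, missing {missing}")
    status = STATUS_BY_TYPE.get(event["type"])
    if status is not None:
        # Assumption: the workflow id travels in the optional `subject` attribute.
        statuses[event.get("subject", "unknown")] = status
    return statuses


def make_event(event_id, event_type, workflow_id):
    """Build a JSON CloudEvent for the demo stream below."""
    return json.dumps({
        "specversion": "1.0",
        "id": event_id,
        "source": "example/workflows",
        "type": event_type,
        "subject": workflow_id,
    })


stream = [
    make_event("1", "example.workflow.started", "wf-1"),
    make_event("2", "example.workflow.started", "wf-2"),
    make_event("3", "example.workflow.completed", "wf-1"),
    make_event("4", "example.task.retried", "wf-2"),     # unmapped type, ignored
    make_event("5", "example.workflow.failed", "wf-2"),
]
statuses = {}
for raw in stream:
    apply_event(statuses, raw)
```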
Query workflow status programmatically.
Integration with Monitoring Tools
Prometheus Integration
Grafana Dashboard
Create dashboards with key metrics.
DataDog Integration
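Metrics can be pushed to a local agent over UDP using the DogStatsD datagram format (`name:value|type|#tags`). This is a minimal sketch using only the standard library: the metric name and tags are hypothetical, `send` is fire-and-forget, and an official client library is the more usual choice.

```python
import socket


def dogstatsd_datagram(name, value, kind, tags=None):
    """Format one metric as a DogStatsD datagram: name:value|type|#tags.

    `kind` is a DogStatsD type code such as "c" (count), "g" (gauge) or "h" (histogram).
    """
    datagram = f"{name}:{value}|{kind}"
    if tags:
        datagram += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return datagram


def send(datagram, host="127.0.0.1", port=8125):
    """Send one datagram to the agent; 8125 is the usual DogStatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical metric name and tags, for illustration only.
line = dogstatsd_datagram(
    "infinitic.task.duration", 0.42, "h",
    {"service": "PaymentService", "env": "prod"},
)
```

Calling `send(line)` delivers the datagram if an agent is listening; because UDP is connectionless, nothing fails loudly when no agent is running.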
Health Checks
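A health endpoint can be as small as a function that compares each worker's last heartbeat with the current time. The sketch below is built on that assumption: the `HEARTBEATS` registry, the `/health` path, and the 30-second threshold are all hypothetical choices, and the workers are expected to update the registry themselves.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

HEARTBEATS = {}  # worker name -> epoch seconds of the last heartbeat (hypothetical registry)


def health_report(last_heartbeat, now, max_age_s=30.0):
    """Return (http_status, body) given each worker's last heartbeat time."""
    stale = sorted(w for w, t in last_heartbeat.items() if now - t > max_age_s)
    healthy = bool(last_heartbeat) and not stale
    body = {"status": "UP" if healthy else "DOWN", "stale_workers": stale}
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_report(HEARTBEATS, time.time())
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocking call: expose /health until the process is stopped."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()


status, body = health_report({"worker-1": 100.0, "worker-2": 60.0}, now=105.0)
```

Returning 503 when any worker is stale lets a load balancer or orchestrator probe act on the result without parsing the body.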
Implement health check endpoints for your workers and services.
Alerting
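A threshold alert usually should not fire on a single bad sample. The sketch below requires a metric to exceed its threshold for several consecutive evaluations before firing; `AlertRule` and the metric names are hypothetical, and most monitoring backends offer an equivalent "for" duration on their own alert rules.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """Fires when `metric` stays above `threshold` for N consecutive evaluations."""
    metric: str
    threshold: float
    for_evaluations: int = 1
    _breaches: int = field(default=0, init=False, repr=False)

    def evaluate(self, snapshot):
        """Feed one metrics snapshot; return True while the alert is firing."""
        value = snapshot.get(self.metric, 0.0)
        # A single healthy sample resets the streak.
        self._breaches = self._breaches + 1 if value > self.threshold else 0
        return self._breaches >= self.for_evaluations


rule = AlertRule(metric="failure_rate", threshold=0.05, for_evaluations=2)
snapshots = [
    {"failure_rate": 0.02},
    {"failure_rate": 0.08},   # first breach, not firing yet
    {"failure_rate": 0.09},   # second consecutive breach, fires
    {"failure_rate": 0.01},   # recovered
]
fired = [rule.evaluate(s) for s in snapshots]
```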
Set up alerts based on metrics.
Logging Best Practices
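Structured logging means emitting one machine-readable object per log line, with identifiers as separate fields rather than embedded in the message. This sketch uses the standard `logging` module with a small JSON formatter; the field names (`workflow_id`, `task_id`, `attempt`) are hypothetical, and the output is captured in memory only so that it can be inspected.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    CONTEXT_KEYS = ("workflow_id", "task_id", "attempt")  # hypothetical field names

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_KEYS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("payments.worker")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger

log.info("task retried", extra={"workflow_id": "wf-1", "task_id": "t-9", "attempt": 2})
logged = json.loads(stream.getvalue())
```

With identifiers in their own fields, a log backend can filter on `workflow_id` directly instead of pattern-matching message text.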
Use structured logging for better observability.
Distributed Tracing
Integrate with distributed tracing systems.
Best Practices
Monitor end-to-end workflows
Track workflows from start to finish, including all task executions and retries.
Set up alerting for critical failures
Alert on high failure rates, timeout rates, or specific critical workflow failures.
Use tags for filtering
Track business metrics
Monitor business-level metrics (orders processed, payments completed) alongside technical metrics.
Implement health checks
Regular health checks ensure workers and services are running correctly.
Store historical data
Retain metrics for trend analysis and capacity planning.
Use structured logging
Structured logs are easier to search, filter, and analyze.
Example: Complete Monitoring Setup
Related Topics
- CloudEvents - Event-driven monitoring
- Error Handling - Monitor and handle failures
- Tags - Use tags for metric filtering
- Metadata - Include context in monitoring