How It Works
1. Alert Detection & Correlation
When an alert arrives from a connected platform (Grafana, PagerDuty, Datadog, Netdata, Dynatrace, Splunk, Jenkins, or CloudBees), Aurora:
- Creates an incident record in the `incidents` table
- Analyzes the alert metadata for correlation opportunities
- Groups related alerts using multiple strategies:
  - Service matching: Alerts affecting the same service
  - Time-based clustering: Alerts within a 5-minute window
  - Semantic similarity: ML-based alert description matching
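The service-matching and time-based strategies above can be sketched in a few lines. This is a minimal illustration, not Aurora's implementation: the `Alert` and `AlertGroup` shapes, the `group_alerts` helper, and the rule that an alert joins a group when it shares a service and arrives within 5 minutes of that group's latest alert are all assumptions. Semantic similarity is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # time-based clustering window from the text


@dataclass
class Alert:
    # Hypothetical alert shape; real payloads differ per platform.
    source: str
    service: str
    received_at: datetime


@dataclass
class AlertGroup:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self) -> datetime:
        return self.alerts[-1].received_at


def group_alerts(alerts):
    """Group alerts that share a service and arrive within WINDOW of the group's latest alert."""
    groups = []
    open_group = {}  # service -> most recent group for that service
    for alert in sorted(alerts, key=lambda a: a.received_at):
        group = open_group.get(alert.service)
        if group is None or alert.received_at - group.last_seen > WINDOW:
            # No group yet for this service, or the window has lapsed: start a new one.
            group = AlertGroup(service=alert.service)
            groups.append(group)
            open_group[alert.service] = group
        group.alerts.append(alert)
    return groups
```

Measuring the window from the latest alert in a group (rather than the first) lets a slow burst of related alerts stay in one group; a fixed window anchored at the first alert would be an equally plausible reading of the text.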
Multi-Platform Support
Ingest alerts from Grafana, PagerDuty, Datadog, Netdata, Dynatrace, Splunk, Jenkins, and CloudBees
Smart Correlation
Automatically group related alerts to reduce noise and identify incident blast radius
2. Background RCA Workflow
Once an incident is created, Aurora launches a background Celery task that runs the RCA investigation:
- Analyzes alert metadata, service context, and historical patterns
- Runs diagnostic commands using the `kubectl_tool`, `cloud_tool`, and observability integrations
- Generates streaming thoughts that are saved incrementally to the `incident_thoughts` table
- Produces diagnostic suggestions and fix suggestions stored in `incident_suggestions`
The investigation runs autonomously in the background using a LangGraph workflow powered by the `Workflow` class in `server/chat/backend/agent/workflow.py`. This ensures the RCA continues even if the user closes the browser.
3. Real-Time Streaming UI
As the RCA progresses, Aurora's incident pages display real-time investigation progress:
Incident List Page (`/incidents`):
- Shows all active and analyzed incidents
- Real-time updates via Server-Sent Events (SSE)
- Status indicators: `investigating`, `analyzed`, `resolved`, `merged`
- Displays correlated alert count and affected services
Incident Detail Page (`/incidents/[id]`):
- Thoughts Panel: Streams AI reasoning as it investigates
  - Saved incrementally every 1 second or after sentence boundaries
  - Progressive display via polling (1-second interval)
  - Auto-opens during active investigation
- Suggestions Tab: Shows diagnostic commands and fix suggestions
- Chat Tab: Ask follow-up questions about the incident
- Raw Alert Tab: View the original alert payload from the source platform
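The save rule for streamed thoughts (every 1 second or at a sentence boundary) can be illustrated with a small buffer. This is a sketch under stated assumptions: `ThoughtBuffer`, its `save` callback, and the choice of `.`, `!`, and `?` as sentence boundaries are hypothetical and do not come from Aurora's code.

```python
import time
from typing import Callable

SENTENCE_ENDINGS = (".", "!", "?")  # assumed boundary characters
FLUSH_INTERVAL_SECONDS = 1.0        # interval stated in the text


class ThoughtBuffer:
    """Accumulates streamed tokens and flushes on a sentence boundary or after the interval."""

    def __init__(self, save: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._save = save    # hypothetical persistence callback (e.g. a DB insert)
        self._clock = clock  # injectable so the timing rule can be tested
        self._pending = ""
        self._last_flush = clock()

    def add(self, token: str) -> None:
        self._pending += token
        at_boundary = self._pending.rstrip().endswith(SENTENCE_ENDINGS)
        timed_out = self._clock() - self._last_flush >= FLUSH_INTERVAL_SECONDS
        if at_boundary or timed_out:
            self.flush()

    def flush(self) -> None:
        # Persist whatever has accumulated; an empty buffer only resets the timer.
        if self._pending:
            self._save(self._pending)
            self._pending = ""
        self._last_flush = self._clock()
```

Note that the timeout is only checked when a token arrives, so a caller would still need a final `flush()` when the stream ends.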
4. Investigation Status Lifecycle
Incidents progress through these states:

| Status | Description |
|---|---|
| `investigating` | RCA is actively running (Aurora analyzing in background) |
| `analyzed` | RCA completed, waiting for user action |
| `resolved` | User marked incident as resolved (triggers postmortem generation) |
| `merged` | Incident was merged into another related incident |
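One way to model these states in code is an enum plus a transition map. The four status values come from the table; the transition map is an assumption inferred from the descriptions (for example, the source does not say whether an incident can be resolved while still `investigating`), so treat it as illustrative only.

```python
from enum import Enum


class IncidentStatus(str, Enum):
    INVESTIGATING = "investigating"
    ANALYZED = "analyzed"
    RESOLVED = "resolved"
    MERGED = "merged"


# Assumed transitions, inferred from the table descriptions rather than from Aurora's code.
ALLOWED_TRANSITIONS = {
    IncidentStatus.INVESTIGATING: {IncidentStatus.ANALYZED, IncidentStatus.MERGED},
    IncidentStatus.ANALYZED: {IncidentStatus.RESOLVED, IncidentStatus.MERGED},
    IncidentStatus.RESOLVED: set(),
    IncidentStatus.MERGED: set(),
}


def can_transition(current: IncidentStatus, target: IncidentStatus) -> bool:
    """Return True if the assumed lifecycle allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]
```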
User Workflows
Viewing Live Investigation
- Navigate to the Incidents page
- Click on an incident with status `investigating`
- The Thoughts Panel auto-opens on the right side
- Watch AI reasoning stream in real-time as Aurora:
  - Analyzes alert context
  - Runs diagnostic commands
  - Identifies root cause patterns
  - Generates suggestions
Figure: Live incident investigation with streaming thoughts and diagnostic suggestions
Interacting with Suggestions
Aurora generates two types of suggestions:
Diagnostic Suggestions
Type: `diagnostic`
Safe, read-only commands to gather more information:
- `kubectl get pods -n production`
- `gcloud logging read --limit=50 --filter="severity=ERROR"`
- View service logs, metrics, or configuration
For each diagnostic suggestion, you can:
- Click “Copy” to copy the command to the clipboard
- Click “Execute” to run it in Agent mode (if enabled)
Fix Suggestions
Type: `fix`
Code changes to resolve the incident:
- Configuration updates
- Bug fixes
- Dependency version changes
Each fix suggestion includes these fields:
- `filePath`: File to modify
- `originalContent`: Current code
- `suggestedContent`: Proposed fix
- `userEditedContent`: User’s customized version
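The four fields listed above map naturally onto a small record type. The sketch below is illustrative: the snake_case attribute names and the `content_to_apply` helper are hypothetical, and the precedence rule (a user edit wins over the original suggestion) is an assumption the source does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixSuggestion:
    # Mirrors filePath, originalContent, suggestedContent, userEditedContent.
    file_path: str
    original_content: str
    suggested_content: str
    user_edited_content: Optional[str] = None

    def content_to_apply(self) -> str:
        """Assumption: a user edit, when present, takes precedence over the suggestion."""
        if self.user_edited_content is not None:
            return self.user_edited_content
        return self.suggested_content
```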
To apply a fix:
- Review the suggested code change
- Edit the fix if needed using the inline editor
- Click “Apply Fix” to create a GitHub branch and pull request
- The PR is created with:
  - Branch: `aurora/fix-incident-{incident_id}-{timestamp}`
  - Commit message: Incident context + fix description
  - Link back to the incident in Aurora
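As an illustration of the branch naming pattern `aurora/fix-incident-{incident_id}-{timestamp}`, a helper might look like the following. The function name is hypothetical, and the timestamp format (UTC, `YYYYMMDDHHMMSS`) is an assumption, since the source does not specify one.

```python
from datetime import datetime, timezone
from typing import Optional


def fix_branch_name(incident_id: str, now: Optional[datetime] = None) -> str:
    """Build a branch name following aurora/fix-incident-{incident_id}-{timestamp}."""
    # Normalize to UTC so the name does not depend on the caller's timezone.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumed format: the source does not say how {timestamp} is rendered.
    timestamp = now.strftime("%Y%m%d%H%M%S")
    return f"aurora/fix-incident-{incident_id}-{timestamp}"
```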
Follow-Up Chat
Ask questions about an incident using the Chat tab. Each follow-up chat:
- Loads incident context (alert details, RCA summary, investigation thoughts)
- Runs as a separate background chat session (not a new RCA)
- Supports both `ask` (read-only) and `agent` (execution) modes
- Is saved in the `chat_sessions` table with an `incident_id` foreign key
Merging Related Incidents
When you discover two incidents are related:
- Navigate to the incident that should be merged
- Click “Merge Alert” in the UI
- Select the target incident to merge into
- Aurora will:
  - Stop the source incident’s RCA (via Celery task revocation)
  - Copy the alert into the `incident_alerts` table under the target incident
  - Transfer RCA context to the target investigation
  - Mark the source incident as `merged`
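The four merge steps above can be sketched against an in-memory model. Everything here is hypothetical: the `Incident` shape, the `merged_into` field, and the `merge_incident` helper stand in for Aurora's actual tables and code, and `revoke_task` is an injected callback rather than a real Celery call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Incident:
    # Hypothetical in-memory stand-in for an incident row and its related tables.
    id: str
    status: str = "investigating"
    alerts: list = field(default_factory=list)
    rca_context: list = field(default_factory=list)
    rca_task_id: Optional[str] = None
    merged_into: Optional[str] = None  # illustrative field, not from the source


def merge_incident(source: Incident, target: Incident,
                   revoke_task: Callable[[str], None]) -> None:
    """Apply the merge steps in the order listed above."""
    if source.id == target.id:
        raise ValueError("cannot merge an incident into itself")
    if source.rca_task_id is not None:
        revoke_task(source.rca_task_id)             # 1. stop the source RCA
        source.rca_task_id = None
    target.alerts.extend(source.alerts)             # 2. copy alerts to the target
    target.rca_context.extend(source.rca_context)   # 3. transfer RCA context
    source.status = "merged"                        # 4. mark the source as merged
    source.merged_into = target.id
```

In a Celery deployment, `revoke_task` could plausibly wrap `app.control.revoke(task_id, terminate=True)`, though how Aurora wires this up is not stated in the source.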
Citations & Command Traceability
All diagnostic commands executed during RCA are tracked:
- Stored in the `incident_citations` table with a `citation_key` (numeric reference)
- Includes: tool name, command, output, execution timestamp
- Referenced inline in thoughts using `[1]`, `[2]` notation
- Click a citation badge to view the full command output in a modal
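To show how the `[1]`, `[2]` notation can be tied back to stored citations, here is a small parsing sketch. The helper names and the dictionary-based citation store are assumptions made for illustration; only the bracketed numeric notation comes from the text above.

```python
import re

CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def extract_citation_keys(thought: str) -> list:
    """Return the numeric citation keys referenced in a thought, in first-seen order."""
    seen = []
    for match in CITATION_PATTERN.finditer(thought):
        key = int(match.group(1))
        if key not in seen:
            seen.append(key)
    return seen


def resolve_citations(thought: str, citations: dict) -> list:
    """Look up each referenced key; keys with no stored citation are skipped."""
    return [citations[k] for k in extract_citation_keys(thought) if k in citations]
```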
API Reference
Get All Incidents
merged status).
Get Incident Details
- Alert metadata and raw payload
- Streaming thoughts
- Suggestions (diagnostic + fix)
- Citations
- Chat sessions
- Correlated alerts
Update Incident Status
Chat with Incident
Returns `202 Accepted` with a `session_id` to poll for the response.
See AI Chat Interface for details on WebSocket streaming and session management.
Related Features
Observability Tools
Connect Grafana, Datadog, Netdata, and more to ingest alerts
Cloud Integrations
Execute diagnostic commands across GCP, AWS, and Azure