Aurora automatically investigates incidents triggered by your observability integrations. This guide walks you through your first investigation.

Prerequisites

Before running your first investigation:
  • Connect at least one cloud provider (AWS, GCP, or Azure)
  • Set up an observability integration (Datadog, Grafana, PagerDuty, or Netdata)
  • Configure your LLM provider (OpenAI, Anthropic, or OpenRouter)

How Incidents Are Created

Aurora creates incidents automatically when:
  1. Alert webhooks trigger from your observability tools
  2. Manual investigations are started from the UI
  3. Correlated alerts are grouped into existing incidents
Each incident gets:
  • A unique incident ID
  • Alert metadata (title, severity, service, environment)
  • A dedicated chat session for investigation
  • Real-time thought streaming

Starting an Investigation

1. Trigger an alert

Send a test alert from your observability platform. Aurora will automatically:
  • Receive the webhook
  • Create an incident in the database
  • Start a background RCA (Root Cause Analysis) task
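If you want to exercise the webhook path without a real incident, you can send a synthetic alert. The field names, endpoint URL, and helper below are illustrative assumptions for a sketch, not Aurora's documented webhook schema:

```python
import json
from urllib import request

def build_test_alert(service: str, severity: str = "warning") -> dict:
    """Build a synthetic alert payload (field names are illustrative)."""
    return {
        "title": f"High error rate on {service}",
        "service": service,
        "severity": severity,
        "environment": "staging",
        "source": "manual-test",
    }

def send_alert(payload: dict, url: str) -> None:
    """POST the payload to a webhook endpoint (URL is an assumption)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    request.urlopen(req)  # requires a running Aurora server

payload = build_test_alert("checkout-api", severity="critical")
print(payload["title"])
```

Check your observability integration's docs for the real payload shape; Aurora will create the incident and kick off the RCA task on receipt.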
2. View the incident

Navigate to the Incidents page to see your new incident:
GET /api/incidents
The incident will show:
  • Status: investigating, analyzed, or resolved
  • Aurora Status: idle, running, complete, or error
  • Alert Details: title, service, severity, source type
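From a script, a quick way to scan the list is to group incidents by their status pair. The helper below assumes the endpoint returns a JSON array whose items carry the fields listed above (status, auroraStatus); the sample values are illustrative:

```python
from collections import Counter

def summarize_incidents(incidents: list[dict]) -> dict:
    """Count incidents by (status, auroraStatus) pair."""
    return dict(Counter((i["status"], i["auroraStatus"]) for i in incidents))

# Sample shaped like the fields above (values are illustrative).
sample = [
    {"status": "investigating", "auroraStatus": "running"},
    {"status": "analyzed", "auroraStatus": "complete"},
    {"status": "investigating", "auroraStatus": "running"},
]
print(summarize_incidents(sample))
```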
3. Watch the investigation

Click into the incident to watch Aurora’s investigation in real time:
  • Thoughts Tab: Streaming analysis from the LangGraph agent
  • Suggestions Tab: Diagnostic commands and fix recommendations
  • Citations: Evidence from executed commands (logs, metrics, traces)
The investigation runs in the background via Celery workers.
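Thought events arrive over the WebSocket as the workers produce them. The exact wire schema isn't shown here, so the decoder below assumes a JSON object carrying the same fields the incident_thoughts table stores (content, thought_type, timestamp):

```python
import json

def decode_thought(raw: str) -> dict:
    """Decode one streamed thought message (message schema is an assumption)."""
    msg = json.loads(raw)
    return {
        "content": msg["content"],
        # Fall back to a default type if the field is absent.
        "thought_type": msg.get("thought_type", "analysis"),
        "timestamp": msg.get("timestamp"),
    }

raw = '{"content": "Error rate doubled after deploy", "thought_type": "observation"}'
print(decode_thought(raw)["thought_type"])
```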
4. Review findings

Once auroraStatus changes to complete, review:
  • Summary: High-level RCA summary
  • Correlated Alerts: Related alerts grouped by service/time
  • Suggestions: Commands to run or code fixes to apply
  • Root Cause: Identified issue and recommended remediation

Understanding Investigation Output

Thoughts

Thoughts are real-time updates stored in the incident_thoughts table:
# server/routes/incidents_routes.py:590-611
cursor.execute(
    """SELECT id, incident_id, timestamp, content, thought_type, created_at
       FROM incident_thoughts
       WHERE incident_id = %s
       ORDER BY timestamp ASC""",
    (incident_id,)
)
Thought types:
  • analysis - Investigation reasoning
  • observation - Findings from tools
  • hypothesis - Potential root causes
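After fetching thoughts, it can help to group them by type so hypotheses stand out from routine analysis. A minimal sketch over rows shaped like the query above:

```python
from collections import defaultdict

def group_thoughts(rows: list[dict]) -> dict[str, list[str]]:
    """Group thought content by thought_type (analysis/observation/hypothesis)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row["thought_type"]].append(row["content"])
    return dict(grouped)

rows = [
    {"thought_type": "analysis", "content": "Checking recent deploys"},
    {"thought_type": "hypothesis", "content": "Bad config pushed at 14:02"},
    {"thought_type": "observation", "content": "Error rate doubled after 14:00"},
]
print(group_thoughts(rows)["hypothesis"])
```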

Suggestions

Suggestions are actionable recommendations stored in the incident_suggestions table:
# Types: diagnostic | fix
# Risk levels: safe | medium | destructive
Diagnostic suggestions provide commands to gather more data:
kubectl get pods -n production
aws cloudwatch get-metric-statistics --metric-name CPUUtilization
Fix suggestions propose code changes with diffs:
  • filePath - File to modify
  • suggestedContent - Proposed changes
  • repository - Target repo for PR
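Before turning a fix suggestion into a PR, it is worth checking that it is complete. The checker below uses only the three fields listed above; the sample values are illustrative:

```python
def validate_fix_suggestion(s: dict) -> list[str]:
    """Return a list of problems with a fix suggestion (empty list means OK)."""
    problems = []
    for field in ("filePath", "suggestedContent", "repository"):
        if not s.get(field):
            problems.append(f"missing {field}")
    return problems

fix = {
    "filePath": "services/checkout/config.yaml",
    "suggestedContent": "timeout: 30s",
    "repository": "org/checkout",
}
print(validate_fix_suggestion(fix))
```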

Citations

Citations are evidence from tool executions:
# server/routes/incidents_routes.py:614-646
# Each citation includes:
# - citation_key: Reference number [1], [2], etc.
# - tool_name: Tool that generated the evidence
# - command: Command executed
# - output: Results
# - executed_at: Timestamp
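For a report or postmortem, citations can be rendered as a numbered evidence list. The formatter below uses the fields named in the comment above; the sample data is illustrative:

```python
def format_citations(citations: list[dict]) -> str:
    """Render citations as '[n] tool: command' reference lines."""
    lines = [
        f"[{c['citation_key']}] {c['tool_name']}: {c['command']}"
        for c in citations
    ]
    return "\n".join(lines)

citations = [
    {"citation_key": 1, "tool_name": "kubectl", "command": "get pods -n production"},
    {"citation_key": 2, "tool_name": "cloudwatch", "command": "get-metric-statistics"},
]
print(format_citations(citations))
```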

Chatting with Aurora

You can ask follow-up questions during or after an investigation:
1. Open the chat tab

Switch to the Chat tab in the incident detail view.
2. Ask a question

POST /api/incidents/{incident_id}/chat
{
  "question": "What logs show errors around the incident start time?",
  "mode": "ask"
}
Use "mode": "agent" instead of "ask" to give Aurora execution capability.
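The same request can be sent from a script. The helper below only builds and validates the request body; actually sending it to the endpoint above is left commented out since it needs a running server:

```python
def build_chat_request(question: str, mode: str = "ask") -> dict:
    """Build the chat request body; mode must be 'ask' or 'agent'."""
    if mode not in ("ask", "agent"):
        raise ValueError(f"unknown mode: {mode}")
    return {"question": question, "mode": mode}

body = build_chat_request("What logs show errors around the incident start time?")
print(body["mode"])
# requests.post(f"{base_url}/api/incidents/{incident_id}/chat", json=body)
```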
3. Review the response

Aurora provides context-aware answers based on:
  • The incident’s alert metadata
  • Investigation thoughts and findings
  • Citations from previous tool executions
Chat sessions are stored in the chat_sessions table and linked to the incident via incident_id.

Resolving an Incident

When the issue is fixed:
PATCH /api/incidents/{incident_id}
{
  "status": "resolved"
}
This triggers automatic postmortem generation:
# server/routes/incidents_routes.py:876-889
if data.get("status") == "resolved" and previous_status != "resolved":
    generate_postmortem.delay(incident_id, user_id)

Troubleshooting

Investigation not starting

Check the Celery worker logs:
docker logs -f aurora-celery_worker-1
Verify that:
  • Redis is running (Celery broker)
  • LLM API keys are configured
  • Cloud provider credentials are valid

Thoughts not streaming

Ensure the WebSocket connection is active:
ws://localhost:5006
Check auroraStatus; if it is error, view the worker logs for exceptions.

Diagnostic commands failing

Diagnostic commands require:
  • Valid cloud provider credentials
  • Proper IAM/RBAC permissions
  • Network access to resources (kubectl, AWS CLI, etc.)
Check the agent tool logs in the Celery worker output.

Alerts not correlating

Alert correlation uses:
  • Service name matching
  • A time window (5-15 minutes)
  • Severity thresholds
View the incident_alerts table for correlation details:
SELECT correlation_strategy, correlation_score, correlation_details
FROM incident_alerts WHERE incident_id = ?;
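The correlation heuristics above can be sketched as a simple scorer: an exact service match plus time proximity within the window. The weights and decay here are illustrative assumptions, not Aurora's actual algorithm:

```python
from datetime import datetime, timedelta

def correlation_score(a: dict, b: dict,
                      window: timedelta = timedelta(minutes=15)) -> float:
    """Score how likely two alerts belong to the same incident (0.0 to 1.0)."""
    score = 0.0
    if a["service"] == b["service"]:
        score += 0.6  # service match dominates (weight is an assumption)
    gap = abs(a["timestamp"] - b["timestamp"])
    if gap <= window:
        # Closer in time scores higher, decayed linearly over the window.
        score += 0.4 * (1 - gap / window)
    return round(score, 3)

t = datetime(2024, 1, 1, 12, 0)
a = {"service": "checkout-api", "timestamp": t}
b = {"service": "checkout-api", "timestamp": t + timedelta(minutes=5)}
print(correlation_score(a, b))
```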

Next Steps

Connect More Sources

Add AWS, GCP, or Azure for deeper investigation capabilities

Set Up Monitoring

Configure Datadog, Grafana, or other observability integrations

Custom Connectors

Build integrations for proprietary systems

Backup & Restore

Set up automated backups for incident data
