Aurora automatically investigates incidents triggered by your observability integrations. This guide walks you through your first investigation.

Prerequisites

Before running your first investigation:
  • Connect at least one cloud provider (AWS, GCP, or Azure)
  • Set up an observability integration (Datadog, Grafana, PagerDuty, or Netdata)
  • Configure your LLM provider (OpenAI, Anthropic, or OpenRouter)

How Incidents Are Created

Aurora creates incidents automatically when:
  1. Alert webhooks trigger from your observability tools
  2. Manual investigations are started from the UI
  3. Correlated alerts are grouped into existing incidents
Each incident gets:
  • A unique incident ID
  • Alert metadata (title, severity, service, environment)
  • A dedicated chat session for investigation
  • Real-time thought streaming

Starting an Investigation

1. Trigger an alert

Send a test alert from your observability platform. Aurora will automatically:
  • Receive the webhook
  • Create an incident in the database
  • Start a background RCA (Root Cause Analysis) task
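If you want to exercise the webhook path without a real incident, you can send a synthetic alert. The field names, endpoint URL, and helper below are illustrative assumptions for a sketch, not Aurora's documented webhook schema:

```python
import json
from urllib import request

def build_test_alert(service: str, severity: str = "warning") -> dict:
    """Build a synthetic alert payload (field names are illustrative)."""
    return {
        "title": f"High error rate on {service}",
        "service": service,
        "severity": severity,
        "environment": "staging",
        "source": "manual-test",
    }

def send_alert(payload: dict, url: str) -> None:
    """POST the payload to a webhook endpoint (URL is an assumption)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    request.urlopen(req)  # requires a running Aurora server

payload = build_test_alert("checkout-api", severity="critical")
print(payload["title"])
```

Check your observability integration's docs for the real payload shape; Aurora will create the incident and kick off the RCA task on receipt.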
2. View the incident

Navigate to the Incidents page to see your new incident:
GET /api/incidents
The incident will show:
  • Status: investigating, analyzed, or resolved
  • Aurora Status: idle, running, complete, or error
  • Alert Details: title, service, severity, source type
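From a script, a quick way to scan the list is to group incidents by their status pair. The helper below assumes the endpoint returns a JSON array whose items carry the fields listed above (status, auroraStatus); the sample values are illustrative:

```python
from collections import Counter

def summarize_incidents(incidents: list[dict]) -> dict:
    """Count incidents by (status, auroraStatus) pair."""
    return dict(Counter((i["status"], i["auroraStatus"]) for i in incidents))

# Sample shaped like the fields above (values are illustrative).
sample = [
    {"status": "investigating", "auroraStatus": "running"},
    {"status": "analyzed", "auroraStatus": "complete"},
    {"status": "investigating", "auroraStatus": "running"},
]
print(summarize_incidents(sample))
```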
3. Watch the investigation

Click into the incident to watch Aurora’s investigation in real time:
  • Thoughts Tab: Streaming analysis from the LangGraph agent
  • Suggestions Tab: Diagnostic commands and fix recommendations
  • Citations: Evidence from executed commands (logs, metrics, traces)
The investigation runs in the background via Celery workers.
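Thought events arrive over the WebSocket as the workers produce them. The exact wire schema isn't shown here, so the decoder below assumes a JSON object carrying the same fields the incident_thoughts table stores (content, thought_type, timestamp):

```python
import json

def decode_thought(raw: str) -> dict:
    """Decode one streamed thought message (message schema is an assumption)."""
    msg = json.loads(raw)
    return {
        "content": msg["content"],
        # Fall back to a default type if the field is absent.
        "thought_type": msg.get("thought_type", "analysis"),
        "timestamp": msg.get("timestamp"),
    }

raw = '{"content": "Error rate doubled after deploy", "thought_type": "observation"}'
print(decode_thought(raw)["thought_type"])
```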
4. Review findings

Once auroraStatus changes to complete, review:
  • Summary: High-level RCA summary
  • Correlated Alerts: Related alerts grouped by service/time
  • Suggestions: Commands to run or code fixes to apply
  • Root Cause: Identified issue and recommended remediation

Understanding Investigation Output

Thoughts

Thoughts are real-time updates stored in the incident_thoughts table:
# server/routes/incidents_routes.py:590-611
cursor.execute(
    """SELECT id, incident_id, timestamp, content, thought_type, created_at
       FROM incident_thoughts
       WHERE incident_id = %s
       ORDER BY timestamp ASC""",
    (incident_id,)
)
Thought types:
  • analysis - Investigation reasoning
  • observation - Findings from tools
  • hypothesis - Potential root causes
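After fetching thoughts, it can help to group them by type so hypotheses stand out from routine analysis. A minimal sketch over rows shaped like the query above:

```python
from collections import defaultdict

def group_thoughts(rows: list[dict]) -> dict[str, list[str]]:
    """Group thought content by thought_type (analysis/observation/hypothesis)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row["thought_type"]].append(row["content"])
    return dict(grouped)

rows = [
    {"thought_type": "analysis", "content": "Checking recent deploys"},
    {"thought_type": "hypothesis", "content": "Bad config pushed at 14:02"},
    {"thought_type": "observation", "content": "Error rate doubled after 14:00"},
]
print(group_thoughts(rows)["hypothesis"])
```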

Suggestions

Suggestions are actionable recommendations stored in the incident_suggestions table:
# Types: diagnostic | fix
# Risk levels: safe | medium | destructive
Diagnostic suggestions provide commands to gather more data:
kubectl get pods -n production
aws cloudwatch get-metric-statistics --metric-name CPUUtilization
Fix suggestions propose code changes with diffs:
  • filePath - File to modify
  • suggestedContent - Proposed changes
  • repository - Target repo for PR
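Before turning a fix suggestion into a PR, it is worth checking that it is complete. The checker below uses only the three fields listed above; the sample values are illustrative:

```python
def validate_fix_suggestion(s: dict) -> list[str]:
    """Return a list of problems with a fix suggestion (empty list means OK)."""
    problems = []
    for field in ("filePath", "suggestedContent", "repository"):
        if not s.get(field):
            problems.append(f"missing {field}")
    return problems

fix = {
    "filePath": "services/checkout/config.yaml",
    "suggestedContent": "timeout: 30s",
    "repository": "org/checkout",
}
print(validate_fix_suggestion(fix))
```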

Citations

Citations are evidence from tool executions:
# server/routes/incidents_routes.py:614-646
# Each citation includes:
# - citation_key: Reference number [1], [2], etc.
# - tool_name: Tool that generated the evidence
# - command: Command executed
# - output: Results
# - executed_at: Timestamp
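For a report or postmortem, citations can be rendered as a numbered evidence list. The formatter below uses the fields named in the comment above; the sample data is illustrative:

```python
def format_citations(citations: list[dict]) -> str:
    """Render citations as '[n] tool: command' reference lines."""
    lines = [
        f"[{c['citation_key']}] {c['tool_name']}: {c['command']}"
        for c in citations
    ]
    return "\n".join(lines)

citations = [
    {"citation_key": 1, "tool_name": "kubectl", "command": "get pods -n production"},
    {"citation_key": 2, "tool_name": "cloudwatch", "command": "get-metric-statistics"},
]
print(format_citations(citations))
```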

Chatting with Aurora

You can ask follow-up questions during or after an investigation:
1. Open the chat tab

Switch to the Chat tab in the incident detail view.
2. Ask a question

POST /api/incidents/{incident_id}/chat
{
  "question": "What logs show errors around the incident start time?",
  "mode": "ask"
}
Use "mode": "agent" instead of "ask" to give Aurora execution capability.
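The same request can be sent from a script. The helper below only builds and validates the request body; actually sending it to the endpoint above is left commented out since it needs a running server:

```python
def build_chat_request(question: str, mode: str = "ask") -> dict:
    """Build the chat request body; mode must be 'ask' or 'agent'."""
    if mode not in ("ask", "agent"):
        raise ValueError(f"unknown mode: {mode}")
    return {"question": question, "mode": mode}

body = build_chat_request("What logs show errors around the incident start time?")
print(body["mode"])
# requests.post(f"{base_url}/api/incidents/{incident_id}/chat", json=body)
```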
3. Review the response

Aurora provides context-aware answers based on:
  • The incident’s alert metadata
  • Investigation thoughts and findings
  • Citations from previous tool executions
Chat sessions are stored in the chat_sessions table and linked to the incident via incident_id.

Resolving an Incident

When the issue is fixed:
PATCH /api/incidents/{incident_id}
{
  "status": "resolved"
}
This triggers automatic postmortem generation:
# server/routes/incidents_routes.py:876-889
if data.get("status") == "resolved" and previous_status != "resolved":
    generate_postmortem.delay(incident_id, user_id)

Troubleshooting

Investigation not starting

Check the Celery worker logs:
docker logs -f aurora-celery_worker-1
Verify that:
  • Redis is running (Celery broker)
  • LLM API keys are configured
  • Cloud provider credentials are valid

Thoughts not streaming

Ensure the WebSocket connection is active:
ws://localhost:5006
Check auroraStatus; if it is error, view the worker logs for exceptions.

Diagnostic commands failing

Diagnostic commands require:
  • Valid cloud provider credentials
  • Proper IAM/RBAC permissions
  • Network access to resources (kubectl, AWS CLI, etc.)
Check the agent tool logs in the Celery worker output.

Alerts not correlating

Alert correlation uses:
  • Service name matching
  • A time window (5-15 minutes)
  • Severity thresholds
View the incident_alerts table for correlation details:
SELECT correlation_strategy, correlation_score, correlation_details
FROM incident_alerts WHERE incident_id = ?;
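The correlation heuristics above can be sketched as a simple scorer: an exact service match plus time proximity within the window. The weights and decay here are illustrative assumptions, not Aurora's actual algorithm:

```python
from datetime import datetime, timedelta

def correlation_score(a: dict, b: dict,
                      window: timedelta = timedelta(minutes=15)) -> float:
    """Score how likely two alerts belong to the same incident (0.0 to 1.0)."""
    score = 0.0
    if a["service"] == b["service"]:
        score += 0.6  # service match dominates (weight is an assumption)
    gap = abs(a["timestamp"] - b["timestamp"])
    if gap <= window:
        # Closer in time scores higher, decayed linearly over the window.
        score += 0.4 * (1 - gap / window)
    return round(score, 3)

t = datetime(2024, 1, 1, 12, 0)
a = {"service": "checkout-api", "timestamp": t}
b = {"service": "checkout-api", "timestamp": t + timedelta(minutes=5)}
print(correlation_score(a, b))
```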

Next Steps

Connect More Sources

Add AWS, GCP, or Azure for deeper investigation capabilities

Set Up Monitoring

Configure Datadog, Grafana, or other observability integrations

Custom Connectors

Build integrations for proprietary systems

Backup & Restore

Set up automated backups for incident data
