Skip to main content
Aurora integrates with leading observability and CI/CD platforms to automatically detect and investigate incidents in real-time.

Supported Platforms

Grafana

Alert webhooks from Grafana Cloud or self-hosted instances

PagerDuty

Incident webhooks with runbook integration

Datadog

Event stream webhooks for monitors and alerts

Netdata

Real-time health monitoring alerts

Dynatrace

Problem webhooks with Davis AI insights

Splunk

Search-based alerts and log anomalies

Jenkins

Build failure detection with RCA

CloudBees

CloudBees CI build/deployment failures

Integration Architecture

Webhook Ingestion

Each platform sends alerts to Aurora via webhook:
Platform → POST /api/webhooks/{platform} → Aurora Backend

                                    Create Incident

                                    Launch Background RCA

                                    Stream Thoughts to UI

Common Webhook Flow

1

Alert triggered in platform

Grafana/PagerDuty/Datadog monitor detects an issue
2

Platform sends webhook

POST request to Aurora’s webhook endpoint with alert payload
3

Aurora validates & stores

Webhook handler validates payload and stores in platform-specific table (e.g., grafana_alerts)
4

Incident creation

Alert is converted to an incident in the incidents table
5

Correlation check

Aurora checks if this alert correlates with existing incidents
6

RCA initiated

If new incident, Aurora launches background investigation via Celery task

Platform-Specific Features

Setup

  1. Create a Contact Point in Grafana:
    • Type: Webhook
    • URL: https://your-aurora.com/api/webhooks/grafana
    • Method: POST
  2. Attach contact point to alert rules

Webhook Payload

{
  "receiver": "aurora",
  "status": "firing",
  "alerts": [{
    "status": "firing",
    "labels": {
      "alertname": "HighErrorRate",
      "severity": "critical",
      "service": "api"
    },
    "annotations": {
      "summary": "Error rate above 5%",
      "description": "API service is experiencing high error rates"
    },
    "startsAt": "2026-03-03T10:30:00Z",
    "fingerprint": "a1b2c3d4"
  }]
}

Features

  • Alert fingerprint used as unique ID for correlation
  • Supports alert grouping by labels
  • Auto-resolves incidents when alert status changes to resolved
See Grafana Integration for API token configuration.

Alert Correlation

Aurora uses multiple strategies to correlate related alerts:

1. Service-Based Correlation

Alerts affecting the same service are grouped:
# Check if alert matches existing incident by service
if alert_service == incident.alert_service:
    correlation_score = 0.9
    correlation_strategy = "service_match"

2. Time-Based Clustering

Alerts within a 5-minute window may be related:
time_diff = abs(alert_timestamp - incident.started_at)
if time_diff < timedelta(minutes=5):
    correlation_score = 0.7
    correlation_strategy = "time_cluster"

3. Semantic Similarity

Using Weaviate vector search for alert description similarity:
# server/chat/backend/agent/weaviate_client.py
results = weaviate_client.query(
    collection="Incidents",
    query_vector=embed(alert_description),
    limit=5,
    distance_threshold=0.3  # High similarity
)

if results:
    correlation_score = 1.0 - results[0].distance
    correlation_strategy = "semantic_similarity"
Correlated alerts are added to the incident_alerts table and displayed in the Correlated Alerts section of the incident detail page.

User Workflows

Configuring Webhooks

  1. Navigate to Connectors page in Aurora
  2. Click “Configure” next to the observability platform
  3. Copy the webhook URL for your platform
  4. Follow platform-specific setup instructions
  5. Send test webhook to verify connectivity
Webhook configuration page showing unique URL and test buttonWebhook configuration with unique URL per platform

Viewing Incidents from Observability Platforms

  1. Alert fires in Grafana/PagerDuty/etc.
  2. Webhook received by Aurora (within 1-2 seconds)
  3. Incident appears on Incidents page with status investigating
  4. Click incident to view:
    • Raw Alert: Original payload from platform
    • Thoughts: AI investigation progress
    • Suggestions: Diagnostic commands and fixes
    • Source Link: Deep link back to alerting platform

Manual Incident Creation

For platforms without native webhooks, manually create incidents via API:
curl -X POST https://your-aurora.com/api/incidents \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "custom",
    "alert_title": "Manual incident",
    "alert_service": "api",
    "severity": "high",
    "alert_metadata": {
      "description": "User-reported issue"
    }
  }'

Incident Lifecycle

From alert to resolution:

Status Transitions

From StatusTo StatusTrigger
-investigatingAlert webhook received, RCA started
investigatinganalyzedRCA completed, waiting for user action
analyzedresolvedUser marks incident as resolved
investigatingmergedAlert merged into another incident
resolvedresolvedPostmortem generated

Real-Time Updates

The Incidents page uses Server-Sent Events (SSE) for real-time updates:
// client/src/app/incidents/page.tsx
useEffect(() => {
  const eventSource = new EventSource('/api/incidents/stream');
  
  eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'incident_update') {
      refreshIncidents(true); // Silent refresh
    }
  };
  
  return () => eventSource.close();
}, []);
Backend sends SSE events when:
  • New incident created
  • Incident status changes
  • RCA completes
  • Suggestions added

API Reference

Webhook Endpoints

POST /api/webhooks/grafana
All webhook endpoints:
  • Accept JSON payloads
  • Return 202 Accepted on success
  • Validate payload structure
  • Rate-limited per platform

Get Platform Status

GET /api/{platform}/status
Returns connection status and configuration:
{
  "connected": true,
  "base_url": "https://app.datadoghq.com",
  "webhook_url": "https://aurora.example.com/api/webhooks/datadog",
  "last_webhook_received": "2026-03-03T10:30:00Z"
}

Incident Investigation

Automatic root cause analysis for incoming alerts

Cloud Integrations

Execute diagnostic commands during RCA investigations

Build docs developers (and LLMs) love