Observability dashboard

Drako provides full-stack observability for AI agent fleets out of the box. No Grafana setup. No Prometheus configuration. No external tooling to manage. Connect your agents and open the dashboard at getdrako.com/dashboard.

The full observability page (metrics, violations, alerts, drift detection) requires a Pro plan or above. The dashboard overview is available on all plans.

Architecture

Every agent run produces signals that flow from the SDK through the backend and into the dashboard in real time.

┌──────────────────────────────────────────────────────────────────────────┐
│                             YOUR AGENT FLEET                             │
│     [ agent-1 ]     [ agent-2 ]     [ agent-3 ]      [ agent-n ]         │
└──────────┬───────────────┬───────────────┬───────────────┬───────────────┘
           │               │               │               │
           ▼               ▼               ▼               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                      DRAKO SDK (pip install drako)                       │
│        Trust Evaluation  •  Policy Enforcement  •  Audit Logging         │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
                        HTTPS / WSS / mTLS
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                         DRAKO BACKEND (FastAPI)                          │
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ Trust Engine │  │ Audit Chain  │  │ Policy Eng.  │  │   Metering   │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ Observ. Svc  │  │  FinOps Svc  │  │  Alert Svc   │  │  OTEL Exp.   │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │ Prometheus Metrics (/metrics) + Custom Business Instrumentation    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────┐    ┌────────────┐    ┌─────────────┐    ┌────────────┐   │
│  │  Postgres  │    │   Redis    │    │Grafana Alloy│    │  WS Hub    │   │
│  │   (RLS)    │    │  (Cache)   │    │ (Collector) │    │(Real-time) │   │
│  └────────────┘    └────────────┘    └─────────────┘    └────────────┘   │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                         DRAKO DASHBOARD (React)                          │
│                          getdrako.com/dashboard                          │
│                                                                          │
│  ┌────────────┐    ┌─────────────┐    ┌───────────┐    ┌─────────────┐   │
│  │  Overview  │    │Observability│    │  FinOps   │    │Agents/Audit │   │
│  │ (Cmd Ctr)  │    │    (Pro)    │    │   (Pro)   │    │ (All Plans) │   │
│  └────────────┘    └─────────────┘    └───────────┘    └─────────────┘   │
└──────────────────────────────────────────────────────────────────────────┘

Data flow:

1. Your agents call the Drako SDK for trust evaluation, audit logging, and policy checks.
2. The backend processes each request, tracks metrics in Postgres and Redis, and emits Prometheus counters.
3. Grafana Alloy scrapes /metrics every 30 seconds and pushes the results to Grafana Cloud.
4. The dashboard fetches aggregated data via the REST API and receives live updates via WebSocket.

All data is tenant-isolated via PostgreSQL Row-Level Security.

Dashboard overview

The command center at /dashboard gives you a real-time snapshot of your governance posture.

Metric cards

Metric	Description
Audit entries	Total audit log entries in the current period
Agents verified	Number of agents with completed trust evaluation
Policy blocks	Actions blocked by governance policies
Avg trust score	Fleet-wide average trust score (0.0–1.0)

Each card includes a sparkline showing the 7-day trend.

Quota usage

A horizontal progress bar shows your current plan usage. The bar turns yellow at 70% and red above 90%.
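The color thresholds can be read as a small mapping from usage ratio to color. This is a minimal sketch. The yellow and red thresholds come from the text above. Showing green below 70% is an assumption, and `quota_bar_color` is a hypothetical helper, not part of the Drako SDK.

```python
def quota_bar_color(used: int, limit: int) -> str:
    """Return the progress-bar color for plan usage.

    Yellow from 70% and red above 90% follow the docs; showing
    green below 70% is an assumption of this sketch.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio > 0.9:
        return "red"      # above 90%
    if ratio >= 0.7:
        return "yellow"   # 70% up to and including 90%
    return "green"        # assumed color below 70%


for used in (6_500, 7_000, 9_000, 9_500):
    print(used, quota_bar_color(used, 10_000))
```

At exactly 90% the bar stays yellow, because red is documented as starting above 90%.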

Governance score trend

A 30-day time-series chart of your governance score progression, sourced from GET /dashboard/score-progression.
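If you want the same series outside the dashboard, a request to this endpoint could look like the sketch below. It assumes that the REST API shares the host of the documented WebSocket (`api.getdrako.com`), that responses are JSON, and that authentication uses a Bearer token; all three are unverified assumptions, and `build_get` and `fetch_json` are hypothetical helpers built on the standard library only.

```python
import json
import urllib.request

# Assumption: the REST API lives on the same host as the documented
# WebSocket (wss://api.getdrako.com/ws). Verify the base URL for your account.
API_BASE = "https://api.getdrako.com"


def build_get(path: str, api_key: str) -> urllib.request.Request:
    """Build, without sending, a GET request for a dashboard endpoint.

    The Bearer token header is a hypothetical auth scheme for illustration.
    """
    return urllib.request.Request(
        API_BASE + path,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def fetch_json(req: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the body, assumed here to be JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_get("/dashboard/score-progression", api_key="YOUR_API_KEY")
print(req.full_url)  # https://api.getdrako.com/dashboard/score-progression
```

Building the request separately from sending it keeps the URL and headers easy to inspect before any network call is made.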

Tool health grid

A visual grid of your tools’ circuit breaker states:

Green (CLOSED) — Tool is healthy and operating normally
Yellow (HALF_OPEN) — Tool is recovering; limited traffic allowed
Red (OPEN) — Tool is circuit-broken; requests are being rejected
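The three colors correspond to the states of a standard circuit breaker. The sketch below is a textbook three-state breaker, not Drako's implementation; `failure_threshold`, `cooldown_s`, and `half_open_max` are illustrative defaults, and the injectable `clock` exists only to make the example testable.

```python
import time
from enum import Enum


class BreakerState(Enum):
    CLOSED = "green"      # healthy, requests flow normally
    HALF_OPEN = "yellow"  # recovering, only trial requests allowed
    OPEN = "red"          # circuit-broken, requests rejected


class CircuitBreaker:
    """Textbook three-state circuit breaker.

    failure_threshold, cooldown_s, and half_open_max are illustrative
    defaults, not Drako's actual settings.
    """

    def __init__(self, failure_threshold=3, cooldown_s=30.0, half_open_max=1,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.half_open_max = half_open_max
        self.clock = clock
        self.state = BreakerState.CLOSED
        self.failures = 0        # failures since the last success
        self.opened_at = 0.0
        self.trial_calls = 0

    def allow_request(self) -> bool:
        if (self.state is BreakerState.OPEN
                and self.clock() - self.opened_at >= self.cooldown_s):
            self.state = BreakerState.HALF_OPEN  # cooldown over: start probing
            self.trial_calls = 0
        if self.state is BreakerState.OPEN:
            return False                         # reject while circuit-broken
        if self.state is BreakerState.HALF_OPEN:
            if self.trial_calls >= self.half_open_max:
                return False                     # limited traffic while recovering
            self.trial_calls += 1
        return True

    def record_success(self) -> None:
        self.state = BreakerState.CLOSED
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if (self.state is BreakerState.HALF_OPEN
                or self.failures >= self.failure_threshold):
            self.state = BreakerState.OPEN       # trip (or re-trip) the breaker
            self.opened_at = self.clock()


now = [0.0]
cb = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    cb.record_failure()
print(cb.state.value)                       # red
now[0] = 31.0
print(cb.allow_request(), cb.state.value)   # True yellow
```

A failure during HALF_OPEN sends the breaker straight back to OPEN, while a success closes it again.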

Activity feed

A real-time stream of the latest audit log entries. The feed updates instantly over the WebSocket and also auto-refreshes every 30 seconds.

Key metrics

Health grade

An A–F composite grade that combines latency, error rate, and governance overhead. Source: GET /observability/insights/health.

Latency

P50, P95, and P99 latency percentiles, shown as a time series. Source: GET /observability/metrics. Updated every 30 seconds.
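To make the percentile definitions concrete, here is a sketch using the nearest-rank method. Drako does not document which percentile method it uses, so treat this as one common choice rather than the dashboard's exact computation; the latency samples are made up.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [120, 95, 310, 150, 88, 1020, 140, 132, 101, 2400]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With only ten samples, P95 and P99 both land on the slowest request (2400 ms), which is why tail percentiles become meaningful only with enough traffic.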

Violation heatmap

A 7×24 grid (days × hours) in which each cell’s color intensity represents its violation count. This surfaces recurring patterns, such as violation spikes at 2 AM caused by a batch job.
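The grid is a simple bucketing of violation timestamps. In the sketch below, rows are days of the week and columns are hours. Whether the dashboard uses weekdays or the last seven calendar days, which timezone it applies, and how it scales intensity are not documented, so the weekday rows and linear scaling are assumptions.

```python
from datetime import datetime


def violation_heatmap(timestamps: list[datetime]) -> list[list[int]]:
    """Count violations per (weekday, hour) cell.

    Rows follow datetime.weekday() (0 = Monday); columns are hours 0-23.
    """
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


def intensities(grid: list[list[int]]) -> list[list[float]]:
    """Scale counts to 0.0-1.0 relative to the busiest cell (assumed linear)."""
    peak = max(max(row) for row in grid)
    return [[count / peak if peak else 0.0 for count in row] for row in grid]


events = [
    datetime(2024, 6, 3, 2, 5),    # Monday 02:05
    datetime(2024, 6, 3, 2, 40),   # Monday 02:40
    datetime(2024, 6, 10, 2, 15),  # the following Monday 02:15
    datetime(2024, 6, 5, 14, 0),   # Wednesday 14:00
]
grid = violation_heatmap(events)
print(grid[0][2])  # 3, a recurring Monday 2 AM spike
```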

Drift detection

Automatic detection of behavioral drift across your fleet. When an agent’s behavior deviates significantly from its historical pattern, Drako flags it as drift. Source: GET /observability/insights/drift. Updated every 5 minutes.
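Drako does not document its drift algorithm. As an illustration of what "deviates significantly from its historical pattern" can mean, the sketch below swaps in a plain z-score test on one behavioral metric. The metric (tool calls per session) and the threshold of 3 are hypothetical choices.

```python
from statistics import mean, stdev


def drift_z(history: list[float], recent: list[float]) -> float:
    """How many historical standard deviations the recent mean has moved."""
    mu, sigma = mean(history), stdev(history)  # stdev needs >= 2 points
    shift = abs(mean(recent) - mu)
    if sigma == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / sigma


def is_drifting(history: list[float], recent: list[float], threshold: float = 3.0) -> bool:
    return drift_z(history, recent) > threshold


history = [4, 5, 5, 6, 4, 5, 6, 5]         # tool calls per session, baseline
print(is_drifting(history, [11, 12, 10]))  # True
print(is_drifting(history, [5, 6, 5]))     # False
```

A production detector would typically track several signals and use a windowed baseline, but the core idea of comparing recent behavior against a historical distribution is the same.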

Observability page

The full observability page at /observability is organized into four tabs: Overview, Metrics, Violations, and Alerts.

The Overview tab gives a unified health assessment that combines multiple signals:

Component	What it measures
Health grade	A–F composite grade: latency + error rate + governance overhead
P50 latency	Median request latency across all endpoints
P95 latency	95th percentile latency (tail performance)
Active alerts	Number of currently firing alert rules
Drift status	Whether behavioral drift has been detected in the fleet

The Alerts tab lets you configure alert rules on nine business metrics:

Metric	Example threshold
`fleet_health`	< 0.7
`drift_rate`	> 0.3
`violations_24h`	> 100
`cost_today_usd`	> 50
`avg_latency_ms`	> 2000
`error_rate`	> 0.05
`quota_usage_pct`	> 0.9
`active_agents`	< 1
`governance_overhead_pct`	> 0.15

Alert channels: log, Slack, email, PagerDuty. Rules can be test-fired before going live.

Configuring alert rules

Define alert rules in .drako.yaml. Each rule specifies a metric, a threshold condition, and one or more notification channels.

policies:
  alerts:
    rules:
      - name: high_latency
        metric: avg_latency_ms
        condition: "> 2000"
        channels: [slack, pagerduty]
      - name: drift_detected
        metric: drift_rate
        condition: "> 0.3"
        channels: [email]
      - name: budget_warning
        metric: cost_today_usd
        condition: "> 50"
        channels: [slack]
      - name: low_fleet_health
        metric: fleet_health
        condition: "< 0.7"
        channels: [pagerduty]
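To show how a `condition` string compares a metric against its threshold, here is a sketch that evaluates the rules above against a metrics snapshot. It is not Drako's alert service. The rule list mirrors what `yaml.safe_load` would return for `policies.alerts.rules`; accepting `>=` and `<=` is an assumption (the examples only use `>` and `<`), and the snapshot values are made up.

```python
import operator
import re

# ">" and "<" appear in the .drako.yaml examples; ">=" and "<=" are assumed.
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
_CONDITION = re.compile(r"^\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_condition(condition: str):
    """Turn a condition string such as "> 2000" into (compare_fn, threshold)."""
    match = _CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    return _OPS[match.group(1)], float(match.group(2))


def firing_rules(rules, metrics):
    """Yield (rule name, channels) for every rule whose condition holds."""
    for rule in rules:
        compare, threshold = parse_condition(rule["condition"])
        value = metrics.get(rule["metric"])
        if value is not None and compare(value, threshold):
            yield rule["name"], rule["channels"]


rules = [
    {"name": "high_latency", "metric": "avg_latency_ms", "condition": "> 2000", "channels": ["slack", "pagerduty"]},
    {"name": "drift_detected", "metric": "drift_rate", "condition": "> 0.3", "channels": ["email"]},
    {"name": "budget_warning", "metric": "cost_today_usd", "condition": "> 50", "channels": ["slack"]},
    {"name": "low_fleet_health", "metric": "fleet_health", "condition": "< 0.7", "channels": ["pagerduty"]},
]
snapshot = {"avg_latency_ms": 2350, "drift_rate": 0.1, "cost_today_usd": 62.5, "fleet_health": 0.82}
for name, channels in firing_rules(rules, snapshot):
    print(name, "->", ", ".join(channels))
```

With this snapshot, `high_latency` and `budget_warning` fire; the other two rules stay quiet.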

Session traces

Every agent session produces a full span tree — tool calls, policy checks, latency breakdowns, and audit references. Session traces are accessible from the Agents page and link directly to the corresponding audit log entries.
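A span tree is a nested structure in which each span can have child spans. The sketch below models one with hypothetical field names (`duration_ms`, `audit_id`, `children`); it is not Drako's trace schema. It shows how a latency breakdown falls out of the tree: a span's self time is its duration minus that of its direct children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One node of a session trace. Field names are illustrative only."""
    name: str
    duration_ms: float
    audit_id: Optional[str] = None          # link to the audit log entry
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        """Time spent in this span itself, excluding direct children."""
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def render(span: Span, depth: int = 0) -> List[str]:
    """Indented, depth-first text view of the span tree."""
    ref = f" [audit {span.audit_id}]" if span.audit_id else ""
    lines = [f"{'  ' * depth}{span.name}: {span.duration_ms:g} ms{ref}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


session = Span("session", 900, children=[
    Span("policy_check", 40, audit_id="a-101"),
    Span("tool_call:search", 600, audit_id="a-102",
         children=[Span("policy_check", 25)]),
    Span("tool_call:summarize", 200, audit_id="a-103"),
])
print("\n".join(render(session)))
print("session overhead:", session.self_time_ms(), "ms")  # 60 ms
```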

Exporting to external systems

OTEL export (Datadog, Grafana, New Relic)

Drako supports OpenTelemetry export. Pipe traces and metrics to your existing observability stack:

Datadog — traces via OTLP exporter
Grafana — metrics are already scraped from /metrics by Grafana Alloy
New Relic — traces via OTLP exporter

Configure the OTLP endpoint and headers with the standard OpenTelemetry environment variables. For example, to send data to New Relic:

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net
export OTEL_EXPORTER_OTLP_HEADERS="Api-Key=YOUR_LICENSE_KEY"

SIEM export (Splunk, ELK)

Security events can be exported to SIEM platforms in STIX 2.1 or CEF format:

Splunk — ingest via HTTP Event Collector using CEF-formatted events
ELK (Elasticsearch) — ingest via Logstash pipeline using STIX 2.1 bundles

Export is available for all violation and policy-block events from the audit trail.

Plan availability

Feature	Free	Starter	Pro	Enterprise
Dashboard overview	Yes	Yes	Yes	Yes
Audit trail	7 days	30 days	90 days	Custom
Agent trust scores	Yes	Yes	Yes	Yes
Governance score trend	—	Yes	Yes	Yes
Tool health grid	—	—	Yes	Yes
Observability (full)	—	—	Yes	Yes
Alert rules	—	—	Yes	Yes
Violation heatmap	—	—	Yes	Yes
Drift detection	—	—	Yes	Yes
OTEL export	—	—	Yes	Yes
Custom metrics	—	—	—	Yes

Real-time updates

The dashboard connects to a WebSocket at wss://api.getdrako.com/ws for live updates. The connection indicator in the header shows:

Green dot (pulsing) — Connected, receiving live data
Yellow dot — Reconnecting
No dot — Disconnected (data still refreshes every 30 seconds via polling)

Reconnection is automatic with exponential backoff (up to 5 retries).
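The reconnection policy can be sketched as a retry loop with doubling delays. Only the exponential backoff and the five-retry limit come from the text above. The 1-second base delay and 30-second cap are assumptions, `backoff_delays` and `reconnect` are hypothetical helpers, and `connect` stands in for whatever opens the WebSocket.

```python
import time


def backoff_delays(max_retries: int = 5, base_s: float = 1.0, cap_s: float = 30.0):
    """Exponential backoff: base_s * 2**attempt seconds, capped at cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_retries)]


def reconnect(connect, sleep=time.sleep, max_retries: int = 5) -> bool:
    """Retry `connect` with exponential backoff.

    Returns True as soon as a connection succeeds (green dot). After
    max_retries failures it returns False, and the caller is expected to
    fall back to 30-second polling (no dot).
    """
    for delay in backoff_delays(max_retries):
        sleep(delay)
        if connect():
            return True
    return False


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Many clients also add random jitter to each delay so that many dashboards do not reconnect at the same instant after an outage.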
