Drako provides full-stack observability for AI agent fleets out of the box. No Grafana setup. No Prometheus configuration. No external tooling to manage. Connect your agents and open the dashboard at getdrako.com/dashboard.
The full observability page (metrics, violations, alerts, drift detection) requires a Pro plan or above. The dashboard overview is available on all plans.

Architecture

Every agent run produces signals that flow from the SDK through the backend and into the dashboard in real time.
┌──────────────────────────────────────────────────────────────────────────┐
│                             YOUR AGENT FLEET                             │
│     [ agent-1 ]     [ agent-2 ]     [ agent-3 ]      [ agent-n ]         │
└──────────┬───────────────┬───────────────┬───────────────┬───────────────┘
           │               │               │               │
           ▼               ▼               ▼               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                      DRAKO SDK (pip install drako)                       │
│        Trust Evaluation  •  Policy Enforcement  •  Audit Logging         │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
                        HTTPS / WSS / mTLS
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                         DRAKO BACKEND (FastAPI)                          │
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ Trust Engine │  │ Audit Chain  │  │ Policy Eng.  │  │   Metering   │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ Observ. Svc  │  │  FinOps Svc  │  │  Alert Svc   │  │  OTEL Exp.   │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │ Prometheus Metrics (/metrics) + Custom Business Instrumentation    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────┐    ┌────────────┐    ┌─────────────┐    ┌────────────┐   │
│  │  Postgres  │    │   Redis    │    │Grafana Alloy│    │  WS Hub    │   │
│  │   (RLS)    │    │  (Cache)   │    │ (Collector) │    │(Real-time) │   │
│  └────────────┘    └────────────┘    └─────────────┘    └────────────┘   │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                         DRAKO DASHBOARD (React)                          │
│                          getdrako.com/dashboard                          │
│                                                                          │
│  ┌────────────┐    ┌─────────────┐    ┌───────────┐    ┌─────────────┐   │
│  │  Overview  │    │Observability│    │  FinOps   │    │Agents/Audit │   │
│  │ (Cmd Ctr)  │    │    (Pro)    │    │   (Pro)   │    │ (All Plans) │   │
│  └────────────┘    └─────────────┘    └───────────┘    └─────────────┘   │
└──────────────────────────────────────────────────────────────────────────┘
Data flow:
  1. Your agents call the Drako SDK for trust evaluation, audit logging, and policy checks.
  2. The backend processes each request, tracks metrics in Postgres/Redis, and emits Prometheus counters.
  3. Grafana Alloy scrapes /metrics every 30 seconds and pushes to Grafana Cloud.
  4. The dashboard fetches aggregated data via REST API and receives live updates via WebSocket.
  5. All data is tenant-isolated via PostgreSQL Row-Level Security.

Dashboard overview

The command center at /dashboard gives you a real-time snapshot of your governance posture.

Metric cards

Metric           Description
Audit entries    Total audit log entries in the current period
Agents verified  Number of agents with completed trust evaluation
Policy blocks    Actions blocked by governance policies
Avg trust score  Fleet-wide average trust score (0.0–1.0)
Each card includes a sparkline showing the 7-day trend.

Quota usage

A horizontal progress bar shows your current plan usage. The bar turns yellow at 70% and red above 90%.
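The threshold logic can be sketched in a few lines. This is a minimal illustration, not the dashboard's actual code: the function name is hypothetical, and the color below 70% is not stated on this page, so it is returned as "default".

```python
def quota_bar_color(used: int, limit: int) -> str:
    """Return the progress-bar color for the current quota usage.

    Thresholds follow the text: yellow at 70%, red above 90%.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio > 0.90:
        return "red"
    if ratio >= 0.70:
        return "yellow"
    # The base color is not specified in the docs; "default" is a placeholder.
    return "default"


print(quota_bar_color(7_500, 10_000))  # 75% usage falls in the yellow band
```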

Governance score trend

A 30-day time-series chart of your governance score progression, sourced from GET /dashboard/score-progression.

Tool health grid

A visual grid of your tools’ circuit breaker states:
  • Green (CLOSED) — Tool is healthy and operating normally
  • Yellow (HALF_OPEN) — Tool is recovering; limited traffic allowed
  • Red (OPEN) — Tool is circuit-broken; requests are being rejected
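If you are unfamiliar with the circuit breaker pattern, the sketch below shows how the three states typically relate. It is a generic illustration, not Drako's implementation: the class, the failure threshold, and the cooldown are all assumptions, and for simplicity it lets every request through in HALF_OPEN rather than limiting traffic.

```python
from enum import Enum
from typing import Optional


class BreakerState(Enum):
    CLOSED = "CLOSED"
    HALF_OPEN = "HALF_OPEN"
    OPEN = "OPEN"


# Colors as described for the tool health grid.
GRID_COLOR = {
    BreakerState.CLOSED: "green",
    BreakerState.HALF_OPEN: "yellow",
    BreakerState.OPEN: "red",
}


class CircuitBreaker:
    """Generic circuit breaker; threshold and cooldown values are hypothetical."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.state = BreakerState.CLOSED
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self, now: float) -> bool:
        """Reject while OPEN; move to HALF_OPEN once the cooldown has elapsed."""
        if self.state is BreakerState.OPEN:
            if now - self.opened_at >= self.cooldown_s:
                self.state = BreakerState.HALF_OPEN
                return True
            return False
        return True

    def record_success(self) -> None:
        """A successful call closes the breaker and resets the failure count."""
        self.failures = 0
        self.state = BreakerState.CLOSED

    def record_failure(self, now: float) -> None:
        """Open the breaker after repeated failures or any failure in HALF_OPEN."""
        self.failures += 1
        if (self.state is BreakerState.HALF_OPEN
                or self.failures >= self.failure_threshold):
            self.state = BreakerState.OPEN
            self.opened_at = now
```

Time is passed in explicitly (`now`) so the transitions are easy to test without sleeping.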

Activity feed

A real-time stream of the latest audit log entries. New entries arrive instantly over the WebSocket connection, and the feed also auto-refreshes every 30 seconds.

Key metrics

Health grade

A–F composite score combining latency, error rate, and governance overhead. Source: GET /observability/insights/health.
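To make the idea of a composite grade concrete, here is one way three signals could be folded into a letter. Everything numeric in it is an assumption for illustration: the weights, the budgets, the cutoffs, and the A/B/C/D/F scale are not documented here and may differ from how Drako computes the grade.

```python
def health_grade(
    p95_latency_ms: float,
    error_rate: float,
    governance_overhead_ms: float,
    *,
    latency_budget_ms: float = 2000.0,   # hypothetical budget
    error_budget: float = 0.10,          # hypothetical: 10% errors is worst case
    overhead_budget_ms: float = 200.0,   # hypothetical budget
) -> str:
    """Combine three signals into a letter grade using made-up weights."""

    def penalty(value: float, budget: float) -> float:
        # Clamp each signal to a 0.0-1.0 penalty relative to its budget.
        return max(0.0, min(value / budget, 1.0))

    score = 1.0 - (
        0.4 * penalty(p95_latency_ms, latency_budget_ms)
        + 0.4 * penalty(error_rate, error_budget)
        + 0.2 * penalty(governance_overhead_ms, overhead_budget_ms)
    )
    for cutoff, grade in ((0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")):
        if score >= cutoff:
            return grade
    return "F"
```

The authoritative grade is the one returned by GET /observability/insights/health; a local calculation like this is only useful for reasoning about which signal is dragging the grade down.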

Latency

P50, P95, and P99 percentiles with time-series visualization. Source: GET /observability/metrics. Updated every 30 seconds.
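For readers who want to sanity-check these numbers against their own request logs, the sketch below computes the same three percentiles with the nearest-rank method. The function names are hypothetical, and the backend may use a different estimator (for example, histogram buckets), so small differences are expected.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return the P50, P95, and P99 latencies for a batch of samples."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


print(latency_summary(list(range(1, 101))))
```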

Violation heatmap

A 7×24 grid (days × hours) where cell intensity represents violation count. Reveals patterns like batch-job spikes at 2 AM.
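The grid is simple to reproduce from raw violation timestamps. The sketch below assumes rows are weekdays starting on Monday and columns are hours of the day; the row order and the timezone used by the dashboard are not specified here, so treat both as assumptions.

```python
from datetime import datetime
from typing import Iterable, List


def build_heatmap(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Count violations into a 7x24 grid: grid[weekday][hour]."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        # weekday() is 0 for Monday through 6 for Sunday.
        grid[ts.weekday()][ts.hour] += 1
    return grid


violations = [
    datetime(2024, 1, 3, 2, 15),   # Wednesday, 02:15
    datetime(2024, 1, 3, 2, 40),   # Wednesday, 02:40
    datetime(2024, 1, 7, 23, 0),   # Sunday, 23:00
]
grid = build_heatmap(violations)
print(grid[2][2])  # count for the Wednesday 02:00 cell
```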

Drift detection

Automatic identification of behavioral drift across your fleet. When an agent’s behavior deviates significantly from its historical pattern, drift is flagged. Source: GET /observability/insights/drift. Updated every 5 minutes.
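What "deviates significantly" means is not defined on this page. One common way to express it is a z-score against the agent's own history, sketched below. The metric (for example, tool calls per session), the function name, and the threshold of 3 standard deviations are all assumptions, not Drako's algorithm.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_drifting(history: Sequence[float], current: float,
                z_threshold: float = 3.0) -> bool:
    """Flag drift when `current` is far from the historical mean.

    `history` holds past values of one behavioral metric for one agent.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly stable history: any change at all counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


calls_per_session = [10, 11, 9, 10, 10, 11, 9, 10]
print(is_drifting(calls_per_session, 10.5))
print(is_drifting(calls_per_session, 20))
```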

Observability page

The full observability page at /observability is organized into four tabs: Overview, Metrics, Violations, and Alerts.
The Overview tab presents a unified health assessment combining multiple signals:
Component      What it measures
Health grade   A–F composite grade: latency + error rate + governance overhead
P50 latency    Median request latency across all endpoints
P95 latency    95th percentile latency (tail performance)
Active alerts  Number of currently firing alert rules
Drift status   Whether behavioral drift has been detected in the fleet

Configuring alert rules

Define alert rules in .drako.yaml. Each rule specifies a metric, a threshold condition, and one or more notification channels.
policies:
  alerts:
    rules:
      - name: high_latency
        metric: avg_latency_ms
        condition: "> 2000"
        channels: [slack, pagerduty]
      - name: drift_detected
        metric: drift_rate
        condition: "> 0.3"
        channels: [email]
      - name: budget_warning
        metric: cost_today_usd
        condition: "> 50"
        channels: [slack]
      - name: low_fleet_health
        metric: fleet_health
        condition: "< 0.7"
        channels: [pagerduty]
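To show how a rule's condition string relates to a metric value, the sketch below evaluates the four rules above against a snapshot of metrics. It is an illustration only: the rules are restated as Python dicts to avoid a YAML dependency, the function names are hypothetical, and only the two operators that appear in the example (">" and "<") are handled, since the full condition grammar is not documented here.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Only the operators used in the example config; others are not confirmed.
_OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
}

RULES = [
    {"name": "high_latency", "metric": "avg_latency_ms", "condition": "> 2000"},
    {"name": "drift_detected", "metric": "drift_rate", "condition": "> 0.3"},
    {"name": "budget_warning", "metric": "cost_today_usd", "condition": "> 50"},
    {"name": "low_fleet_health", "metric": "fleet_health", "condition": "< 0.7"},
]


def parse_condition(condition: str) -> Tuple[Callable[[float, float], bool], float]:
    """Split a condition such as '> 2000' into a comparison and a threshold."""
    op_text, _, threshold_text = condition.strip().partition(" ")
    if op_text not in _OPS:
        raise ValueError(f"unsupported operator in condition: {condition!r}")
    return _OPS[op_text], float(threshold_text)


def firing_rules(rules: List[dict], metrics: Dict[str, float]) -> List[str]:
    """Return the names of rules whose condition holds for the given metrics."""
    fired = []
    for rule in rules:
        compare, threshold = parse_condition(rule["condition"])
        value = metrics.get(rule["metric"])
        if value is not None and compare(value, threshold):
            fired.append(rule["name"])
    return fired


snapshot = {"avg_latency_ms": 2500, "drift_rate": 0.1,
            "cost_today_usd": 50, "fleet_health": 0.65}
print(firing_rules(RULES, snapshot))
```

Note that both comparisons are strict, so a cost of exactly 50 does not trigger budget_warning.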

Session traces

Every agent session produces a full span tree — tool calls, policy checks, latency breakdowns, and audit references. Session traces are accessible from the Agents page and link directly to the corresponding audit log entries.

Exporting to external systems

Drako supports OpenTelemetry export. Pipe traces and metrics to your existing observability stack:
  • Datadog — traces via OTLP exporter
  • Grafana — metrics via Grafana Alloy already scraped from /metrics
  • New Relic — traces via OTLP exporter
Configure the OTEL endpoint in your environment. The example below targets New Relic's OTLP endpoint:
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net
export OTEL_EXPORTER_OTLP_HEADERS="Api-Key=YOUR_LICENSE_KEY"
Security events are exportable to SIEM platforms via STIX 2.1 or CEF format:
  • Splunk — ingest via HTTP Event Collector using CEF-formatted events
  • ELK (Elasticsearch) — ingest via Logstash pipeline using STIX 2.1 bundles
Export is available for all violation and policy-block events from the audit trail.
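For orientation, a CEF record is a single line with seven pipe-delimited header fields followed by key=value extensions. The sketch below builds one by hand. The vendor, product, and version strings, the event class ID, and the extension keys are placeholders; the actual fields Drako emits are not documented on this page.

```python
from typing import Dict


def _escape_header(value: str) -> str:
    # CEF header fields escape backslashes and pipes.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value: str) -> str:
    # CEF extension values escape backslashes, equals signs, and newlines.
    return value.replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def to_cef(event_class_id: str, name: str, severity: int,
           extension: Dict[str, str], *,
           vendor: str = "Drako", product: str = "Drako",
           version: str = "1.0") -> str:
    """Format one event as a CEF:0 line (all field values are placeholders)."""
    header_fields = (vendor, product, version, event_class_id, name, str(severity))
    header = "|".join(["CEF:0"] + [_escape_header(f) for f in header_fields])
    ext = " ".join(f"{key}={_escape_extension(str(val))}"
                   for key, val in extension.items())
    return f"{header}|{ext}"


line = to_cef("policy_block", "Policy block", 7,
              {"act": "blocked", "msg": "rate=high"})
print(line)
```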

Plan availability

Feature                 Free    Starter  Pro      Enterprise
Dashboard overview      Yes     Yes      Yes      Yes
Audit trail             7 days  30 days  90 days  Custom
Agent trust scores      Yes     Yes      Yes      Yes
Governance score trend  -       Yes      Yes      Yes
Tool health grid        -       -        Yes      Yes
Observability (full)    -       -        Yes      Yes
Alert rules             -       -        Yes      Yes
Violation heatmap       -       -        Yes      Yes
Drift detection         -       -        Yes      Yes
OTEL export             -       -        Yes      Yes
Custom metrics          -       -        -        Yes

Real-time updates

The dashboard connects to a WebSocket at wss://api.getdrako.com/ws for live updates. The connection indicator in the header shows:
  • Green dot (pulsing) — Connected, receiving live data
  • Yellow dot — Reconnecting
  • No dot — Disconnected (data still refreshes every 30 seconds via polling)
Reconnection is automatic with exponential backoff (up to 5 retries).
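If you build your own client against the WebSocket, a similar retry schedule can be sketched as follows. Only the limit of 5 retries comes from the text above; the base delay, the doubling factor, the cap, and the function names are assumptions. The `connect` and `sleep` callables are injected so the loop can be exercised without a network.

```python
from typing import Callable, List


def backoff_delays(max_retries: int = 5, base_s: float = 1.0,
                   factor: float = 2.0, cap_s: float = 30.0) -> List[float]:
    """Delay before each retry: base * factor**attempt, capped at cap_s."""
    return [min(base_s * factor ** attempt, cap_s)
            for attempt in range(max_retries)]


def reconnect(connect: Callable[[], bool], sleep: Callable[[float], None],
              max_retries: int = 5) -> bool:
    """Retry `connect` with exponential backoff; give up after max_retries."""
    for delay in backoff_delays(max_retries):
        sleep(delay)
        if connect():
            return True
    return False


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In a real client you would pass `time.sleep` (or an async equivalent) and usually add random jitter so that many clients do not reconnect in lockstep.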
