Arcana emits telemetry events for all operations, giving you visibility into performance, errors, and usage patterns.

Quick Start

For immediate visibility, attach the built-in logger:
lib/my_app/application.ex
def start(_type, _args) do
  # Attach telemetry logger before starting supervision tree
  Arcana.Telemetry.Logger.attach()

  children = [
    MyApp.Repo,
    Arcana.TaskSupervisor,
    Arcana.Embedder.Local
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end
This logs all Arcana operations with timing:
[info] [Arcana] search completed in 42ms (15 results)
[info] [Arcana] llm.complete completed in 1.23s [openai:gpt-4o-mini] ok (156 chars)
[info] [Arcana] agent.gate completed in 180ms (skip_retrieval: false)
[info] [Arcana] agent.expand completed in 2.15s ("machine learning ML models...")
[info] [Arcana] agent.search completed in 156ms (25 chunks)
[info] [Arcana] agent.rerank completed in 312ms (10/25 kept)
[info] [Arcana] agent.answer completed in 3.25s
[info] [Arcana] ask completed in 6.12s
With graph: true enabled:
[info] [Arcana] graph.ner completed in 45ms (3 entities)
[info] [Arcana] graph.relationship_extraction completed in 1.2s (2 relationships)
[info] [Arcana] graph.build completed in 1.5s (5 entities, 3 relationships)
[info] [Arcana] graph.search completed in 28ms (8 graph results, 10 combined)

Logger Options

Arcana.Telemetry.Logger.attach(
  level: :debug,           # Log level (default: :info)
  handler_id: "my-logger"  # Custom handler ID
)

# To stop logging
Arcana.Telemetry.Logger.detach()

Event Reference

All events use :telemetry.span/3, which emits :start, :stop, and :exception variants automatically.
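As a sketch of what that contract looks like in practice, the handler below subscribes to all three variants of the search span. The module and handler ID are illustrative; the event names come from the tables that follow, and the handler shape is the standard :telemetry contract:

```elixir
defmodule MyApp.SearchSpanHandler do
  # Subscribe one function to all three span variants for search.
  def attach do
    :telemetry.attach_many(
      "search-span-demo",
      [
        [:arcana, :search, :start],
        [:arcana, :search, :stop],
        [:arcana, :search, :exception]
      ],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # :start carries system_time; :stop and :exception carry duration (native units)
  def handle_event([:arcana, :search, :start], _measurements, _metadata, _config), do: :ok

  def handle_event([:arcana, :search, :stop], %{duration: duration}, _metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    IO.puts("search took #{ms}ms")
  end

  def handle_event([:arcana, :search, :exception], _measurements, %{kind: kind}, _config) do
    IO.puts("search failed: #{inspect(kind)}")
  end
end
```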

Core Events

Event | Measurements | Metadata
[:arcana, :ingest, :*] | system_time, duration | text, repo, collection, document, chunk_count
[:arcana, :search, :*] | system_time, duration | query, repo, mode, limit, results, result_count
[:arcana, :ask, :*] | system_time, duration | question, repo, answer, context_count
[:arcana, :embed, :*] | system_time, duration | text, dimensions
[:arcana, :llm, :complete, :*] | system_time, duration | model, prompt_length, success, response_length, error

Agent Pipeline Events

Each step in the Agent pipeline emits its own events:
Event | Metadata
[:arcana, :agent, :gate, :*] | question, skip_retrieval
[:arcana, :agent, :rewrite, :*] | question, rewritten_query
[:arcana, :agent, :select, :*] | selected (collections)
[:arcana, :agent, :expand, :*] | question, expanded_query
[:arcana, :agent, :decompose, :*] | question, sub_question_count
[:arcana, :agent, :search, :*] | question, total_chunks
[:arcana, :agent, :reason, :*] | question, iterations
[:arcana, :agent, :rerank, :*] | question, chunks_before, chunks_after
[:arcana, :agent, :answer, :*] | question, context_chunk_count
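Because every step follows the same naming scheme, one handler can log a timing breakdown for the whole pipeline. A sketch (module and handler ID are illustrative):

```elixir
defmodule MyApp.AgentTimings do
  require Logger

  @steps ~w(gate rewrite select expand decompose search reason rerank answer)a

  # Attach one handler to the :stop event of every pipeline step.
  def attach do
    events = for step <- @steps, do: [:arcana, :agent, step, :stop]
    :telemetry.attach_many("agent-timings", events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event([:arcana, :agent, step, :stop], %{duration: duration}, _metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    Logger.info("agent.#{step}: #{ms}ms")
  end
end
```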

GraphRAG Events

When using graph: true, these events track knowledge graph operations:
Event | Metadata
[:arcana, :graph, :build, :*] | chunk_count, collection, entity_count, relationship_count
[:arcana, :graph, :search, :*] | query, entity_count, graph_result_count, combined_count
[:arcana, :graph, :ner, :*] | text, entity_count
[:arcana, :graph, :relationship_extraction, :*] | text, relationship_count
[:arcana, :graph, :community_detection, :*] | entity_count, community_count
[:arcana, :graph, :community_summary, :*] | entity_count, summary_length

VectorStore and GraphStore Events

Storage layer events for vector and graph operations:
Event | Metadata
[:arcana, :vector_store, :store, :*] | collection, id, backend
[:arcana, :vector_store, :search, :*] | collection, limit, backend, result_count
[:arcana, :vector_store, :search_text, :*] | collection, query, limit, backend, result_count
[:arcana, :vector_store, :delete, :*] | collection, id, backend
[:arcana, :vector_store, :clear, :*] | collection, backend
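The backend metadata key makes it easy to compare storage backends. A handler clause like this one (a sketch, meant to sit in a module like the Custom Handlers example below) logs per-backend search latency:

```elixir
# Log vector search latency tagged by backend, using the metadata
# keys from the table above. Assumes the surrounding module has
# `require Logger`.
def handle_event([:arcana, :vector_store, :search, :stop], measurements, metadata, _config) do
  ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

  Logger.info(
    "vector_store.search [#{inspect(metadata.backend)}] #{ms}ms (#{metadata.result_count} results)"
  )
end
```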

Exception Events

All :exception events include:
  • kind - The exception type (:error, :exit, :throw)
  • reason - The exception or error term
  • stacktrace - Full stacktrace

Custom Handlers

For more control, attach your own handlers:
lib/my_app/arcana_metrics.ex
defmodule MyApp.ArcanaMetrics do
  require Logger

  def setup do
    events = [
      [:arcana, :ingest, :stop],
      [:arcana, :search, :stop],
      [:arcana, :ask, :stop],
      [:arcana, :llm, :complete, :stop],
      # Agent pipeline
      [:arcana, :agent, :rerank, :stop],
      [:arcana, :agent, :answer, :stop],
      # GraphRAG
      [:arcana, :graph, :build, :stop],
      [:arcana, :graph, :search, :stop]
    ]

    :telemetry.attach_many("my-arcana-metrics", events, &handle_event/4, nil)
  end

  def handle_event([:arcana, :search, :stop], measurements, metadata, _config) do
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

    Logger.info("Search: #{metadata.result_count} results in #{duration_ms}ms",
      query: metadata.query,
      mode: metadata.mode
    )
  end

  def handle_event([:arcana, :llm, :complete, :stop], measurements, metadata, _config) do
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

    if metadata.success do
      Logger.debug("LLM call to #{metadata.model}: #{duration_ms}ms")
    else
      Logger.warning("LLM call failed: #{metadata.error}")
    end
  end

  # ... handle other events
end
Call MyApp.ArcanaMetrics.setup() in your application startup.
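For example, next to the logger attachment from the Quick Start (a sketch of the same application module):

```elixir
def start(_type, _args) do
  # Attach custom handlers before the supervision tree starts
  MyApp.ArcanaMetrics.setup()

  children = [
    MyApp.Repo,
    Arcana.TaskSupervisor,
    Arcana.Embedder.Local
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end
```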

Phoenix LiveDashboard Integration

Add Arcana metrics to your LiveDashboard:
lib/my_app/telemetry.ex
defmodule MyApp.Telemetry do
  import Telemetry.Metrics

  def metrics do
    [
      # Core operations
      summary("arcana.search.stop.duration",
        unit: {:native, :millisecond},
        tags: [:mode]
      ),
      # result_count is metadata, not a measurement, so extract it explicitly
      sum("arcana.search.stop.result_count",
        measurement: fn _measurements, metadata -> metadata.result_count end
      ),
      
      # LLM calls
      summary("arcana.llm.complete.stop.duration",
        unit: {:native, :millisecond},
        tags: [:model]
      ),
      
      # Agent pipeline
      summary("arcana.agent.rerank.stop.duration",
        unit: {:native, :millisecond}
      ),
      # chunks_after lives in metadata, not measurements
      last_value("arcana.agent.rerank.stop.chunks_after",
        measurement: fn _measurements, metadata -> metadata.chunks_after end
      ),
      
      # GraphRAG
      summary("arcana.graph.build.stop.duration",
        unit: {:native, :millisecond}
      ),
      last_value("arcana.graph.build.stop.entity_count",
        measurement: fn _measurements, metadata -> metadata.entity_count end
      )
    ]
  end
end
Configure LiveDashboard:
lib/my_app_web/router.ex
live_dashboard "/dashboard",
  metrics: MyApp.Telemetry

Prometheus Integration

For production monitoring with Prometheus, use prom_ex:
mix.exs
{:prom_ex, "~> 1.9"}
lib/my_app/prom_ex/arcana_plugin.ex
defmodule MyApp.PromEx.ArcanaPlugin do
  use PromEx.Plugin

  @impl true
  def event_metrics(_opts) do
    Event.build(
      :arcana_event_metrics,
      [
        distribution(
          [:arcana, :search, :duration, :milliseconds],
          event_name: [:arcana, :search, :stop],
          measurement: :duration,
          unit: {:native, :millisecond},
          tags: [:mode],
          tag_values: fn metadata -> %{mode: metadata[:mode] || :semantic} end
        ),
        distribution(
          [:arcana, :llm, :complete, :duration, :milliseconds],
          event_name: [:arcana, :llm, :complete, :stop],
          measurement: :duration,
          unit: {:native, :millisecond},
          tags: [:model, :success],
          tag_values: fn metadata ->
            %{model: metadata[:model] || "unknown", success: metadata[:success]}
          end
        ),
        # A counter ignores its measurement, so use sum to total chunks
        sum(
          [:arcana, :ingest, :chunks, :total],
          event_name: [:arcana, :ingest, :stop],
          measurement: fn _measurements, metadata -> metadata[:chunk_count] || 0 end
        )
      ]
    )
  end
end
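Then register the plugin in your PromEx module so its metrics are scraped (a sketch following the standard prom_ex setup; MyApp.PromEx is an assumed module name):

```elixir
defmodule MyApp.PromEx do
  use PromEx, otp_app: :my_app

  @impl true
  def plugins do
    [
      # Built-in BEAM VM metrics
      PromEx.Plugins.Beam,
      # The Arcana plugin defined above
      MyApp.PromEx.ArcanaPlugin
    ]
  end
end
```

Remember to add MyApp.PromEx to your supervision tree, as prom_ex's installation docs describe.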

Debugging Performance Issues

Identify Slow Operations

The built-in logger makes it easy to spot bottlenecks:
[info] [Arcana] embed completed in 45ms (384 dims)
[info] [Arcana] search completed in 12ms (10 results)
[info] [Arcana] llm.complete completed in 3.2s [openai:gpt-4o] ok (1892 chars)
[info] [Arcana] ask completed in 3.3s
In this example, the LLM call dominates total time (3.2s of 3.3s).

Track Agent Pipeline Steps

For agentic RAG, each pipeline step is instrumented:
[info] [Arcana] agent.rerank completed in 890ms (8/25 kept)
[info] [Arcana] agent.reason completed in 850ms (1 iteration)
If reranking takes too long, consider:
  • Reducing chunks before reranking (lower search limit)
  • Using a faster reranking model
  • Implementing a custom reranker
If reason/2 is taking too long:
  • Lower max_iterations (default: 2)
  • Improve initial search quality with query expansion

Monitor LLM Costs

Track prompt sizes to estimate API costs:
def handle_event([:arcana, :llm, :complete, :stop], measurements, metadata, _config) do
  # Rough token estimate (4 chars per token)
  prompt_tokens = div(metadata.prompt_length || 0, 4)
  response_tokens = div(metadata.response_length || 0, 4)

  Logger.info("LLM usage",
    model: metadata.model,
    prompt_tokens: prompt_tokens,
    response_tokens: response_tokens,
    success: metadata.success
  )
end
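The same character-based estimate can feed a rough cost calculation. A sketch; the module name is hypothetical and the per-million-token rates are illustrative placeholders, not real prices:

```elixir
defmodule MyApp.LLMCost do
  # {input_usd_per_million_tokens, output_usd_per_million_tokens}
  # Placeholder rates; substitute your provider's actual pricing.
  @rates %{"gpt-4o-mini" => {0.15, 0.60}}
  @default_rates {1.0, 2.0}

  # Estimate cost in USD from character counts, using the rough
  # 4-characters-per-token heuristic above.
  def estimate(model, prompt_chars, response_chars) do
    {in_rate, out_rate} = Map.get(@rates, model, @default_rates)
    prompt_tokens = div(prompt_chars, 4)
    response_tokens = div(response_chars, 4)
    (prompt_tokens * in_rate + response_tokens * out_rate) / 1_000_000
  end
end
```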

Error Tracking

Handle exceptions to send to your error tracking service:
def setup do
  exception_events = [
    [:arcana, :ingest, :exception],
    [:arcana, :search, :exception],
    [:arcana, :ask, :exception],
    [:arcana, :llm, :complete, :exception]
  ]

  :telemetry.attach_many("arcana-errors", exception_events, &handle_exception/4, nil)
end

def handle_exception(event, _measurements, metadata, _config) do
  # Send to Sentry, Honeybadger, etc.
  Sentry.capture_message("Arcana error",
    extra: %{
      event: inspect(event),
      kind: metadata.kind,
      reason: inspect(metadata.reason)
    }
  )
end

Best Practices

Start with built-in logger

Zero-config logging helps you understand what’s happening

Focus on LLM latency

LLM calls are usually the bottleneck; track them closely

Monitor reranking

Watch the kept/original ratio when using Agent.rerank/2

Tag by collection

Identify slow document sets by tagging metrics with collection names
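For example, the vector store events carry a collection metadata key, so a Telemetry.Metrics entry like this one (a sketch to drop into the metrics/0 list from the LiveDashboard section) breaks search latency down per collection:

```elixir
# Per-collection vector search latency; :collection is read from event metadata
summary("arcana.vector_store.search.stop.duration",
  unit: {:native, :millisecond},
  tags: [:collection]
)
```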

Set up alerts

Alert on LLM failures and unusually slow operations

Log in production

Keep at least :info level logging for debugging issues

Next Steps

Evaluation

Measure retrieval quality alongside performance

Dashboard

View operations and results in the web UI

Agentic RAG

Monitor complex pipeline steps

GraphRAG

Track entity extraction and graph building
