LLM Analytics provides observability for AI applications. Track traces, analyze conversations, run automated evaluations, and monitor model performance across providers.

Overview

LLM Analytics offers:
  • Trace collection - Capture LLM calls, tool usage, and conversation flows
  • Automated evaluations - LLM-as-judge for quality, hallucinations, and toxicity
  • Provider proxy - Route requests through PostHog with provider key management
  • Clustering - Group similar traces to identify patterns
  • Cost tracking - Monitor token usage and costs across providers

Getting Started

Install SDK

pip install posthog

Capture LLM Traces

import posthog
from openai import OpenAI

posthog.api_key = '<ph_project_api_key>'
client = OpenAI()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is PostHog?"}
]

with posthog.trace("chat_completion") as trace:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    
    # Log the generation
    trace.generation(
        model="gpt-4",
        input=messages,
        output=response.choices[0].message.content,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens
        }
    )
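The usage figures logged above can also drive a rough client-side cost estimate. A minimal sketch, assuming illustrative per-1K-token rates (the `PRICES` table is a placeholder, not real provider pricing):

```python
# Rough client-side cost estimate from token usage.
# PRICES holds illustrative per-1K-token rates -- placeholders, not real pricing.
PRICES = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an approximate USD cost for one generation."""
    rates = PRICES[model]
    return (prompt_tokens / 1000) * rates["prompt"] \
        + (completion_tokens / 1000) * rates["completion"]

print(round(estimate_cost("gpt-4", 1000, 500), 4))  # 0.06
```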

Track Tool Usage

Capture function calls and tool usage:
with posthog.trace("agent_workflow") as trace:
    # User message
    trace.message(
        role="user",
        content="What's the weather in San Francisco?"
    )
    
    # Tool call
    trace.tool_call(
        name="get_weather",
        arguments={"location": "San Francisco"},
        result={"temperature": 72, "condition": "sunny"}
    )
    
    # Assistant response
    trace.generation(
        model="gpt-4",
        input="Generate response with weather data",
        output="It's currently 72°F and sunny in San Francisco."
    )

Automated Evaluations

Run LLM-as-judge evaluations on your traces:

Creating an Evaluation

Step 1: Define evaluation criteria

evaluation = {
    "name": "Response Quality",
    "evaluation_type": "llm_as_judge",
    "evaluation_config": {
        "prompt": """
            Evaluate the assistant's response on:
            1. Accuracy - Is the information correct?
            2. Helpfulness - Does it answer the user's question?
            3. Tone - Is it professional and friendly?
            
            Rate from 1-5 and explain your reasoning.
        """,
        "model": "gpt-4"
    },
    "output_type": "numeric_score",
    "conditions": [
        {
            "properties": [
                {
                    "key": "trace_type",
                    "value": "chat_completion",
                    "operator": "exact"
                }
            ]
        }
    ]
}
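The `conditions` block determines which traces the evaluation runs on. The sketch below illustrates the filter semantics for the `exact` operator only; it is an illustration, not PostHog's actual matching implementation:

```python
def matches(conditions: list, properties: dict) -> bool:
    """True if a trace's properties satisfy any condition group
    (every property filter within a group must match)."""
    for group in conditions:
        if all(
            f["operator"] == "exact" and properties.get(f["key"]) == f["value"]
            for f in group["properties"]
        ):
            return True
    return False

conditions = [{"properties": [
    {"key": "trace_type", "value": "chat_completion", "operator": "exact"}
]}]
print(matches(conditions, {"trace_type": "chat_completion"}))  # True
print(matches(conditions, {"trace_type": "user_query"}))       # False
```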
Step 2: Enable evaluation

POST /api/projects/{project_id}/llm_analytics/evaluations/

{
  "name": "Response Quality",
  "enabled": true,
  "evaluation_type": "llm_as_judge",
  "evaluation_config": {...},
  "output_type": "numeric_score"
}
Step 3: View results

Evaluations run automatically on new traces. View scores in the trace detail view.

Evaluation Types

{
  "evaluation_type": "hallucination",
  "evaluation_config": {
    "prompt": "Does the response contain information not present in the context?",
    "context_field": "event.properties.context"
  },
  "output_type": "boolean"
}
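Boolean evaluations like this one aggregate naturally into a rate. A small sketch, assuming each result is a dict with a boolean `score` field (a hypothetical shape for illustration):

```python
def hallucination_rate(results: list[dict]) -> float:
    """Fraction of evaluated traces flagged as hallucinations."""
    if not results:
        return 0.0
    flagged = sum(1 for r in results if r["score"] is True)
    return flagged / len(results)

results = [{"score": True}, {"score": False}, {"score": False}, {"score": False}]
print(hallucination_rate(results))  # 0.25
```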

Model Configuration

Configure evaluation models:
POST /api/projects/{project_id}/llm_analytics/model_configurations/

{
  "provider": "openai",
  "model": "gpt-4",
  "temperature": 0.2,
  "max_tokens": 1000
}
Supported providers:
  • OpenAI (GPT-3.5, GPT-4)
  • Anthropic (Claude)
  • Google (Gemini)
  • OpenRouter (multiple models)
  • Fireworks AI

Clustering

Automatically group similar traces:
POST /api/projects/{project_id}/llm_analytics/clustering_configs/

{
  "name": "User Intent Clustering",
  "model_configuration_id": "config_123",
  "min_cluster_size": 5,
  "filters": {
    "properties": [
      {"key": "trace_type", "value": "user_query"}
    ]
  }
}
View clusters:
GET /api/projects/{project_id}/llm_analytics/clusters/

{
  "clusters": [
    {
      "id": "cluster_1",
      "description": "Users asking about pricing",
      "size": 342,
      "sample_traces": [...]
    },
    {
      "id": "cluster_2",
      "description": "Technical support questions",
      "size": 198,
      "sample_traces": [...]
    }
  ]
}
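A response like the one above can be summarized client-side, for example by ranking clusters by size to surface the dominant intents (a sketch over the sample payload shape):

```python
def top_clusters(payload: dict, n: int = 5) -> list[tuple[str, int]]:
    """Return (description, size) pairs for the n largest clusters."""
    clusters = sorted(payload["clusters"], key=lambda c: c["size"], reverse=True)
    return [(c["description"], c["size"]) for c in clusters[:n]]

payload = {"clusters": [
    {"id": "cluster_2", "description": "Technical support questions", "size": 198},
    {"id": "cluster_1", "description": "Users asking about pricing", "size": 342},
]}
print(top_clusters(payload, n=1))  # [('Users asking about pricing', 342)]
```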

Provider Proxy

Route LLM requests through PostHog for unified tracking:

Setup Provider Keys

POST /api/projects/{project_id}/llm_analytics/provider_keys/

{
  "provider": "openai",
  "api_key": "sk-..."
}

Use Proxy

from openai import OpenAI

client = OpenAI(
    base_url="https://app.posthog.com/api/llm_proxy/v1",
    api_key="<ph_project_api_key>"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
# Automatically tracked in PostHog

Benefits

  • Automatic tracing - No manual instrumentation needed
  • Cost tracking - Monitor spend across providers
  • Provider fallback - Automatic failover to backup providers
  • Rate limiting - Shared rate limit management

Datasets

Create datasets for offline evaluation:
POST /api/projects/{project_id}/llm_analytics/datasets/

{
  "name": "Customer Support QA",
  "description": "Curated set of support questions",
  "items": [
    {
      "input": "How do I reset my password?",
      "expected_output": "Click 'Forgot Password' on the login page...",
      "metadata": {"category": "authentication"}
    }
  ]
}
Run evaluations on datasets:
POST /api/projects/{project_id}/llm_analytics/evaluation_runs/

{
  "dataset_id": "dataset_123",
  "evaluation_id": "eval_456",
  "model_configuration_id": "config_789"
}

Metrics and Monitoring

Trace Metrics

Track key metrics:
GET /api/projects/{project_id}/llm_analytics/metrics/

{
  "period": "7d",
  "metrics": {
    "total_traces": 15234,
    "total_tokens": 4523412,
    "total_cost": 67.89,
    "avg_latency_ms": 1234,
    "error_rate": 0.023,
    "p95_latency_ms": 2500
  },
  "by_model": {
    "gpt-4": {"traces": 5000, "cost": 45.20},
    "gpt-3.5-turbo": {"traces": 10234, "cost": 22.69}
  }
}
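Per-model figures like `by_model` above make it easy to compute cost per trace, a useful signal for spotting expensive patterns (a sketch over the sample payload shape):

```python
def cost_per_trace(by_model: dict) -> dict:
    """Average cost per trace for each model."""
    return {m: round(v["cost"] / v["traces"], 6) for m, v in by_model.items()}

by_model = {
    "gpt-4": {"traces": 5000, "cost": 45.20},
    "gpt-3.5-turbo": {"traces": 10234, "cost": 22.69},
}
print(cost_per_trace(by_model)["gpt-4"])  # 0.00904
```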

Error Tracking

Monitor errors and failures:
GET /api/projects/{project_id}/llm_analytics/errors/

{
  "errors": [
    {
      "error_type": "RateLimitError",
      "count": 23,
      "last_seen": "2024-01-15T10:30:00Z",
      "provider": "openai"
    },
    {
      "error_type": "TimeoutError",
      "count": 12,
      "provider": "anthropic"
    }
  ]
}
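Error payloads like the one above can be folded into per-provider totals for dashboards or alerts (a sketch over the sample shape):

```python
from collections import Counter

def errors_by_provider(payload: dict) -> Counter:
    """Sum error counts per provider."""
    totals = Counter()
    for err in payload["errors"]:
        totals[err["provider"]] += err["count"]
    return totals

payload = {"errors": [
    {"error_type": "RateLimitError", "count": 23, "provider": "openai"},
    {"error_type": "TimeoutError", "count": 12, "provider": "anthropic"},
]}
print(errors_by_provider(payload))  # Counter({'openai': 23, 'anthropic': 12})
```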

Latency Tracking

Prometheus metrics for latency:
# llma_request_duration_seconds
# Histogram with p50, p95, p99 percentiles

# llma_llm_call_duration_seconds  
# Histogram by provider

# llma_errors_total
# Counter by error type
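If you collect raw request durations yourself, the same percentiles can be computed with a simple nearest-rank method (a sketch; Prometheus histograms estimate these from bucket boundaries instead of raw samples):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) of raw duration samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

durations = [0.8, 1.1, 1.2, 1.3, 2.5, 0.9, 1.0, 1.4, 1.6, 3.0]
print(percentile(durations, 50))  # 1.2
print(percentile(durations, 95))  # 3.0
```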

Sentiment Analysis

Analyze conversation sentiment:
POST /api/projects/{project_id}/llm_analytics/sentiment/

{
  "trace_id": "trace_123",
  "text": "The assistant was very helpful and solved my problem!"
}

Response:
{
  "sentiment": "positive",
  "score": 0.92,
  "confidence": 0.95
}

Summarization

Generate trace summaries:
POST /api/projects/{project_id}/llm_analytics/summarization/

{
  "trace_id": "trace_123",
  "model": "gpt-4"
}

Response:
{
  "summary": "User asked about pricing. Assistant explained tier structure and provided discount code.",
  "key_points": [
    "Pricing inquiry",
    "Tier explanation provided",
    "Discount code shared"
  ]
}

Best Practices

Structured Traces

Use consistent trace naming and include relevant metadata. This makes filtering, clustering, and analysis much more effective.

Evaluation Coverage

Start with a few key evaluations (quality, hallucination) rather than many. Refine based on actual issues you discover.

Cost Monitoring

Set up alerts for unusual cost spikes. Track cost per trace to identify expensive patterns early.
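A simple spike check can be run over daily cost totals, flagging days that exceed a multiple of the trailing average (a hypothetical sketch; the factor should be tuned to your traffic patterns):

```python
def cost_spike(daily_costs: list[float], factor: float = 2.0) -> bool:
    """True if the latest day's cost exceeds `factor` x the trailing average."""
    *history, latest = daily_costs
    if not history:
        return False
    baseline = sum(history) / len(history)
    return latest > factor * baseline

print(cost_spike([10.0, 11.0, 9.0, 40.0]))  # True
print(cost_spike([10.0, 11.0, 9.0, 12.0]))  # False
```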

Provider Keys

Use the proxy with BYOK (bring your own keys) for production. This gives you automatic tracing without SDK changes.

API Reference

Create Trace

POST /api/projects/{project_id}/llm_analytics/traces/

{
  "trace_id": "unique_trace_id",
  "name": "chat_completion",
  "metadata": {
    "user_id": "user_123",
    "session_id": "session_456"
  },
  "events": [
    {
      "type": "generation",
      "model": "gpt-4",
      "input": [...],
      "output": "...",
      "usage": {...}
    }
  ]
}

Get Trace

GET /api/projects/{project_id}/llm_analytics/traces/{trace_id}/

Response:
{
  "id": "trace_123",
  "name": "chat_completion",
  "events": [...],
  "evaluations": [...],
  "metrics": {...}
}

List Evaluations

GET /api/projects/{project_id}/llm_analytics/evaluations/

Response:
{
  "results": [
    {
      "id": "eval_123",
      "name": "Response Quality",
      "enabled": true,
      "evaluation_type": "llm_as_judge"
    }
  ]
}
