Track and Optimize LLM Costs

Learn how to track costs across users, features, and environments to understand your AI application’s unit economics and identify optimization opportunities.

What You’ll Learn

How to:

Track costs per user and feature
Set up cost alerts before budget overruns
Enable caching to reduce redundant API costs
Analyze cost trends over time

Prerequisites

Helicone API key (generated in your Helicone dashboard)
An LLM application with API calls
5 minutes to implement tracking

Step 1: Add Cost Tracking Headers

Start by tagging your requests with metadata for cost segmentation.

import { OpenAI } from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

// Track cost by user and feature
const response = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello!" }],
  },
  {
    headers: {
      "Helicone-User-Id": "user-123",
      "Helicone-Property-Feature": "chat",
      "Helicone-Property-Environment": "production",
      "Helicone-Property-UserTier": "premium",
    },
  }
);

Key Headers:

Helicone-User-Id: Track costs per user for unit economics
Helicone-Property-Feature: Identify which features drive costs
Helicone-Property-Environment: Separate dev/staging/production costs
Helicone-Property-UserTier: Compare free vs. paid user costs
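
To keep these headers consistent across call sites, one option is to build them from a single context object. The sketch below is illustrative only: the `CostContext` type and `buildCostHeaders` function are hypothetical names, not part of Helicone or the OpenAI SDK. Only the header names come from the example above.

```typescript
// Hypothetical context shape; field names are illustrative, not a Helicone API.
interface CostContext {
  userId: string;
  feature: string;
  environment: "development" | "staging" | "production";
  userTier: "free" | "premium";
}

// Maps one context object to the four tracking headers used in Step 1.
function buildCostHeaders(ctx: CostContext): Record<string, string> {
  return {
    "Helicone-User-Id": ctx.userId,
    "Helicone-Property-Feature": ctx.feature,
    "Helicone-Property-Environment": ctx.environment,
    "Helicone-Property-UserTier": ctx.userTier,
  };
}

const chatHeaders = buildCostHeaders({
  userId: "user-123",
  feature: "chat",
  environment: "production",
  userTier: "premium",
});
```

The resulting object can be passed as the `headers` option of a request, as in the example above.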

Step 2: Organize Multi-Step Workflows

For complex workflows (like AI agents), use sessions to track the total cost of completing a task.

import { randomUUID } from "crypto";

const sessionId = randomUUID();

// Initial question
await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Summarize this document..." }],
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Name": "Document Analysis",
      "Helicone-Session-Path": "/analyze",
      "Helicone-User-Id": "user-123",
      "Helicone-Property-Feature": "document-analysis",
    },
  }
);

// Follow-up analysis
await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Extract key points..." }],
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId, // Same session ID
      "Helicone-Session-Name": "Document Analysis",
      "Helicone-Session-Path": "/analyze/extract",
      "Helicone-User-Id": "user-123",
      "Helicone-Property-Feature": "document-analysis",
    },
  }
);

Sessions show the total cost of completing a task, which surfaces insights like “document analysis costs $0.45 on average” that are hard to see when looking at individual API calls.
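
Because every step repeats the same session ID and name and changes only the path, a small helper can reduce copy-paste mistakes. This is a minimal sketch: `SessionInfo`, `sessionPath`, and `sessionStepHeaders` are hypothetical names, and the slash-joined path format simply mirrors the `/analyze` and `/analyze/extract` values shown above.

```typescript
// Hypothetical helper types and names; only the header names come from the example above.
interface SessionInfo {
  id: string;   // e.g. the value returned by randomUUID()
  name: string; // e.g. "Document Analysis"
}

// Joins path segments into a slash-separated path such as "/analyze/extract".
function sessionPath(...segments: string[]): string {
  return "/" + segments.join("/");
}

// Builds the three session headers for one step of a workflow.
function sessionStepHeaders(session: SessionInfo, path: string): Record<string, string> {
  return {
    "Helicone-Session-Id": session.id,
    "Helicone-Session-Name": session.name,
    "Helicone-Session-Path": path,
  };
}

const docSession: SessionInfo = { id: "example-session-id", name: "Document Analysis" };
const summarizeStep = sessionStepHeaders(docSession, sessionPath("analyze"));
const extractStep = sessionStepHeaders(docSession, sessionPath("analyze", "extract"));
```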

Step 3: View Cost Analytics

Dashboard Overview

Navigate to your Helicone dashboard to see:

Total costs (today, this week, this month)
Cost trends over time
Top cost-driving models and features
Cost per user breakdown

Filter by Properties

Use the filters to segment costs:

Filter by Property: Feature = "document-analysis"
Result: $127 spent on document analysis this week

Filter by Property: Environment = "development"
Result: $43 spent on development testing

Filter by Property: UserTier = "premium"
Result: Premium users generate $1,200 in value vs. $380 in costs

Session Cost Analysis

View Sessions to see:

Average cost per workflow type
Cost distribution across steps
Expensive outliers to investigate

Step 4: Set Up Cost Alerts

Prevent budget overruns before they happen.

Navigate to Alerts

Go to Settings → Alerts in your dashboard.

Create Cost Alert

Click “Create Alert”
Select Cost as the metric
Set your threshold (e.g., $100/day)
Choose time window (e.g., 1 day)
Add filters (optional):
- Environment = “production” (exclude dev costs)
- Feature = “document-analysis” (monitor specific features)

Configure Notifications

Add notification channels:

Email: your team's alert address
Slack: #alerts channel

Recommended alert structure:

Daily alert at 80% of budget (warning)
Daily alert at 100% of budget (critical)
Separate alerts for production vs. development
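
The warning and critical thresholds follow directly from the daily budget. The sketch below only does that arithmetic so the values can be entered in the alert form; `alertThresholds` is a hypothetical helper and does not call any Helicone API.

```typescript
// Hypothetical helper: derives the two alert thresholds from a daily budget in USD.
interface AlertThresholds {
  warningUsd: number;  // 80% of budget
  criticalUsd: number; // 100% of budget
}

function alertThresholds(dailyBudgetUsd: number): AlertThresholds {
  // Round to cents so the values are easy to enter in the dashboard.
  const toCents = (usd: number): number => Math.round(usd * 100) / 100;
  return {
    warningUsd: toCents(dailyBudgetUsd * 0.8),
    criticalUsd: toCents(dailyBudgetUsd),
  };
}

// With the $100/day example budget from the steps above:
const productionAlerts = alertThresholds(100);
```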

Step 5: Enable Caching for Cost Reduction

Cache repetitive requests to eliminate redundant API costs.

// Enable caching for FAQ responses
await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer FAQ questions" },
      { role: "user", content: "What are your business hours?" }
    ],
  },
  {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Cache-Control": "max-age=86400", // 24 hours
      "Helicone-Property-Feature": "faq",
    },
  }
);

// Second identical request = $0 cost (cached)
await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer FAQ questions" },
      { role: "user", content: "What are your business hours?" }
    ],
  },
  {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Cache-Control": "max-age=86400",
      "Helicone-Property-Feature": "faq",
    },
  }
);

Best caching opportunities:

FAQ and support responses
Static content generation
Development/testing environments
Repeated queries with identical inputs
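
The `max-age` value is expressed in seconds, so converting from hours in one place avoids arithmetic slips. A minimal sketch, assuming you want one cache lifetime per feature; `buildCacheHeaders` is a hypothetical helper, and only the two header names come from the example above.

```typescript
// Hypothetical helper: builds the cache headers from Step 5 for a lifetime given in hours.
function buildCacheHeaders(ttlHours: number): Record<string, string> {
  if (ttlHours <= 0) {
    throw new Error("ttlHours must be positive");
  }
  const maxAgeSeconds = Math.round(ttlHours * 3600);
  return {
    "Helicone-Cache-Enabled": "true",
    "Cache-Control": `max-age=${maxAgeSeconds}`,
  };
}

// 24 hours, matching the FAQ example above.
const faqCacheHeaders = buildCacheHeaders(24);
```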

Expected Results

After implementing cost tracking:

Week 1

Total Costs: $487
├── Production: $423 (87%)
│   ├── chat: $245 (58%)
│   ├── document-analysis: $127 (30%)
│   └── search: $51 (12%)
└── Development: $64 (13%)

Top 5 Users by Cost:
1. user-789: $42.50 (premium tier)
2. user-456: $38.20 (premium tier)
3. user-123: $31.80 (free tier)
4. user-234: $28.90 (free tier)
5. user-567: $24.10 (premium tier)

Cache Performance:
- Hit rate: 23%
- Savings: $112
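
Note that the environment percentages are shares of the total, while the feature percentages are shares of production spend. The short sketch below reproduces the figures from the breakdown above; `sharePercent` is a hypothetical helper.

```typescript
// Hypothetical helper: share of `part` in `whole`, as a whole-number percentage.
function sharePercent(part: number, whole: number): number {
  return Math.round((part / whole) * 100);
}

const totalUsd = 487;
const productionUsd = 423;

// Environment shares are relative to total spend.
const productionShare = sharePercent(productionUsd, totalUsd);
const developmentShare = sharePercent(64, totalUsd);

// Feature shares are relative to production spend, not the total.
const chatShare = sharePercent(245, productionUsd);
const documentAnalysisShare = sharePercent(127, productionUsd);
const searchShare = sharePercent(51, productionUsd);
```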

Insights

Premium users cost $35/month on average and generate $120 in value (3.4x ROI)
Free users cost $28/month on average, which is unsustainable without usage limits
Document analysis is the most expensive feature at $0.45/session
Caching FAQ responses saved $112 (23% hit rate)
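
The ROI figure here is value divided by cost. A minimal sketch of that arithmetic, using the premium-tier numbers above; `roiMultiple` is a hypothetical helper name.

```typescript
// Hypothetical helper: value generated per dollar of LLM cost.
function roiMultiple(valueUsd: number, costUsd: number): number {
  if (costUsd <= 0) {
    throw new Error("costUsd must be positive");
  }
  return valueUsd / costUsd;
}

// Premium tier: $120 of value against $35 of cost per month.
const premiumRoi = roiMultiple(120, 35);
const premiumRoiLabel = `${premiumRoi.toFixed(1)}x`;
```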

Step 6: Analyze and Optimize

Identify Cost Drivers

Look for:

High-cost users to potentially upgrade or limit
Features with poor cost-to-value ratios
Unexpected development environment costs
Cache opportunities (repeated similar requests)

Take Action

Based on insights:

// Add rate limiting for free tier users
// (userTier and monthlyCost come from your own user and billing data)
if (userTier === "free" && monthlyCost > 25) {
  throw new Error("Monthly limit reached. Upgrade to premium.");
}
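
A slightly fuller version of this check might return a decision instead of throwing, so the caller can choose how to respond. This is a sketch under stated assumptions: `checkSpendLimit`, `SpendDecision`, and the per-tier cap table are hypothetical, and the $25 free-tier cap is taken from the snippet above.

```typescript
type UserTier = "free" | "premium";

// Hypothetical result type: whether the request may proceed, and why not if it may not.
interface SpendDecision {
  allowed: boolean;
  reason?: string;
}

// Monthly caps in USD per tier; `null` means no cap. The $25 value mirrors the snippet above.
const MONTHLY_CAP_USD: Record<UserTier, number | null> = {
  free: 25,
  premium: null,
};

function checkSpendLimit(userTier: UserTier, monthlyCostUsd: number): SpendDecision {
  const cap = MONTHLY_CAP_USD[userTier];
  if (cap !== null && monthlyCostUsd > cap) {
    return { allowed: false, reason: "Monthly limit reached. Upgrade to premium." };
  }
  return { allowed: true };
}

const overLimit = checkSpendLimit("free", 31.8);
const underLimit = checkSpendLimit("free", 12);
```

Run the check before sending a request, using the month-to-date cost from your own records.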

Monitor Impact

Track changes over time:

Did rate limiting reduce free tier costs?
Is model switching maintaining quality?
What’s the new cache hit rate?

Advanced: Query Costs Programmatically

Use the Helicone API to build custom cost dashboards:

const response = await fetch(
  "https://api.helicone.ai/v1/request/query-clickhouse",
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.HELICONE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      filter: {
        request_response_rmt: {
          request_created_at: {
            gte: "2024-01-01T00:00:00Z"
          },
          properties: {
            UserTier: { equals: "premium" }
          }
        }
      },
      limit: 1000,
    }),
  }
);

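// Fail fast on HTTP errors instead of summing an error payload
if (!response.ok) {
  throw new Error(`Helicone query failed with status ${response.status}`);
}

const data = await response.json();
const totalCost = data.data.reduce(
  (sum, req) => sum + (req.cost_usd || 0),
  0
);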

console.log(`Premium user costs: $${totalCost.toFixed(2)}`);
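
For a custom dashboard you will usually want totals per segment rather than one grand total. The sketch below groups rows by a property value. It assumes each returned row has a `cost_usd` number and a `properties` map; the `CostRow` shape and `sumCostByProperty` function are hypothetical, so check them against the actual response before relying on them.

```typescript
// Assumed row shape (not confirmed against the Helicone response schema).
interface CostRow {
  cost_usd?: number | null;
  properties?: Record<string, string>;
}

// Sums cost per distinct value of one property, e.g. "Feature" or "UserTier".
function sumCostByProperty(rows: CostRow[], property: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    // Rows that lack the property are grouped under "(unset)".
    const key = row.properties?.[property] ?? "(unset)";
    totals[key] = (totals[key] ?? 0) + (row.cost_usd ?? 0);
  }
  return totals;
}

// Made-up sample rows, for illustration only.
const sampleRows: CostRow[] = [
  { cost_usd: 0.25, properties: { Feature: "chat" } },
  { cost_usd: 0.5, properties: { Feature: "chat" } },
  { cost_usd: 0.125, properties: { Feature: "faq" } },
  { cost_usd: 1 },
  { cost_usd: null, properties: { Feature: "faq" } },
];

const costByFeature = sumCostByProperty(sampleRows, "Feature");
```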

Best Practices

Start with high-level tracking: Add User ID, Feature, and Environment headers to all requests

Use sessions for complex workflows: Group related requests to see true unit costs

Set graduated alerts: 50%, 80%, 95% of budget to catch issues early

Don’t over-optimize prematurely: Track for 1-2 weeks to understand patterns before making changes

Troubleshooting

Costs showing as $0 or 'not supported'

Helicone calculates costs based on model detection:

Using AI Gateway: 100% accurate costs
Direct integration: Best-effort estimates based on pricing data for 300+ models

If your model isn’t supported, contact Helicone support to add it.

Properties not appearing in filters

Properties take a few minutes to appear in filters after first use. Ensure:

Header format: Helicone-Property-[Name]
Values are strings (not numbers or booleans)
Requests are successfully logging (check dashboard)
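
One way to satisfy the first two checks is to build property headers through a single function that applies the prefix and converts every value to a string. A minimal sketch; `toPropertyHeaders` is a hypothetical helper, not a Helicone SDK function.

```typescript
type PropertyValue = string | number | boolean;

// Hypothetical helper: prefixes each name with "Helicone-Property-" and stringifies the value.
function toPropertyHeaders(props: Record<string, PropertyValue>): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const name of Object.keys(props)) {
    headers[`Helicone-Property-${name}`] = String(props[name]);
  }
  return headers;
}

// Numbers and booleans are converted to strings before being sent.
const propertyHeaders = toPropertyHeaders({
  Feature: "chat",
  RetryCount: 3,
  BetaUser: true,
});
```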

Cost alerts not triggering

Check:

Alert threshold and time window
Minimum request count (low-traffic periods may not reach it, so the alert never fires)
Filters (overly restrictive filters may exclude all requests)
Notification channels are configured correctly

Next Steps

Cost Tracking Guide

In-depth cost optimization strategies

User Metrics

Track per-user usage and costs

Sessions

Group requests to understand workflow costs

Alerts

Configure cost and error alerts
