## Overview

The Observatory Python SDK provides automatic instrumentation for LiteLLM via the `TCCCallback` class. The callback exports each LLM call as an OpenTelemetry span, so LiteLLM completions are tracked without manual instrumentation.
## Installation

Install the SDK with LiteLLM support:

```shell
pip install contextcompany[litellm]
```

This installs:

- `litellm`
- `opentelemetry-sdk`
- `opentelemetry-exporter-otlp-proto-http`
## Basic Setup

Register the callback with LiteLLM:

```python
from contextcompany.litellm import TCCCallback
import litellm

# Set up the callback
litellm.callbacks = [TCCCallback()]

# Make LLM calls as usual
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Linking to Runs

To associate LiteLLM calls with Observatory runs, pass the run ID in `metadata`:

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

# Create a run
r = run()

# Link LiteLLM calls to this run
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather?"}],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="What's the weather?")
r.response(response.choices[0].message.content)
r.end()
```
## Class Reference

### TCCCallback

```python
from contextcompany.litellm import TCCCallback

TCCCallback(
    api_key: Optional[str] = None,
    endpoint: Optional[str] = None,
    service_name: str = "litellm",
)
```

#### Parameters

- `api_key`: Observatory API key. If not provided, uses the `TCC_API_KEY` environment variable.
- `endpoint`: Custom OpenTelemetry endpoint URL. If not provided, uses the default Observatory OTEL endpoint.
- `service_name`: Service name for OpenTelemetry traces.
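As a rough sketch of the documented lookup order (explicit argument first, then the environment variable), where `resolve_api_key` is a hypothetical helper, not part of the SDK:

```python
import os

def resolve_api_key(explicit=None):
    # Mirror the documented fallback: an explicitly passed key wins,
    # otherwise read the TCC_API_KEY environment variable.
    key = explicit or os.environ.get("TCC_API_KEY")
    if key is None:
        raise RuntimeError("No API key: pass api_key or set TCC_API_KEY")
    return key
```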
## What Gets Tracked

The callback automatically captures:

- **Model Information**: Requested model and actual model used
- **Messages**: Input messages and output content
- **Token Usage**: Input tokens, output tokens, and cached tokens
- **Tool Calls**: Function/tool invocations and their arguments
- **Finish Reasons**: Why the LLM stopped generating (`stop`, `length`, `tool_calls`, etc.)
- **Run Association**: Links to Observatory runs via `metadata.tcc.runId`
- **Errors**: Failed LLM calls with error messages
Use these metadata keys to control tracking:

- `tcc.runId`: Associate this LLM call with an Observatory run. Use `r.run_id` from your run object.
- `tcc.run_id`: Alternative key name for run association (snake_case variant).
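A consumer of this metadata can accept either variant. A minimal sketch of such a lookup, assuming the snake_case variant is `tcc.run_id` (the helper name is ours, not the SDK's):

```python
def run_id_from_metadata(metadata):
    # Check both documented key variants, camelCase first.
    return metadata.get("tcc.runId") or metadata.get("tcc.run_id")
```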
## Usage Examples

### Basic Chat Completion

```python
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

response = litellm.completion(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)

print(response.choices[0].message.content)
```
### With Run Tracking

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="Tell me a joke")
r.response(response.choices[0].message.content)
r.end()
```
### Function Calling

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                },
                "required": ["location"],
            },
        },
    }
]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
    metadata={"tcc.runId": r.run_id},
)

tool_call = response.choices[0].message.tool_calls[0]
print(f"Calling {tool_call.function.name} with {tool_call.function.arguments}")

r.prompt(user_prompt="What's the weather in NYC?")
r.response(f"Need to call {tool_call.function.name}")
r.end()
```
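`tool_call.function.arguments` arrives as a JSON string. If you need the values, for example to actually call `get_weather`, it is worth parsing defensively, since models occasionally emit malformed JSON. A small sketch (the helper name is ours, not part of the SDK):

```python
import json

def parse_tool_args(arguments):
    # Tool-call arguments are a JSON string; return {} on bad input
    # rather than letting a malformed model response crash the agent.
    try:
        return json.loads(arguments) or {}
    except (TypeError, json.JSONDecodeError):
        return {}
```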
### Streaming Responses

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,
    metadata={"tcc.runId": r.run_id},
)

full_response = ""
for chunk in response:
    content = chunk.choices[0].delta.content or ""
    full_response += content
    print(content, end="", flush=True)
print()  # New line

r.prompt(user_prompt="Write a short story")
r.response(full_response)
r.end()
```
### Multiple Models

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

# First call with GPT-4
response1 = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain AI"}],
    metadata={"tcc.runId": r.run_id},
)

# Second call with Claude; include the first answer in the prompt,
# since each completion call is stateless and cannot see "the above"
response2 = litellm.completion(
    model="claude-3-opus-20240229",
    messages=[{
        "role": "user",
        "content": f"Summarize this:\n{response1.choices[0].message.content}",
    }],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="Explain AI and summarize")
r.response(response2.choices[0].message.content)
r.end()
```
### Custom Endpoint

```python
from contextcompany.litellm import TCCCallback
import litellm

# Use a custom Observatory instance
callback = TCCCallback(
    api_key="your_api_key",
    endpoint="https://custom.example.com/otel-steps",
    service_name="my-agent",
)

litellm.callbacks = [callback]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Best Practices

- **Set up callback once**: Initialize `TCCCallback` at application startup and reuse it for all LiteLLM calls.
- **Always pass run ID**: Include `metadata={"tcc.runId": r.run_id}` to link LLM calls to runs.
- **Handle streaming**: For streaming responses, collect the full response before calling `r.response()`.
- **Track function calls separately**: Use `tool_call()` to track tool invocations alongside LiteLLM's automatic step tracking.
- **Use environment variables**: Set `TCC_API_KEY` instead of hardcoding credentials.
## Environment Variables

- `TCC_API_KEY`: Observatory API key, read when no `api_key` is passed to `TCCCallback`
- Custom Observatory endpoint URL (for the OTEL exporter)
## Comparison: Manual vs. Auto-instrumentation

### Manual Instrumentation

```python
from contextcompany import run
import openai

r = run()
s = r.step()

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

s.prompt("Hello!")
s.response(response.choices[0].message.content)
s.model(requested="gpt-4", used=response.model)
s.tokens(
    prompt_uncached=response.usage.prompt_tokens,
    completion=response.usage.completion_tokens,
)
s.end()

r.prompt(user_prompt="Hello!")
r.response(response.choices[0].message.content)
r.end()
```
### Auto-instrumentation with LiteLLM

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

# Steps are automatically created and tracked
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="Hello!")
r.response(response.choices[0].message.content)
r.end()
```

Auto-instrumentation reduces boilerplate and ensures consistent tracking across all LiteLLM calls.
## Limitations

- **LiteLLM only**: This callback only works with LiteLLM. For other LLM libraries, use manual instrumentation with `step()`.
- **Run association required**: You must pass `metadata.tcc.runId` to link LLM calls to runs.
- **OTEL overhead**: OpenTelemetry adds some performance overhead compared to direct HTTP calls.
## Troubleshooting

### Spans not appearing in Observatory

- Verify your API key is set: `echo $TCC_API_KEY`
- Check that metadata includes `tcc.runId`: `metadata={"tcc.runId": r.run_id}`
- Ensure the callback is registered: `litellm.callbacks = [TCCCallback()]`
### Missing token counts

Some models don't return token usage. LiteLLM will estimate tokens when possible.
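If you read counts from responses yourself, it is safer not to assume `usage` is present. A defensive sketch, exercised below with stand-in objects (the helper name is ours):

```python
def safe_token_counts(response):
    # Some providers omit usage info; return (None, None) instead of raising.
    usage = getattr(response, "usage", None)
    if usage is None:
        return None, None
    return (
        getattr(usage, "prompt_tokens", None),
        getattr(usage, "completion_tokens", None),
    )
```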
### Import errors

Make sure you installed with the `litellm` extra:

```shell
pip install contextcompany[litellm]
```
## See Also