## Overview

The Observatory Python SDK provides automatic instrumentation for LiteLLM via the `TCCCallback` class. The callback exports each LLM call as an OpenTelemetry span, so LiteLLM completions are tracked without manual instrumentation.
## Installation

Install the SDK with LiteLLM support:

```shell
pip install contextcompany[litellm]
```

This installs:

- `litellm`
- `opentelemetry-sdk`
- `opentelemetry-exporter-otlp-proto-http`
## Basic Setup

Register the callback with LiteLLM:

```python
from contextcompany.litellm import TCCCallback
import litellm

# Set up the callback
litellm.callbacks = [TCCCallback()]

# Make LLM calls as usual
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Linking to Runs

To associate LiteLLM calls with Observatory runs, pass the run ID in `metadata`:

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

# Create a run
r = run()

# Link LiteLLM calls to this run
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather?"}],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="What's the weather?")
r.response(response.choices[0].message.content)
r.end()
```
## Class Reference

### TCCCallback

```python
from contextcompany.litellm import TCCCallback

TCCCallback(
    api_key: Optional[str] = None,
    endpoint: Optional[str] = None,
    service_name: str = "litellm",
)
```

#### Parameters

- `api_key`: Observatory API key. If not provided, uses the `TCC_API_KEY` environment variable.
- `endpoint`: Custom OpenTelemetry endpoint URL. If not provided, uses the default Observatory OTEL endpoint.
- `service_name`: Service name for OpenTelemetry traces.
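As a rough sketch of the documented lookup order (explicit argument first, then the environment variable), where `resolve_api_key` is a hypothetical helper, not part of the SDK:

```python
import os

def resolve_api_key(explicit=None):
    # Mirror the documented fallback: an explicitly passed key wins,
    # otherwise read the TCC_API_KEY environment variable.
    key = explicit or os.environ.get("TCC_API_KEY")
    if key is None:
        raise RuntimeError("No API key: pass api_key or set TCC_API_KEY")
    return key
```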
## What Gets Tracked

The callback automatically captures:

- **Model Information**: Requested model and actual model used
- **Messages**: Input messages and output content
- **Token Usage**: Input tokens, output tokens, and cached tokens
- **Tool Calls**: Function/tool invocations and their arguments
- **Finish Reasons**: Why the LLM stopped generating (`stop`, `length`, `tool_calls`, etc.)
- **Run Association**: Links to Observatory runs via `metadata.tcc.runId`
- **Errors**: Failed LLM calls with error messages
Use these metadata keys to control tracking:

- `tcc.runId`: Associate this LLM call with an Observatory run. Use `r.run_id` from your run object.
- `tcc.run_id`: Alternative key name for run association (snake_case variant).
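A consumer of this metadata can accept either variant. A minimal sketch of such a lookup, assuming the snake_case variant is `tcc.run_id` (the helper name is ours, not the SDK's):

```python
def run_id_from_metadata(metadata):
    # Check both documented key variants, camelCase first.
    return metadata.get("tcc.runId") or metadata.get("tcc.run_id")
```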
## Usage Examples

### Basic Chat Completion

```python
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

response = litellm.completion(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)

print(response.choices[0].message.content)
```
### With Run Tracking

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="Tell me a joke")
r.response(response.choices[0].message.content)
r.end()
```
### Function Calling

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                },
                "required": ["location"],
            },
        },
    }
]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
    metadata={"tcc.runId": r.run_id},
)

tool_call = response.choices[0].message.tool_calls[0]
print(f"Calling {tool_call.function.name} with {tool_call.function.arguments}")

r.prompt(user_prompt="What's the weather in NYC?")
r.response(f"Need to call {tool_call.function.name}")
r.end()
```
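`tool_call.function.arguments` arrives as a JSON string. If you need the values, for example to actually call `get_weather`, it is worth parsing defensively, since models occasionally emit malformed JSON. A small sketch (the helper name is ours, not part of the SDK):

```python
import json

def parse_tool_args(arguments):
    # Tool-call arguments are a JSON string; return {} on bad input
    # rather than letting a malformed model response crash the agent.
    try:
        return json.loads(arguments) or {}
    except (TypeError, json.JSONDecodeError):
        return {}
```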
### Streaming Responses

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,
    metadata={"tcc.runId": r.run_id},
)

full_response = ""
for chunk in response:
    content = chunk.choices[0].delta.content or ""
    full_response += content
    print(content, end="", flush=True)
print()  # New line

r.prompt(user_prompt="Write a short story")
r.response(full_response)
r.end()
```
### Multiple Models

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

# First call with GPT-4
response1 = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain AI"}],
    metadata={"tcc.runId": r.run_id},
)

# Second call with Claude; include the first answer in the prompt,
# since each completion call is stateless and cannot see "the above"
response2 = litellm.completion(
    model="claude-3-opus-20240229",
    messages=[{
        "role": "user",
        "content": f"Summarize this:\n{response1.choices[0].message.content}",
    }],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="Explain AI and summarize")
r.response(response2.choices[0].message.content)
r.end()
```
### Custom Endpoint

```python
from contextcompany.litellm import TCCCallback
import litellm

# Use a custom Observatory instance
callback = TCCCallback(
    api_key="your_api_key",
    endpoint="https://custom.example.com/otel-steps",
    service_name="my-agent",
)

litellm.callbacks = [callback]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Best Practices

- **Set up callback once**: Initialize `TCCCallback` at application startup and reuse it for all LiteLLM calls.
- **Always pass run ID**: Include `metadata={"tcc.runId": r.run_id}` to link LLM calls to runs.
- **Handle streaming**: For streaming responses, collect the full response before calling `r.response()`.
- **Track function calls separately**: Use `tool_call()` to track tool invocations alongside LiteLLM's automatic step tracking.
- **Use environment variables**: Set `TCC_API_KEY` instead of hardcoding credentials.
## Environment Variables

- `TCC_API_KEY`: Observatory API key, read when no `api_key` is passed to `TCCCallback`
- Custom Observatory endpoint URL (for the OTEL exporter)
## Comparison: Manual vs. Auto-instrumentation

### Manual Instrumentation

```python
from contextcompany import run
import openai

r = run()
s = r.step()

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

s.prompt("Hello!")
s.response(response.choices[0].message.content)
s.model(requested="gpt-4", used=response.model)
s.tokens(
    prompt_uncached=response.usage.prompt_tokens,
    completion=response.usage.completion_tokens,
)
s.end()

r.prompt(user_prompt="Hello!")
r.response(response.choices[0].message.content)
r.end()
```
### Auto-instrumentation with LiteLLM

```python
from contextcompany import run
from contextcompany.litellm import TCCCallback
import litellm

litellm.callbacks = [TCCCallback()]

r = run()

# Steps are automatically created and tracked
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"tcc.runId": r.run_id},
)

r.prompt(user_prompt="Hello!")
r.response(response.choices[0].message.content)
r.end()
```

Auto-instrumentation reduces boilerplate and ensures consistent tracking across all LiteLLM calls.
## Limitations

- **LiteLLM only**: This callback only works with LiteLLM. For other LLM libraries, use manual instrumentation with `step()`.
- **Run association required**: You must pass `metadata.tcc.runId` to link LLM calls to runs.
- **OTEL overhead**: OpenTelemetry adds some performance overhead compared to direct HTTP calls.
## Troubleshooting

### Spans not appearing in Observatory

- Verify your API key is set: `echo $TCC_API_KEY`
- Check that metadata includes `tcc.runId`: `metadata={"tcc.runId": r.run_id}`
- Ensure the callback is registered: `litellm.callbacks = [TCCCallback()]`
### Missing token counts

Some models don't return token usage. LiteLLM will estimate tokens when possible.
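If you read counts from responses yourself, it is safer not to assume `usage` is present. A defensive sketch, exercised below with stand-in objects (the helper name is ours):

```python
def safe_token_counts(response):
    # Some providers omit usage info; return (None, None) instead of raising.
    usage = getattr(response, "usage", None)
    if usage is None:
        return None, None
    return (
        getattr(usage, "prompt_tokens", None),
        getattr(usage, "completion_tokens", None),
    )
```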
### Import errors

Make sure you installed with the `litellm` extra:

```shell
pip install contextcompany[litellm]
```
## See Also