The contextcompany Python package provides manual instrumentation for AI agent observability.
Installation
pip install contextcompany
Package Overview
run(): Track end-to-end agent executions
step(): Instrument individual LLM calls
tool_call(): Track tool and function invocations
submit_feedback(): Collect user feedback on runs
Core Functions
run()
Create a new run to track an end-to-end agent execution.
from contextcompany import run
r = run(
    run_id: Optional[str] = None,
    session_id: Optional[str] = None,
    conversational: Optional[bool] = None,
    api_key: Optional[str] = None,
    tcc_url: Optional[str] = None
)
run_id: Custom run identifier. Auto-generates a UUID if omitted.
session_id: Group multiple runs into a logical session (e.g., a conversation).
conversational: Mark this run as part of a multi-turn conversation.
api_key: TCC API key. Defaults to the TCC_API_KEY environment variable.
tcc_url: Custom ingestion endpoint URL.
Run Methods
Set the user prompt that initiated the run.
r.prompt(
    user_prompt: str,
    system_prompt: Optional[str] = None
) -> Run
r.prompt('What is the weather in SF?')
# or with system prompt
r.prompt(
    user_prompt='Summarize this article',
    system_prompt='You are a helpful assistant'
)
Set the agent’s final response to the user. r.response(text: str) -> Run
Attach key-value metadata to the run. Values must be strings.
r.metadata(
    data: Optional[Dict[str, str]] = None,
    **kwargs: str
) -> Run
r.metadata({'agent': 'weather-bot', 'version': '1.0'})
# or use kwargs
r.metadata(agent='weather-bot', version='1.0')
Set the outcome status code and optional message.
r.status(
    code: int,
    message: Optional[str] = None
) -> Run
0 = success (default)
2 = error
Create a new Step attached to this run. r.step(step_id: Optional[str] = None) -> Step
Create a new ToolCall attached to this run.
r.tool_call(
    tool_name: Optional[str] = None,
    tool_call_id: Optional[str] = None
) -> ToolCall
Submit user feedback for this run.
r.feedback(
    score: Optional[Literal['thumbs_up', 'thumbs_down']] = None,
    text: Optional[str] = None
) -> bool
Finalize and send the run. r.end()
Requires prompt() to have been called. Raises RuntimeError if the run has already ended, or ValueError if no prompt was set.
End the run with error status (code 2). r.error(status_message: str = '') -> None
Read-only property that returns the run’s unique identifier. print(r.run_id)  # 'run_abc123'
step()
Create a standalone step to track an individual LLM call.
from contextcompany import step
s = step(
    run_id: str,
    step_id: Optional[str] = None,
    api_key: Optional[str] = None,
    tcc_url: Optional[str] = None
)
run_id: The parent run identifier.
step_id: Custom step identifier. Auto-generates a UUID if omitted.
Step Methods
Set the prompt sent to the LLM. s.prompt(text: str) -> Step
Set the LLM’s response text. s.response(text: str) -> Step
Set which model was used.
s.model(
    requested: Optional[str] = None,
    used: Optional[str] = None
) -> Step
s.model(requested='gpt-4o', used='gpt-4o-2024-08-06')
# or just one
s.model(requested='gpt-4o')
Set the model’s finish/stop reason. s.finish_reason(reason: str) -> Step
Example values: 'stop', 'length', 'tool_calls'
Record token usage for this step.
s.tokens(
    prompt_uncached: Optional[int] = None,
    prompt_cached: Optional[int] = None,
    completion: Optional[int] = None
) -> Step
s.tokens(
    prompt_uncached=120,
    prompt_cached=30,
    completion=45
)
Set the actual cost of this step in USD. s.cost(real_total: float) -> Step
Set the tool definitions available during this step. s.tool_definitions(definitions: str) -> Step
Pass a JSON string of tool schemas.
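As a sketch, an OpenAI-style function schema (the schema shape is an assumption here; pass whatever format your provider expects) can be serialized with json.dumps before being attached via s.tool_definitions():

```python
import json

# Hypothetical tool schema; the OpenAI function-calling shape is an assumption.
tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Look up current weather for a city',
            'parameters': {
                'type': 'object',
                'properties': {'city': {'type': 'string'}},
                'required': ['city'],
            },
        },
    }
]
tool_definitions_json = json.dumps(tools)
# s.tool_definitions(tool_definitions_json)  # attach the JSON string to the step
```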
Set the outcome status code and optional message.
s.status(
    code: int,
    message: Optional[str] = None
) -> Step
Create a new ToolCall attached to this step’s run.
s.tool_call(
    tool_name: Optional[str] = None,
    tool_call_id: Optional[str] = None
) -> ToolCall
Finalize and send the step. s.end()
Requires both prompt() and response() to have been called.
End the step with error status (code 2). s.error(status_message: str = '') -> None
tool_call()
Create a standalone tool call to track a tool/function invocation.
from contextcompany import tool_call
tc = tool_call(
    run_id: str,
    tool_call_id: Optional[str] = None,
    tool_name: Optional[str] = None,
    api_key: Optional[str] = None,
    tcc_url: Optional[str] = None
)
run_id: The parent run identifier.
tool_call_id: Custom tool call identifier. Auto-generates a UUID if omitted.
tool_name: The name of the tool being invoked.
ToolCall Methods
Set the tool name. tc.name(tool_name: str) -> ToolCall
Set the arguments passed to the tool. Dicts are auto-serialized to JSON. tc.args(value: Union[str, Dict[str, Any]]) -> ToolCall
tc.args({'city': 'San Francisco', 'units': 'metric'})
# or as JSON string
tc.args('{"city": "San Francisco"}')
Set the return value from the tool. Dicts are auto-serialized to JSON. tc.result(value: Union[str, Dict[str, Any]]) -> ToolCall
Set the outcome status code and optional message.
tc.status(
    code: int,
    message: Optional[str] = None
) -> ToolCall
Finalize and send the tool call. tc.end()
Requires name() to have been called.
End the tool call with error status (code 2). tc.error(status_message: str = '') -> None
submit_feedback()
Submit user feedback for a run.
from contextcompany import submit_feedback
success = submit_feedback(
    run_id: str,
    score: Optional[Literal['thumbs_up', 'thumbs_down']] = None,
    text: Optional[str] = None,
    api_key: Optional[str] = None,
    tcc_url: Optional[str] = None
) -> bool
run_id: The run identifier to attach feedback to.
score: Binary feedback score, 'thumbs_up' | 'thumbs_down'.
text: Optional feedback text (max 2000 characters).
At least one of score or text must be provided.
from contextcompany import submit_feedback
# Score only
submit_feedback(run_id='run_123', score='thumbs_up')
# Text only
submit_feedback(run_id='run_123', text='Great response!')
# Both
submit_feedback(
    run_id='run_123',
    score='thumbs_up',
    text='Exactly what I needed'
)
Configuration
Environment Variables
TCC_API_KEY: Your Observatory API key. Keys starting with dev_ route to the development environment.
Override the default ingestion endpoint URL.
Override the default feedback endpoint URL.
Set to 'true' to enable debug logging.
Helper Functions
from contextcompany import get_api_key, get_url
# Get API key from environment
api_key = get_api_key(api_key: Optional[str] = None) -> str
# Get endpoint URL with dev/prod selection
url = get_url(
    prod_url: str,
    dev_url: str,
    tcc_url: Optional[str] = None,
    api_key: Optional[str] = None
) -> str
Complete Examples
Basic Run
from contextcompany import run
r = run(session_id='session_123')
r.prompt('What is the weather in San Francisco?')
r.metadata(agent='weather-bot', version='1.0')
# ... agent logic
r.response('It is 72°F and sunny.')
r.end()
Run with Steps
from contextcompany import run
import json
r = run(session_id='session_456', conversational=True)
r.prompt('Analyze this data')
# Step 1: First LLM call
s1 = r.step()
s1.prompt(json.dumps(messages))
s1.response(assistant_response)
s1.model(requested='gpt-4o', used='gpt-4o-2024-08-06')
s1.tokens(prompt_uncached=120, prompt_cached=30, completion=45)
s1.cost(0.0042)
s1.end()
# Step 2: Follow-up call
s2 = r.step()
s2.prompt(json.dumps(followup_messages))
s2.response(followup_response)
s2.model(requested='gpt-4o', used='gpt-4o-2024-08-06')
s2.tokens(prompt_uncached=80, completion=30)
s2.cost(0.0028)
s2.end()
r.response('Analysis complete.')
r.end()
Run with Tool Calls
from contextcompany import run
r = run()
r.prompt('Get the weather for San Francisco')
# Tool call
tc = r.tool_call('get_weather')
tc.args({'city': 'San Francisco', 'units': 'imperial'})
# Execute tool
weather_data = {'temp': 72, 'condition': 'sunny'}
tc.result(weather_data)
tc.end()
r.response('It is 72°F and sunny in San Francisco.')
r.end()
Error Handling
from contextcompany import run
r = run()
try:
    r.prompt('Process this data')
    # ... agent logic that might fail
    result = process_data()
    r.response(result)
    r.end()
except Exception as e:
    r.error(f'Processing failed: {e}')
Feedback Collection
from contextcompany import run, submit_feedback
# Execute run
r = run()
r.prompt('Help me with this task')
r.response('Here is how to do it...')
r.end()
# Later, collect feedback
success = submit_feedback(
    run_id=r.run_id,
    score='thumbs_up',
    text='Very helpful!'
)
if success:
    print('Feedback submitted')
Standalone Step (Advanced)
from contextcompany import step
import json
# Create a step independently
s = step(run_id='run_existing_123')
s.prompt(json.dumps(messages))
s.response(response_text)
s.model(requested='claude-3-5-sonnet')
s.tokens(prompt_uncached=150, completion=60)
s.finish_reason('stop')
s.end()
LiteLLM Integration
Observatory provides a callback for LiteLLM that automatically exports each LLM call as an OpenTelemetry span.
Installation
pip install contextcompany litellm
Usage
from contextcompany.litellm import TCCCallback
from contextcompany import run
import litellm
# Configure LiteLLM with TCC callback
litellm.callbacks = [TCCCallback()]
# Create a run
r = run()
r.prompt('What is the capital of France?')
# Make LiteLLM call with run_id in metadata
response = litellm.completion(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
    metadata={'tcc.runId': r.run_id}  # Link to run
)
r.response(response.choices[0].message.content)
r.end()
TCCCallback
from contextcompany.litellm import TCCCallback
callback = TCCCallback(
    api_key: Optional[str] = None,
    endpoint: Optional[str] = None,
    service_name: str = 'litellm'
)
api_key: TCC API key. Defaults to the TCC_API_KEY environment variable.
endpoint: Custom OTLP endpoint URL. Auto-detects based on API key.
service_name: OpenTelemetry service name (default: 'litellm').
Linking LLM Calls to Runs
Pass the run ID in the metadata parameter:
# Option 1: tcc.runId
response = litellm.completion(
    model='gpt-4',
    messages=[...],
    metadata={'tcc.runId': r.run_id}
)
# Option 2: tcc.run_id (snake_case)
response = litellm.completion(
    model='gpt-4',
    messages=[...],
    metadata={'tcc.run_id': r.run_id}
)
Best Practices
Session Tracking
Group related runs into sessions for multi-turn conversations:
from contextcompany import run
import uuid
session_id = str(uuid.uuid4())
# Turn 1
r1 = run(session_id=session_id, conversational=True)
r1.prompt('Hello')
r1.response('Hi! How can I help?')
r1.end()
# Turn 2 (same session)
r2 = run(session_id=session_id, conversational=True)
r2.prompt('What is the weather?')
r2.response('It is sunny.')
r2.end()
Metadata
Use metadata to add searchable context:
r.metadata(
    user_id='user_123',
    agent_version='2.1.0',
    environment='production',
    feature_flag='new-model'
)
Token Tracking
Always report token usage for cost tracking:
s.tokens(
    prompt_uncached=response.usage.prompt_tokens,
    completion=response.usage.completion_tokens
)
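When a provider reports prompt caching, the cached share can be split out before calling s.tokens(). This sketch assumes an OpenAI-style usage object with prompt_tokens_details.cached_tokens; field names vary by provider:

```python
from types import SimpleNamespace

# Stand-in for a provider response's usage object; the
# prompt_tokens_details.cached_tokens shape is an OpenAI-style assumption.
usage = SimpleNamespace(
    prompt_tokens=150,
    completion_tokens=45,
    prompt_tokens_details=SimpleNamespace(cached_tokens=30),
)
cached = getattr(usage.prompt_tokens_details, 'cached_tokens', 0) or 0
prompt_uncached = usage.prompt_tokens - cached  # 120 uncached prompt tokens
# s.tokens(prompt_uncached=prompt_uncached, prompt_cached=cached,
#          completion=usage.completion_tokens)
```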
Error Reporting
Use .error() instead of .end() when failures occur:
try:
    # agent logic
    r.end()
except ValidationError as e:
    r.error(f'Validation failed: {e}')
except Exception as e:
    r.error(f'Unexpected error: {e}')
Next Steps
TypeScript SDKs: TypeScript SDK reference documentation
Quickstart: Get started with Observatory in 5 minutes
Python API: Complete Python API reference
Configuration: Configuration guide