## Overview

The AnthropicClient provides a unified interface for interacting with Anthropic’s Claude models, including Claude 3.7 Sonnet, Claude 4.5, and the Haiku variants.
## Installation

```bash
pip install graphiti-core[anthropic]
```
## Basic Usage

```python
from graphiti_core.llm_client import AnthropicClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Initialize client
client = AnthropicClient(
    config=LLMConfig(
        api_key="sk-ant-...",  # Or set ANTHROPIC_API_KEY env var
        model="claude-haiku-4-5-latest",
        temperature=1.0,
        max_tokens=16384,
    )
)

# Define response structure
class Analysis(BaseModel):
    sentiment: str
    confidence: float
    key_themes: list[str]

# Generate structured response
messages = [
    Message(role="system", content="Analyze the sentiment of the text."),
    Message(role="user", content="I absolutely love this product!"),
]
response = await client.generate_response(
    messages=messages,
    response_model=Analysis,
)
```
## Constructor

### config

`LLMConfig | None`, default: `None`

Configuration object. If `None`, a default config is created with:

- `api_key` from the `ANTHROPIC_API_KEY` environment variable
- `model` set to `"claude-haiku-4-5-latest"`
- `max_tokens` from the `max_tokens` parameter (default: 16384)
- Response caching enabled (stored in `./llm_cache`)

### client

`AsyncAnthropic | None`, default: `None`

Optional pre-configured AsyncAnthropic client instance. If not provided, one is created from `config`.

### max_tokens

`int`, default: `16384`

Maximum output tokens. Defaults to 16384, but see the model-specific limits below.
## Supported Models

The client supports all Claude models, with model-specific max output token limits:

### Claude 4.5 (64K output)

- `claude-sonnet-4-5-latest`
- `claude-sonnet-4-5-20250929`
- `claude-haiku-4-5-latest`

### Claude 3.7 Sonnet (64K output)

- `claude-3-7-sonnet-latest`
- `claude-3-7-sonnet-20250219`

Claude 3.7 supports up to 128K output tokens with the `anthropic-beta: output-128k-2025-02-19` header, but this is not currently implemented in the client.

### Claude 3.5 (8K output)

- `claude-3-5-haiku-latest`
- `claude-3-5-haiku-20241022`
- `claude-3-5-sonnet-latest`
- `claude-3-5-sonnet-20241022`
- `claude-3-5-sonnet-20240620`

### Claude 3 (4K output)

- `claude-3-opus-latest`
- `claude-3-opus-20240229`
- `claude-3-sonnet-20240229`
- `claude-3-haiku-20240307`

### Claude 2 (4K output)
## Max Tokens Resolution

The client resolves the max token limit with the following precedence (highest first):

1. Explicit `max_tokens` parameter passed to `generate_response()`
2. Instance-level `max_tokens` set during initialization
3. Model-specific maximum from the mapping above
4. Default fallback: 8192 tokens
```python
# Example: Using different max_tokens strategies

# 1. Model-specific (automatic): 65536 for Claude 4.5
client = AnthropicClient(
    config=LLMConfig(model="claude-sonnet-4-5-latest")
)

# 2. Instance-level: 32K for all requests
client = AnthropicClient(
    config=LLMConfig(model="claude-haiku-4-5-latest"),
    max_tokens=32000,
)

# 3. Per-request: 8K for this specific request
response = await client.generate_response(
    messages=messages,
    max_tokens=8192,
)
```
## Structured Output

The client uses Anthropic’s tool-calling API for structured outputs:
```python
# With response_model
class PersonInfo(BaseModel):
    """Extract person information"""
    name: str
    age: int
    location: str

response = await client.generate_response(
    messages=messages,
    response_model=PersonInfo,
)
# Returns: {'name': '...', 'age': 30, 'location': '...'}

# Without response_model (generic JSON)
response = await client.generate_response(
    messages=messages,
)
# Returns: any JSON object
```
The tool definition is created automatically:

```python
# For the PersonInfo model:
tool = {
    'name': 'PersonInfo',
    'description': 'Extract person information',
    'input_schema': {
        'type': 'object',
        'properties': {
            'name': {'type': 'string'},
            'age': {'type': 'integer'},
            'location': {'type': 'string'},
        },
        'required': ['name', 'age', 'location'],
    },
}
```
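A rough sketch of how such a tool definition can be derived from a Pydantic-style JSON schema (as produced by `Model.model_json_schema()`). The `schema_to_tool` helper below is illustrative, not the client's actual code:

```python
# Illustrative helper (not graphiti-core's actual implementation):
# convert a Pydantic-style JSON schema into an Anthropic tool definition.
def schema_to_tool(name: str, schema: dict) -> dict:
    return {
        "name": name,
        "description": schema.get("description", ""),
        "input_schema": {
            "type": "object",
            "properties": schema.get("properties", {}),
            "required": schema.get("required", []),
        },
    }

# Example schema, shaped like PersonInfo.model_json_schema() output
person_schema = {
    "description": "Extract person information",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "location": {"type": "string"},
    },
    "required": ["name", "age", "location"],
}

tool = schema_to_tool("PersonInfo", person_schema)
```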
## Error Handling

The client implements comprehensive error handling:
### Rate Limits

```python
from graphiti_core.llm_client.errors import RateLimitError

try:
    response = await client.generate_response(messages=messages)
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # No automatic retry - implement backoff in your code
```
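Since rate limits are not retried automatically, a simple exponential-backoff wrapper can be layered on top. The helper below is a generic sketch; in practice you would pass graphiti's `RateLimitError` as the retryable exception type:

```python
import asyncio

async def with_backoff(call, *, retryable=(Exception,), attempts=4, base_delay=1.0):
    """Retry an async zero-arg callable with exponential backoff.

    `retryable` is the exception type(s) that trigger a retry,
    e.g. graphiti-core's RateLimitError.
    """
    for attempt in range(attempts):
        try:
            return await call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts, propagate the error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Usage with the real client:
#   response = await with_backoff(
#       lambda: client.generate_response(messages=messages),
#       retryable=RateLimitError,
#   )
```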
### Content Policy Violations

```python
from graphiti_core.llm_client.errors import RefusalError

try:
    response = await client.generate_response(messages=messages)
except RefusalError as e:
    print(f"Content policy violation: {e}")
    # No retry - content was rejected
```
### Automatic Retries

The client automatically retries (up to 2 times) for:

- Validation errors (with error context appended for self-correction)
- Transient API errors
- JSON parsing failures

```python
# Retry flow for validation errors:
# 1. First attempt fails validation
# 2. Error context appended to messages
# 3. Retry with guidance
# 4. Up to 2 total retries before raising an exception
```
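That flow can be sketched as a generic loop; `call_model` and `validate` below are hypothetical stand-ins for the client's internals, not real graphiti-core functions:

```python
# Illustrative sketch of a validation-retry loop with error feedback.
# `call_model` and `validate` are hypothetical stand-ins.
def generate_with_retries(call_model, validate, messages, max_retries=2):
    attempt_messages = list(messages)
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_model(attempt_messages)
        try:
            return validate(raw)
        except ValueError as e:
            last_error = e
            # Append the error so the model can self-correct on retry
            attempt_messages.append({
                "role": "user",
                "content": f"Previous output was invalid: {e}. Please fix it.",
            })
    raise last_error
```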
## Token Usage Tracking

The client tracks token usage from the API response:

```python
client = AnthropicClient()

response = await client.generate_response(
    messages=messages,
    prompt_name="entity_extraction",
)

# Check usage
usage = client.token_tracker.get_usage()
print(f"Input tokens: {usage['input_tokens']}")
print(f"Output tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

# Usage by prompt name
prompt_usage = client.token_tracker.get_usage_by_prompt("entity_extraction")
```
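Conceptually, the tracker accumulates per-prompt counters along these lines. This is a minimal stdlib sketch of the idea, not the actual `token_tracker` implementation:

```python
from collections import defaultdict

# Minimal sketch of a per-prompt token tracker (illustrative only).
class TokenTracker:
    def __init__(self):
        self._by_prompt = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0}
        )

    def record(self, prompt_name, input_tokens, output_tokens):
        entry = self._by_prompt[prompt_name]
        entry["input_tokens"] += input_tokens
        entry["output_tokens"] += output_tokens

    def get_usage(self):
        totals = {"input_tokens": 0, "output_tokens": 0}
        for entry in self._by_prompt.values():
            totals["input_tokens"] += entry["input_tokens"]
            totals["output_tokens"] += entry["output_tokens"]
        totals["total_tokens"] = totals["input_tokens"] + totals["output_tokens"]
        return totals

    def get_usage_by_prompt(self, prompt_name):
        return dict(self._by_prompt[prompt_name])
```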
## JSON Extraction Fallback

If tool use fails, the client attempts to extract JSON from the text response:

```python
import json

# Extracts JSON from responses like:
# "Here's the information: {\"name\": \"John\", \"age\": 30}"
# by taking the span from the first '{' to the last '}'
def extract_json(text: str) -> dict:
    json_start = text.find('{')
    json_end = text.rfind('}') + 1
    json_str = text[json_start:json_end]
    return json.loads(json_str)
```
## Example: Multi-turn Conversation

```python
from graphiti_core.llm_client import AnthropicClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message

client = AnthropicClient(
    config=LLMConfig(model="claude-sonnet-4-5-latest")
)

messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="What's the capital of France?"),
]

# First response
response1 = await client.generate_response(messages=messages)
print(response1)  # {'content': 'The capital of France is Paris.'}

# Continue conversation
messages.append(Message(role="user", content="What's its population?"))
response2 = await client.generate_response(messages=messages)
```
## Best Practices

- Use Haiku for simple tasks: Claude Haiku is cheaper and faster than Sonnet
- Set appropriate `max_tokens`: don’t request 64K if you only need 1K
- Enable caching for repeated queries with the same context
- Use the `model_size` parameter: let Graphiti choose the right model

```python
from graphiti_core.llm_client.config import ModelSize

# Automatic model selection
response = await client.generate_response(
    messages=messages,
    model_size=ModelSize.small,  # Uses small_model from config
)
```