
Overview

The AnthropicClient provides a unified interface for interacting with Anthropic’s Claude models, including the Claude 4.5, Claude 3.7 Sonnet, and Claude Haiku families.

Installation

pip install graphiti-core[anthropic]

Basic Usage

from graphiti_core.llm_client import AnthropicClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Initialize client
client = AnthropicClient(
    config=LLMConfig(
        api_key="sk-ant-...",  # Or set ANTHROPIC_API_KEY env var
        model="claude-haiku-4-5-latest",
        temperature=1.0,
        max_tokens=16384
    )
)

# Define response structure
class Analysis(BaseModel):
    sentiment: str
    confidence: float
    key_themes: list[str]

# Generate structured response
messages = [
    Message(role="system", content="Analyze the sentiment of the text."),
    Message(role="user", content="I absolutely love this product!")
]

response = await client.generate_response(
    messages=messages,
    response_model=Analysis
)

Constructor

config
LLMConfig | None · default: None
Configuration object. If None, a default config is created with:
  • api_key from the ANTHROPIC_API_KEY environment variable
  • model set to "claude-haiku-4-5-latest"
  • max_tokens from the parameter below (default: 16384)
cache
bool · default: False
Enable response caching (stored in ./llm_cache).
client
AsyncAnthropic | None · default: None
Optional pre-configured AsyncAnthropic client instance. If not provided, one is created from config.
max_tokens
int · default: 16384
Maximum output tokens. Defaults to 16384, but see the model-specific limits below.

Supported Models

The client supports all Claude models with model-specific max token limits:

Claude 4.5 (64K output)

  • claude-sonnet-4-5-latest
  • claude-sonnet-4-5-20250929
  • claude-haiku-4-5-latest

Claude 3.7 Sonnet (64K output)

  • claude-3-7-sonnet-latest
  • claude-3-7-sonnet-20250219

Claude 3.7 supports up to 128K output tokens with the anthropic-beta: output-128k-2025-02-19 header, but this is not currently implemented.

Claude 3.5 (8K output)

  • claude-3-5-haiku-latest
  • claude-3-5-haiku-20241022
  • claude-3-5-sonnet-latest
  • claude-3-5-sonnet-20241022
  • claude-3-5-sonnet-20240620

Claude 3 (4K output)

  • claude-3-opus-latest
  • claude-3-opus-20240229
  • claude-3-sonnet-20240229
  • claude-3-haiku-20240307

Claude 2 (4K output)

  • claude-2.1
  • claude-2.0

Max Tokens Resolution

The client resolves the effective max_tokens value with the following precedence:
  1. Explicit parameter to generate_response()
  2. Instance max_tokens set during initialization
  3. Model-specific maximum from the mapping above
  4. Default fallback: 8192 tokens
# Example: Using different max_tokens strategies

# 1. Model-specific (automatic): 65536 for claude-sonnet-4-5-latest
client = AnthropicClient(
    config=LLMConfig(model="claude-sonnet-4-5-latest")
)

# 2. Instance-level: 32K for all requests
client = AnthropicClient(
    config=LLMConfig(model="claude-haiku-4-5-latest"),
    max_tokens=32000
)

# 3. Per-request: 8K for this specific request
response = await client.generate_response(
    messages=messages,
    max_tokens=8192
)

Structured Output via Tools

The client uses Anthropic’s tool-calling API for structured outputs:
# With response_model
class PersonInfo(BaseModel):
    """Extract person information"""
    name: str
    age: int
    location: str

response = await client.generate_response(
    messages=messages,
    response_model=PersonInfo
)
# Returns: {'name': '...', 'age': 30, 'location': '...'}

# Without response_model (generic JSON)
response = await client.generate_response(
    messages=messages
)
# Returns: any JSON object
The tool definition is created automatically:
# For PersonInfo model:
tool = {
    'name': 'PersonInfo',
    'description': 'Extract person information',
    'input_schema': {
        'type': 'object',
        'properties': {
            'name': {'type': 'string'},
            'age': {'type': 'integer'},
            'location': {'type': 'string'}
        },
        'required': ['name', 'age', 'location']
    }
}

Error Handling

The client implements comprehensive error handling:

Rate Limits

from graphiti_core.llm_client.errors import RateLimitError

try:
    response = await client.generate_response(messages=messages)
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # No automatic retry - implement backoff in your code
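Since rate limits are not retried automatically, the usual pattern is exponential backoff with jitter. A minimal sketch (the backoff_delay helper is an assumption, not part of graphiti):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retry `attempt` (0-indexed): base * 2**attempt, capped,
    with jitter to avoid thundering-herd retries."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```

In an async caller you would `await asyncio.sleep(backoff_delay(attempt))` inside the `except RateLimitError` branch before re-issuing the request.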

Content Policy Violations

from graphiti_core.llm_client.errors import RefusalError

try:
    response = await client.generate_response(messages=messages)
except RefusalError as e:
    print(f"Content policy violation: {e}")
    # No retry - content was rejected

Automatic Retries

The client automatically retries (max 2 times) for:
  • Validation errors (with error context for self-correction)
  • Transient API errors
  • JSON parsing failures
# Retry flow for validation errors:
# 1. First attempt fails validation
# 2. Error context appended to messages
# 3. Retry with guidance
# 4. Up to 2 total retries before raising exception
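The validation-retry flow above can be sketched as a loop that feeds the error back as extra context. Illustrative only — function names and the context format are hypothetical:

```python
def retry_with_context(call, validate, max_retries: int = 2):
    """Call `call(context)`, validate the result, and on failure append
    the error as guidance for the next attempt (self-correction)."""
    context = []
    last_err = None
    for _attempt in range(max_retries + 1):
        result = call(context)
        try:
            validate(result)       # e.g. Pydantic model validation
            return result
        except ValueError as e:
            last_err = e
            # Error context appended so the next attempt can self-correct
            context.append(f"Previous attempt failed validation: {e}. Please correct.")
    raise last_err
```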

Token Usage Tracking

The client tracks token usage from the API response:
client = AnthropicClient()

response = await client.generate_response(
    messages=messages,
    prompt_name="entity_extraction"
)

# Check usage
usage = client.token_tracker.get_usage()
print(f"Input tokens: {usage['input_tokens']}")
print(f"Output tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

# Usage by prompt name
prompt_usage = client.token_tracker.get_usage_by_prompt("entity_extraction")
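The tracker's behavior amounts to per-prompt accumulation plus a grand total. A stdlib-only sketch of that idea (illustrative; not graphiti's actual TokenTracker implementation):

```python
from collections import defaultdict

class TokenTracker:
    """Accumulate input/output token counts, keyed by prompt name."""

    def __init__(self):
        self._by_prompt = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0}
        )

    def record(self, prompt_name: str, input_tokens: int, output_tokens: int):
        usage = self._by_prompt[prompt_name]
        usage["input_tokens"] += input_tokens
        usage["output_tokens"] += output_tokens

    def get_usage(self) -> dict:
        # Sum across all prompts and derive the total
        totals = {"input_tokens": 0, "output_tokens": 0}
        for usage in self._by_prompt.values():
            totals["input_tokens"] += usage["input_tokens"]
            totals["output_tokens"] += usage["output_tokens"]
        totals["total_tokens"] = totals["input_tokens"] + totals["output_tokens"]
        return totals

    def get_usage_by_prompt(self, prompt_name: str) -> dict:
        return dict(self._by_prompt[prompt_name])
```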

JSON Fallback Extraction

If tool use fails, the client attempts to extract JSON from text:
# Extracts JSON from responses like:
# "Here's the information: {\"name\": \"John\", \"age\": 30}"

import json

def extract_json_fallback(text: str) -> dict:
    # Take the span from the first '{' to the last '}'
    json_start = text.find('{')
    json_end = text.rfind('}') + 1
    json_str = text[json_start:json_end]
    return json.loads(json_str)

Example: Multi-turn Conversation

from graphiti_core.llm_client import AnthropicClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message

client = AnthropicClient(
    config=LLMConfig(model="claude-sonnet-4-5-latest")
)

messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="What's the capital of France?"),
]

# First response
response1 = await client.generate_response(messages=messages)
print(response1)  # {'content': 'The capital of France is Paris.'}

# Continue conversation
messages.append(Message(role="user", content="What's its population?"))
response2 = await client.generate_response(messages=messages)

Performance Tips

  1. Use Haiku for simple tasks: Claude Haiku is roughly 3x cheaper than Sonnet, and faster
  2. Set appropriate max_tokens: Don’t request 64K if you only need 1K
  3. Enable caching for repeated queries with same context
  4. Use model_size parameter: Let Graphiti choose the right model
# Automatic model selection
from graphiti_core.llm_client.config import ModelSize

response = await client.generate_response(
    messages=messages,
    model_size=ModelSize.small  # Uses small_model from config
)
