Overview

OpenRouter provides access to many LLM providers through a single API. LiteLLM seamlessly integrates with OpenRouter, supporting advanced features like provider routing, cost tracking, and prompt caching.

Quick Start

1. Install LiteLLM

pip install litellm

2. Set API Key

export OPENROUTER_API_KEY="sk-or-..."

3. Make Your First Call

from litellm import completion

response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Specify any OpenRouter model using the openrouter/<provider>/<model> prefix:
from litellm import completion

# Claude 3.5 Sonnet
response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Explain AI"}]
)

# Claude 3 Opus
response = completion(
    model="openrouter/anthropic/claude-3-opus",
    messages=[{"role": "user", "content": "Complex task"}]
)

Authentication

Set your OpenRouter API key via an environment variable:
export OPENROUTER_API_KEY="sk-or-..."
from litellm import completion

response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)

Streaming

from litellm import completion

response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
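
Streamed chunks arrive as OpenAI-style deltas, as the loop above shows. As an illustration (not part of LiteLLM's API), a small helper can accumulate those deltas into the full response text; the name `collect_stream` is hypothetical:

```python
def collect_stream(chunks):
    """Accumulate streamed delta chunks into the full response text.

    Assumes each chunk exposes .choices[0].delta.content, matching the
    streaming loop above. Illustrative helper, not a LiteLLM API.
    """
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # delta.content may be None on some chunks
            parts.append(content)
    return "".join(parts)
```

You could then write full_text = collect_stream(response) in place of the print loop when you need the complete string afterward.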

Reasoning Models

OpenRouter supports reasoning models that can return their intermediate thinking as reasoning content alongside the final answer.
from litellm import completion

response = completion(
    model="openrouter/openai/o1",
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    reasoning_effort="high"  # For supported models
)

if response.choices[0].message.reasoning_content:
    print("Reasoning:", response.choices[0].message.reasoning_content)
print("Answer:", response.choices[0].message.content)

Provider Routing

Control which providers OpenRouter uses.
from litellm import completion

response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    # Specify allowed providers
    models=["anthropic/claude-3.5-sonnet"],
    # Or use routing preferences
    route="fallback"  # or "least-busy"
)
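
Because the routing knobs above are plain keyword arguments, they can be packaged once and reused across calls. A sketch (the helper name `routing_kwargs` is hypothetical, not a LiteLLM API):

```python
def routing_kwargs(preferred, fallbacks=(), route="fallback"):
    """Package the `models` preference list and `route` strategy shown
    in the example above into a kwargs dict. Illustrative helper only.
    """
    return {
        "models": [preferred, *fallbacks],
        "route": route,
    }
```

Usage would look like completion(model="openrouter/anthropic/claude-3.5-sonnet", messages=..., **routing_kwargs("anthropic/claude-3.5-sonnet", ["openai/gpt-4o"])).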

Cost Tracking

LiteLLM automatically extracts cost information from OpenRouter.
from litellm import completion

response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Cost is automatically tracked in usage
if hasattr(response, '_hidden_params'):
    cost = response._hidden_params.get('additional_headers', {}).get(
        'llm_provider-x-litellm-response-cost'
    )
    if cost:
        print(f"Request cost: ${cost}")
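
The header lookup above can be wrapped in a small parser that returns a float or None. The function below is a sketch assuming exactly the header layout shown in the example (`extract_cost` is a hypothetical name, not a LiteLLM API):

```python
def extract_cost(response):
    """Pull the response cost from LiteLLM's hidden params, following
    the header path used in the example above. Returns None when no
    cost header is present. Illustrative helper only.
    """
    hidden = getattr(response, "_hidden_params", None) or {}
    headers = hidden.get("additional_headers", {})
    cost = headers.get("llm_provider-x-litellm-response-cost")
    return float(cost) if cost is not None else None
```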

Prompt Caching

OpenRouter supports prompt caching for Claude and Gemini models.
from litellm import completion

# Cache system message
response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[
        {
            "role": "system",
            "content": "Long system prompt...",
            "cache_control": {"type": "ephemeral"}
        },
        {"role": "user", "content": "Question?"}
    ]
)
Cache control is automatically moved to content blocks for OpenRouter compatibility.
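
That rewrite turns a message-level cache_control into Anthropic-style content blocks. Roughly, the transformation looks like this (a simplified sketch for illustration, not LiteLLM's actual implementation):

```python
def to_cached_blocks(message):
    """Rewrite a message with string content and a message-level
    cache_control into content-block form, the shape OpenRouter
    expects. Simplified sketch of the transformation described above.
    """
    cache = message.get("cache_control")
    if cache is None or not isinstance(message.get("content"), str):
        return message  # nothing to rewrite
    return {
        "role": message["role"],
        "content": [{
            "type": "text",
            "text": message["content"],
            "cache_control": cache,
        }],
    }
```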

Embeddings

from litellm import embedding

response = embedding(
    model="openrouter/openai/text-embedding-3-small",
    input=["Text to embed", "Another text"]
)

embeddings = [data.embedding for data in response.data]

Image Generation

from litellm import image_generation

response = image_generation(
    model="openrouter/openai/dall-e-3",
    prompt="A beautiful sunset over mountains",
    n=1,
    size="1024x1024"
)

image_url = response.data[0].url

Configuration

from litellm import completion

response = completion(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=1000,
    top_p=0.9,
    frequency_penalty=0.5,
    presence_penalty=0.5,
    # OpenRouter-specific
    transforms=["middle-out"],  # Compression
    models=["anthropic/claude-3.5-sonnet"],  # Provider preference
    route="fallback"  # Routing strategy
)

Supported Parameters

| Parameter | Type | Description |
|---|---|---|
| temperature | float | Randomness (0-2) |
| max_tokens | int | Max output tokens |
| max_completion_tokens | int | Alternative to max_tokens |
| top_p | float | Nucleus sampling |
| frequency_penalty | float | Reduce repetition |
| presence_penalty | float | Encourage diversity |
| stop | list | Stop sequences |
| n | int | Number of completions |
| reasoning_effort | str | Reasoning level |
| transforms | list | Text transformations |
| models | list | Provider preferences |
| route | str | Routing strategy |

Error Handling

from litellm import completion
from litellm.exceptions import APIError, RateLimitError

try:
    response = completion(
        model="openrouter/anthropic/claude-3.5-sonnet",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limit: {e}")
except APIError as e:
    print(f"Error: {e.status_code} - {e.message}")
    # Check OpenRouter dashboard for credits
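
Rate limits are usually transient, so a retry wrapper pairs naturally with the handler above. A minimal sketch (the `with_retries` helper is hypothetical; pass litellm's RateLimitError as `retryable` to retry only rate-limited requests):

```python
import time

def with_retries(call, retryable=(Exception,), attempts=3, base_delay=1.0):
    """Retry `call` with exponential backoff on the given exception
    types, re-raising after the final attempt. Illustrative helper,
    not a LiteLLM API.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Usage: response = with_retries(lambda: completion(model=..., messages=...), retryable=(RateLimitError,)).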

LiteLLM Proxy

model_list:
  - model_name: claude-3.5-sonnet
    litellm_params:
      model: openrouter/anthropic/claude-3.5-sonnet
      api_key: os.environ/OPENROUTER_API_KEY
  
  - model_name: gpt-4o
    litellm_params:
      model: openrouter/openai/gpt-4o
      api_key: os.environ/OPENROUTER_API_KEY
import openai

client = openai.OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)

Best Practices

  • Monitor costs via the OpenRouter dashboard
  • Use cheaper models for simple tasks
  • Enable prompt caching for repeated contexts and faster responses
  • LiteLLM automatically includes usage tracking
  • Use the models parameter to control providers
  • Set route="fallback" for reliability
  • Providers differ in capabilities and latency; choose based on your needs
  • Use streaming for better UX
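
"Use cheaper models for simple tasks" can be as simple as a length-based tier switch. A sketch with arbitrary example models and threshold (not recommendations):

```python
def pick_model(prompt, threshold=500):
    """Route short prompts to a cheaper model and long ones to a
    stronger model. The threshold and model names are illustrative
    choices only.
    """
    if len(prompt) < threshold:
        return "openrouter/anthropic/claude-3-haiku"
    return "openrouter/anthropic/claude-3.5-sonnet"
```

In practice you would key the decision on task complexity rather than raw length, but the routing shape stays the same.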

Supported Models

OpenRouter provides access to 100+ models. Visit openrouter.ai/models for the complete list. Popular categories:
  • Anthropic Claude (all versions)
  • OpenAI GPT (all versions)
  • Google Gemini
  • Meta Llama
  • Mistral AI
  • Cohere
  • And many more
Model availability and pricing vary. Check OpenRouter’s website for current offerings.
