Installation

Install LiteLLM using pip:
pip install litellm

Basic Usage

LiteLLM provides a simple, unified interface to call any LLM. All you need to do is set the appropriate environment variables and use the completion() function.
Step 1: Set API Keys

Set your API keys as environment variables:
import os

# Set API keys for the providers you want to use
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
Step 2: Make Your First Call

Use the completion() function with any supported model:
from litellm import completion

# Call OpenAI GPT-4
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Call Anthropic Claude
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Step 3: Try Streaming

Enable streaming for real-time responses:
from litellm import completion

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Supported Providers

LiteLLM supports 100+ providers. The same completion() call works with any of them; only the model string, formatted as "<provider>/<model>", changes:
from litellm import completion

# OpenAI
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Other providers use the same call with a different model string, e.g.:
#   "anthropic/claude-sonnet-4-20250514"  (Anthropic)
#   "gemini/gemini-1.5-pro"               (Google Gemini)
#   "ollama/llama3"                       (local Ollama)

Async Support

LiteLLM provides async support out of the box:
import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

    # Async streaming
    response = await acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku"}],
        stream=True
    )

    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
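
The async API also makes it easy to fan many prompts out concurrently with asyncio.gather. A sketch (`fan_out` is an illustrative helper, not part of LiteLLM; pass it any async callable, such as a lambda wrapping acompletion):

```python
import asyncio

async def fan_out(call, prompts):
    """Run one async LLM call per prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(call(p) for p in prompts))

# Example wiring with acompletion from the section above:
#   results = asyncio.run(fan_out(
#       lambda p: acompletion(model="openai/gpt-4o",
#                             messages=[{"role": "user", "content": p}]),
#       ["Hello!", "Write a haiku"],
#   ))
```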

Function Calling

LiteLLM standardizes function calling across all providers:
from litellm import completion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Error Handling

LiteLLM provides OpenAI-compatible exceptions:
from litellm import completion
from litellm.exceptions import (
    RateLimitError,
    AuthenticationError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except ContextWindowExceededError as e:
    print(f"Context too long: {e}")
except APIError as e:
    print(f"API error: {e}")
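
RateLimitError is usually worth retrying with exponential backoff rather than failing outright. A minimal sketch of a generic retry wrapper (`retry_with_backoff` is illustrative; the Router section below shows LiteLLM's built-in num_retries):

```python
import time

def retry_with_backoff(call, retryable, max_attempts=3, base_delay=1.0):
    """Call `call()`, retrying on `retryable` exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage: retry_with_backoff(lambda: completion(model="openai/gpt-4o", messages=msgs), RateLimitError).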

Router with Fallbacks

The Router provides load balancing and automatic fallbacks:
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": "your-openai-key"
            }
        },
        {
            "model_name": "claude-sonnet",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",
                "api_key": "your-anthropic-key"
            }
        }
    ],
    fallbacks=[{"gpt-4": ["claude-sonnet"]}],  # Fall back to Claude if OpenAI fails
    num_retries=2
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

Embeddings

Generate embeddings with any provider:
from litellm import embedding

response = embedding(
    model="openai/text-embedding-3-small",
    input=["Hello, world!", "LiteLLM is awesome"]
)

print(f"Embeddings: {len(response.data)} vectors")
print(f"Dimensions: {len(response.data[0].embedding)}")
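
A common next step is comparing two embeddings with cosine similarity. A self-contained sketch using only the standard library (`cosine_similarity` is an illustrative helper, not a LiteLLM API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# e.g. cosine_similarity(response.data[0].embedding, response.data[1].embedding)
```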

Image Generation

Generate images with supported providers:
from litellm import image_generation

response = image_generation(
    model="openai/dall-e-3",
    prompt="A serene landscape with mountains and a lake",
    n=1,
    size="1024x1024"
)

print(f"Image URL: {response.data[0].url}")

What’s Next?

Explore Providers

Learn about all 100+ supported providers and their capabilities

Caching

Enable caching to reduce costs and improve response times

Observability

Integrate with Langfuse, Lunary, MLflow, and other observability tools

Deploy Proxy

Deploy the AI Gateway for team-wide LLM access

Need Help? Join our Discord community or check out the full documentation.
