
Overview

Performs a chat completion using any of LiteLLM's 100+ supported LLM providers and returns the response in OpenAI format.

Function Signature

def completion(
    model: str,
    messages: List = [],
    # Optional OpenAI params
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop = None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # OpenAI v1.0+ params
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    verbosity: Optional[Literal["low", "medium", "high"]] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
    web_search_options: Optional[OpenAIWebSearchOptions] = None,
    deployment_id = None,
    extra_headers: Optional[dict] = None,
    safety_identifier: Optional[str] = None,
    service_tier: Optional[str] = None,
    # Deprecated params
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    # API configuration
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    # LiteLLM specific
    thinking: Optional[AnthropicThinkingParam] = None,
    **kwargs
) -> Union[ModelResponse, CustomStreamWrapper]

Parameters

Required Parameters

model
string
required
The model to use for completion. See supported models for the full list.
Examples: gpt-4, claude-3-5-sonnet-20241022, gemini-pro, bedrock/anthropic.claude-v2
messages
List[dict]
required
List of message objects representing the conversation context. Each message should have:
  • role: "system", "user", "assistant", or "tool"
  • content: The message content (string or array for multimodal)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
]

Generation Parameters

temperature
float
default:"1.0"
Controls randomness in the output. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.2) make it more deterministic.
Range: 0.0 to 2.0
top_p
float
default:"1.0"
Nucleus sampling parameter. The model considers only the tokens comprising the top_p cumulative probability mass.
Range: 0.0 to 1.0
max_tokens
int
Maximum number of tokens to generate in the completion.
max_completion_tokens
int
Upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
n
int
default:"1"
Number of chat completion choices to generate for each input message.
stop
Union[str, List[str]]
Up to 4 sequences where the API will stop generating further tokens.
presence_penalty
float
default:"0.0"
Penalizes new tokens based on whether they have already appeared in the text so far.
Range: -2.0 to 2.0
frequency_penalty
float
default:"0.0"
Penalizes new tokens based on their frequency in the text so far.
Range: -2.0 to 2.0
logit_bias
dict
Modify the probability of specific tokens appearing in the completion.
Maps token IDs to bias values from -100 to 100.

Streaming

stream
bool
default:"false"
If true, returns a streaming response.
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
stream_options
dict
Options for streaming response. Only use when stream=True.
stream_options={"include_usage": True}

Function Calling & Tools

tools
List[dict]
List of tools the model can call. Use OpenAI tool format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]
tool_choice
Union[str, dict]
Controls which tool is called. Options:
  • "none": Don't call any tool
  • "auto": Let the model decide
  • {"type": "function", "function": {"name": "tool_name"}}: Force a specific tool
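
When the model does return a tool call, your code is responsible for executing it. The sketch below shows the usual dispatch pattern; the tool-call payload here is a hand-built stand-in shaped like response.choices[0].message.tool_calls[0].function, not a live API result.

```python
import json

# A local function the model can "call" via the get_weather tool.
def get_weather(location: str) -> str:
    return f"Sunny in {location}"

# Stand-in for the function field of a returned tool call:
# arguments arrive as a JSON-encoded string.
tool_call_function = {
    "name": "get_weather",
    "arguments": json.dumps({"location": "Boston, MA"}),
}

# Dispatch: look up the named function and apply the decoded arguments.
available_tools = {"get_weather": get_weather}
fn = available_tools[tool_call_function["name"]]
result = fn(**json.loads(tool_call_function["arguments"]))
print(result)  # Sunny in Boston, MA
```

The result would then be sent back to the model as a message with role "tool" to continue the conversation.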
parallel_tool_calls
bool
default:"true"
Whether to enable parallel function calling.

Response Format

response_format
Union[dict, Type[BaseModel]]
Specify the format of the response.
For JSON mode:
response_format={"type": "json_object"}
For structured outputs with Pydantic:
from pydantic import BaseModel

class Response(BaseModel):
    answer: str
    confidence: float

response = litellm.completion(
    model="gpt-4",
    messages=[...],
    response_format=Response
)

Advanced Parameters

reasoning_effort
Literal
Control reasoning effort for reasoning models (e.g., o1, o3).
Options: "none", "minimal", "low", "medium", "high", "xhigh", "default"
modalities
List[str]
Output types you want the model to generate.
Example: ["text", "audio"]
audio
dict
Parameters for audio output. Required when audio is requested with modalities.
prediction
dict
Configuration for Predicted Output, which can improve response times when large parts of the response are known ahead of time.
logprobs
bool
default:"false"
Whether to return log probabilities of output tokens.
top_logprobs
int
Number of most likely tokens to return at each position (0-20). Requires logprobs=True.
seed
int
Seed for deterministic sampling. Supported by some providers.
user
string
Unique identifier for your end-user, for abuse monitoring.

API Configuration

api_key
string
API key for the provider. If not provided, uses environment variables.
base_url
string
Base URL for the API endpoint.
api_version
string
API version to use (provider-specific).
timeout
Union[float, httpx.Timeout]
default:"600"
Request timeout in seconds.
extra_headers
dict
Additional headers to include in the request.

LiteLLM Specific

custom_llm_provider
string
Override the provider detection. Use for non-standard providers.
Example: custom_llm_provider="bedrock"
mock_response
string
Return a mock response for testing/debugging.
max_retries
int
default:"0"
Number of retry attempts on failure.
fallbacks
List[str]
List of fallback models to try if the primary fails.
fallbacks=["gpt-3.5-turbo", "claude-2"]
metadata
dict
Additional metadata to tag the completion call.
thinking
dict
Anthropic thinking parameter for extended thinking mode.
thinking={
    "type": "enabled",
    "budget_tokens": 1000
}

Response

ModelResponse

id
string
Unique identifier for the completion.
choices
List[Choice]
List of completion choices.
created
int
Unix timestamp of when the completion was created.
model
string
Model used for completion.
usage
Usage
Token usage information.
_response_ms
float
Response time in milliseconds (LiteLLM specific).

Usage Examples

Basic Completion

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)

Streaming

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Async Completion

import litellm
import asyncio

async def main():
    response = await litellm.acompletion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Function Calling

import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
}]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    print(response.choices[0].message.tool_calls[0].function.name)

Multiple Providers

import litellm

# OpenAI
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi"}]
)

# Anthropic
response = litellm.completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hi"}]
)

# AWS Bedrock
response = litellm.completion(
    model="bedrock/anthropic.claude-v2",
    messages=[{"role": "user", "content": "Hi"}]
)

# Azure OpenAI
response = litellm.completion(
    model="azure/gpt-4",
    messages=[{"role": "user", "content": "Hi"}],
    api_key="your-azure-key",
    api_base="https://your-endpoint.openai.azure.com/",
    api_version="2024-02-01"
)

Error Handling

import litellm
from litellm import AuthenticationError, RateLimitError, Timeout

try:
    response = litellm.completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
