Overview
Perform chat completions using any of LiteLLM’s 100+ supported LLM providers. Returns responses in OpenAI format.
Function Signature
```python
def completion(
    model: str,
    messages: List = [],
    # Optional OpenAI params
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # OpenAI v1.0+ params
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    verbosity: Optional[Literal["low", "medium", "high"]] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
    web_search_options: Optional[OpenAIWebSearchOptions] = None,
    deployment_id=None,
    extra_headers: Optional[dict] = None,
    safety_identifier: Optional[str] = None,
    service_tier: Optional[str] = None,
    # Deprecated params
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    # API configuration
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    # LiteLLM specific
    thinking: Optional[AnthropicThinkingParam] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]
```
Parameters
Required Parameters
model
str
required
The model to use for completion. See supported models for the full list. Examples: gpt-4, claude-3-5-sonnet-20241022, gemini-pro, bedrock/anthropic.claude-v2
messages
List
required
List of message objects representing the conversation context. Each message should have:
- role: "system", "user", "assistant", or "tool"
- content: The message content (string or array for multimodal)

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
]
```
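For multimodal models, content can be an array of typed parts instead of a string. A minimal sketch of the OpenAI content-part format (the image URL and model name are placeholders):

```python
# Multimodal message: "content" is a list of typed parts (text + image_url).
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}]

# response = litellm.completion(model="gpt-4o", messages=messages)  # requires an API key
```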
Generation Parameters
temperature
Optional[float]
Controls randomness in the output. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.2) make it more deterministic. Range: 0.0 to 2.0

top_p
Optional[float]
Nucleus sampling parameter. The model considers only the tokens comprising the top_p probability mass. Range: 0.0 to 1.0

max_tokens
Optional[int]
Maximum number of tokens to generate in the completion.

max_completion_tokens
Optional[int]
Upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n
Optional[int]
Number of chat completion choices to generate for each input message.

stop
Optional[Union[str, list]]
Up to 4 sequences where the API will stop generating further tokens.

presence_penalty
Optional[float]
Penalizes new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0

frequency_penalty
Optional[float]
Penalizes new tokens based on their frequency in the text so far. Range: -2.0 to 2.0

logit_bias
Optional[dict]
Modify the probability of specific tokens appearing in the completion. Maps token IDs to bias values from -100 to 100.
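Taken together, these knobs trade off creativity against repeatability. A sketch of a near-deterministic configuration; the specific values are illustrative, not prescriptive:

```python
# Near-deterministic sampling settings; values are illustrative.
gen_params = {
    "temperature": 0.2,        # low randomness
    "top_p": 1.0,              # keep the full distribution; let temperature do the work
    "max_tokens": 256,         # cap completion length
    "frequency_penalty": 0.5,  # discourage verbatim repetition
    "stop": ["\n\n"],          # halt at the first blank line
}

# response = litellm.completion(model="gpt-4", messages=[...], **gen_params)
```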
Streaming
stream
Optional[bool]
If true, returns a streaming response.

```python
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```

stream_options
Optional[dict]
Options for the streaming response. Only used when stream=True.

```python
stream_options = {"include_usage": True}
```
Tools

tools
Optional[List]
List of tools the model can call. Use OpenAI tool format.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]
```

tool_choice
Optional[Union[str, dict]]
Controls which tool is called. Options:
- "none": Don't call any tool
- "auto": Let the model decide
- {"type": "function", "function": {"name": "tool_name"}}: Force a specific tool

parallel_tool_calls
Optional[bool]
Whether to enable parallel function calling.
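For example, forcing the model to call one specific tool can be sketched as follows (the tool name matches the get_weather definition above; the kwargs are illustrative):

```python
# Force the model to call get_weather rather than letting it decide.
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

request_kwargs = {
    "tool_choice": tool_choice,
    "parallel_tool_calls": False,  # at most one tool call per turn
}

# response = litellm.completion(model="gpt-4", messages=[...], tools=tools, **request_kwargs)
```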
response_format
Union[dict, Type[BaseModel]]
Specify the format of the response. For JSON mode:

```python
response_format = {"type": "json_object"}
```

For structured outputs with Pydantic:

```python
from pydantic import BaseModel

class Response(BaseModel):
    answer: str
    confidence: float

response = litellm.completion(
    model="gpt-4",
    messages=[...],
    response_format=Response
)
```
Advanced Parameters
reasoning_effort
Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]]
Control reasoning effort for reasoning models (e.g., o1, o3).

modalities
Optional[List[ChatCompletionModality]]
Output types you want the model to generate. Example: ["text", "audio"]

audio
Optional[ChatCompletionAudioParam]
Parameters for audio output. Required when audio output is requested via modalities.

prediction
Optional[ChatCompletionPredictionContentParam]
Configuration for Predicted Outputs, which can improve response times when large parts of the response are known ahead of time.

logprobs
Optional[bool]
Whether to return log probabilities of the output tokens.

top_logprobs
Optional[int]
Number of most likely tokens to return at each position (0-5). Requires logprobs=True.

seed
Optional[int]
Seed for deterministic sampling. Supported by some providers.

user
Optional[str]
Unique identifier for your end-user, used for abuse monitoring.
API Configuration
api_key
Optional[str]
API key for the provider. If not provided, uses environment variables.

base_url
Optional[str]
Base URL for the API endpoint.

api_version
Optional[str]
API version to use (provider-specific).

timeout
Union[float, httpx.Timeout]
default: 600
Request timeout in seconds.

extra_headers
Optional[dict]
Additional headers to include in the request.
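These settings can be overridden per call rather than via environment variables. A sketch with placeholder endpoint, key, and header values:

```python
# Per-call API configuration; all values below are placeholders.
config = {
    "api_key": "sk-placeholder",                    # overrides the env var for this call
    "base_url": "https://my-proxy.example.com/v1",  # route through a custom endpoint
    "timeout": 30.0,                                # seconds; an httpx.Timeout also works
    "extra_headers": {"X-Request-Source": "docs-demo"},
}

# response = litellm.completion(model="gpt-4", messages=[...], **config)
```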
LiteLLM Specific
custom_llm_provider
Optional[str]
Override the provider detection. Use for non-standard providers. Example: custom_llm_provider="bedrock"

mock_response
Optional[str]
Return a mock response for testing/debugging.

num_retries
Optional[int]
Number of retry attempts on failure.

fallbacks
Optional[list]
List of fallback models to try if the primary model fails.

```python
fallbacks = ["gpt-3.5-turbo", "claude-2"]
```

metadata
Optional[dict]
Additional metadata to tag the completion call.

thinking
Optional[AnthropicThinkingParam]
Anthropic thinking parameter for extended thinking mode.

```python
thinking = {
    "type": "enabled",
    "budget_tokens": 1000
}
```
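A sketch of passing the thinking parameter on a call (the model name and budget are illustrative, and reading the reasoning trace back via reasoning_content is an assumption about the response shape):

```python
# Extended thinking for an Anthropic model; budget value is illustrative.
thinking = {"type": "enabled", "budget_tokens": 2048}

# response = litellm.completion(
#     model="claude-3-7-sonnet-20250219",  # assumed thinking-capable model
#     messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
#     thinking=thinking,
# )
# print(response.choices[0].message.reasoning_content)  # assumed field for the trace
```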
Response
ModelResponse
id
Unique identifier for the completion.

choices
List of completion choices. Each choice contains:
- message: The generated message
  - role: Role of the message ("assistant")
  - content: The message content
  - tool_calls: Tool calls made by the model
- finish_reason: Reason for completion: "stop", "length", "tool_calls", "content_filter"

created
Unix timestamp of when the completion was created.

model
Model used for completion.

usage
Token usage information:
- prompt_tokens: Number of tokens in the prompt
- completion_tokens: Number of tokens in the completion

Response time in milliseconds is also attached to the response (LiteLLM specific).
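The same structure, shown as plain JSON-style data for illustration (field values are made up; a real ModelResponse exposes these as attributes rather than dict keys):

```python
# OpenAI-format response shape; values are illustrative.
response_json = {
    "id": "chatcmpl-123",
    "created": 1700000000,
    "model": "gpt-4",
    "choices": [{
        "message": {"role": "assistant", "content": "4", "tool_calls": None},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 1, "total_tokens": 21},
}

# On a real ModelResponse the equivalents are attribute accesses:
# response.choices[0].message.content, response.choices[0].finish_reason, response.usage.prompt_tokens
answer = response_json["choices"][0]["message"]["content"]
```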
Usage Examples
Basic Completion
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```
Streaming
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Async Completion
```python
import litellm
import asyncio

async def main():
    response = await litellm.acompletion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
Function Calling
```python
import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
}]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    print(response.choices[0].message.tool_calls[0].function.name)
```
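Completing the loop requires a second turn: append the assistant's tool call and a "tool" message carrying your tool's result, then call completion again. A sketch with illustrative IDs and values (normally the tool call is read from response.choices[0].message):

```python
import json

messages = [{"role": "user", "content": "What's the weather in Boston?"}]

# Pretend the model returned this tool call on the first turn.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'},
}
messages.append({"role": "assistant", "content": None, "tool_calls": [tool_call]})

# Run the tool locally, then report the result under the matching tool_call_id.
args = json.loads(tool_call["function"]["arguments"])
messages.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps({"location": args["location"], "temperature": "72F"}),
})

# final = litellm.completion(model="gpt-4", messages=messages, tools=tools)
```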
Multiple Providers
```python
import litellm

# OpenAI
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi"}]
)

# Anthropic
response = litellm.completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hi"}]
)

# AWS Bedrock
response = litellm.completion(
    model="bedrock/anthropic.claude-v2",
    messages=[{"role": "user", "content": "Hi"}]
)

# Azure OpenAI
response = litellm.completion(
    model="azure/gpt-4",
    messages=[{"role": "user", "content": "Hi"}],
    api_key="your-azure-key",
    api_base="https://your-endpoint.openai.azure.com/",
    api_version="2024-02-01"
)
```
Error Handling
```python
import litellm
from litellm import AuthenticationError, RateLimitError, Timeout

try:
    response = litellm.completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
```
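Beyond catching exceptions, the LiteLLM-specific num_retries and fallbacks kwargs (described above) handle transient failures automatically. A sketch with illustrative values:

```python
# Resilience settings: retry transient failures, then fall back to other models.
resilience = {
    "num_retries": 3,                           # retry attempts on failure
    "fallbacks": ["gpt-3.5-turbo", "claude-2"], # tried in order if gpt-4 fails
    "timeout": 30.0,                            # seconds per attempt
}

# response = litellm.completion(
#     model="gpt-4",
#     messages=[{"role": "user", "content": "Hello"}],
#     **resilience,
# )
```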