Overview
Perform chat completions using any of LiteLLM’s 100+ supported LLM providers. Returns responses in OpenAI format.
Function Signature
```python
def completion(
    model: str,
    messages: List = [],
    # Optional OpenAI params
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # OpenAI v1.0+ params
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    verbosity: Optional[Literal["low", "medium", "high"]] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
    web_search_options: Optional[OpenAIWebSearchOptions] = None,
    deployment_id=None,
    extra_headers: Optional[dict] = None,
    safety_identifier: Optional[str] = None,
    service_tier: Optional[str] = None,
    # Deprecated params
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    # API configuration
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    # LiteLLM specific
    thinking: Optional[AnthropicThinkingParam] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]
```
Parameters
Required Parameters
model
str
required
The model to use for completion. See supported models for the full list. Examples: gpt-4, claude-3-5-sonnet-20241022, gemini-pro, bedrock/anthropic.claude-v2
messages
List
required
List of message objects representing the conversation context. Each message should have:
- role: "system", "user", "assistant", or "tool"
- content: The message content (string or array for multimodal)

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
]
```
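For multimodal models, content can be an array of typed parts instead of a string. A minimal sketch of the OpenAI content-part format (the image URL and model name are placeholders):

```python
# Multimodal message: "content" is a list of typed parts (text + image_url).
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}]

# response = litellm.completion(model="gpt-4o", messages=messages)  # requires an API key
```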
Generation Parameters
temperature
Optional[float]
Controls randomness in the output. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.2) make it more deterministic. Range: 0.0 to 2.0

top_p
Optional[float]
Nucleus sampling parameter. The model considers only the tokens comprising the top_p probability mass. Range: 0.0 to 1.0

max_tokens
Optional[int]
Maximum number of tokens to generate in the completion.

max_completion_tokens
Optional[int]
Upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n
Optional[int]
Number of chat completion choices to generate for each input message.

stop
Optional[Union[str, list]]
Up to 4 sequences where the API will stop generating further tokens.

presence_penalty
Optional[float]
Penalizes new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0

frequency_penalty
Optional[float]
Penalizes new tokens based on their frequency in the text so far. Range: -2.0 to 2.0

logit_bias
Optional[dict]
Modify the probability of specific tokens appearing in the completion. Maps token IDs to bias values from -100 to 100.
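Taken together, these knobs trade off creativity against repeatability. A sketch of a near-deterministic configuration; the specific values are illustrative, not prescriptive:

```python
# Near-deterministic sampling settings; values are illustrative.
gen_params = {
    "temperature": 0.2,        # low randomness
    "top_p": 1.0,              # keep the full distribution; let temperature do the work
    "max_tokens": 256,         # cap completion length
    "frequency_penalty": 0.5,  # discourage verbatim repetition
    "stop": ["\n\n"],          # halt at the first blank line
}

# response = litellm.completion(model="gpt-4", messages=[...], **gen_params)
```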
Streaming
stream
Optional[bool]
If true, returns a streaming response.

```python
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```

stream_options
Optional[dict]
Options for the streaming response. Only used when stream=True.

```python
stream_options = {"include_usage": True}
```
Tools

tools
Optional[List]
List of tools the model can call. Use OpenAI tool format.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]
```

tool_choice
Optional[Union[str, dict]]
Controls which tool is called. Options:
- "none": Don't call any tool
- "auto": Let the model decide
- {"type": "function", "function": {"name": "tool_name"}}: Force a specific tool

parallel_tool_calls
Optional[bool]
Whether to enable parallel function calling.
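For example, forcing the model to call one specific tool can be sketched as follows (the tool name matches the get_weather definition above; the kwargs are illustrative):

```python
# Force the model to call get_weather rather than letting it decide.
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

request_kwargs = {
    "tool_choice": tool_choice,
    "parallel_tool_calls": False,  # at most one tool call per turn
}

# response = litellm.completion(model="gpt-4", messages=[...], tools=tools, **request_kwargs)
```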
response_format
Union[dict, Type[BaseModel]]
Specify the format of the response. For JSON mode:

```python
response_format = {"type": "json_object"}
```

For structured outputs with Pydantic:

```python
from pydantic import BaseModel

class Response(BaseModel):
    answer: str
    confidence: float

response = litellm.completion(
    model="gpt-4",
    messages=[...],
    response_format=Response
)
```
Advanced Parameters
reasoning_effort
Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]]
Control reasoning effort for reasoning models (e.g., o1, o3).

modalities
Optional[List[ChatCompletionModality]]
Output types you want the model to generate. Example: ["text", "audio"]

audio
Optional[ChatCompletionAudioParam]
Parameters for audio output. Required when audio output is requested via modalities.

prediction
Optional[ChatCompletionPredictionContentParam]
Configuration for Predicted Outputs, which can improve response times when large parts of the response are known ahead of time.

logprobs
Optional[bool]
Whether to return log probabilities of the output tokens.

top_logprobs
Optional[int]
Number of most likely tokens to return at each position (0-5). Requires logprobs=True.

seed
Optional[int]
Seed for deterministic sampling. Supported by some providers.

user
Optional[str]
Unique identifier for your end-user, used for abuse monitoring.
API Configuration
api_key
Optional[str]
API key for the provider. If not provided, uses environment variables.

base_url
Optional[str]
Base URL for the API endpoint.

api_version
Optional[str]
API version to use (provider-specific).

timeout
Union[float, httpx.Timeout]
default: 600
Request timeout in seconds.

extra_headers
Optional[dict]
Additional headers to include in the request.
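These settings can be overridden per call rather than via environment variables. A sketch with placeholder endpoint, key, and header values:

```python
# Per-call API configuration; all values below are placeholders.
config = {
    "api_key": "sk-placeholder",                    # overrides the env var for this call
    "base_url": "https://my-proxy.example.com/v1",  # route through a custom endpoint
    "timeout": 30.0,                                # seconds; an httpx.Timeout also works
    "extra_headers": {"X-Request-Source": "docs-demo"},
}

# response = litellm.completion(model="gpt-4", messages=[...], **config)
```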
LiteLLM Specific
custom_llm_provider
Optional[str]
Override the provider detection. Use for non-standard providers. Example: custom_llm_provider="bedrock"

mock_response
Optional[str]
Return a mock response for testing/debugging.

num_retries
Optional[int]
Number of retry attempts on failure.

fallbacks
Optional[list]
List of fallback models to try if the primary model fails.

```python
fallbacks = ["gpt-3.5-turbo", "claude-2"]
```

metadata
Optional[dict]
Additional metadata to tag the completion call.

thinking
Optional[AnthropicThinkingParam]
Anthropic thinking parameter for extended thinking mode.

```python
thinking = {
    "type": "enabled",
    "budget_tokens": 1000
}
```
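A sketch of passing the thinking parameter on a call (the model name and budget are illustrative, and reading the reasoning trace back via reasoning_content is an assumption about the response shape):

```python
# Extended thinking for an Anthropic model; budget value is illustrative.
thinking = {"type": "enabled", "budget_tokens": 2048}

# response = litellm.completion(
#     model="claude-3-7-sonnet-20250219",  # assumed thinking-capable model
#     messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
#     thinking=thinking,
# )
# print(response.choices[0].message.reasoning_content)  # assumed field for the trace
```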
Response
ModelResponse
id
Unique identifier for the completion.

choices
List of completion choices. Each choice contains:
- message: The generated message
  - role: Role of the message ("assistant")
  - content: The message content
  - tool_calls: Tool calls made by the model
- finish_reason: Reason for completion: "stop", "length", "tool_calls", "content_filter"

created
Unix timestamp of when the completion was created.

model
Model used for completion.

usage
Token usage information:
- prompt_tokens: Number of tokens in the prompt
- completion_tokens: Number of tokens in the completion

Response time in milliseconds is also attached to the response (LiteLLM specific).
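The same structure, shown as plain JSON-style data for illustration (field values are made up; a real ModelResponse exposes these as attributes rather than dict keys):

```python
# OpenAI-format response shape; values are illustrative.
response_json = {
    "id": "chatcmpl-123",
    "created": 1700000000,
    "model": "gpt-4",
    "choices": [{
        "message": {"role": "assistant", "content": "4", "tool_calls": None},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 1, "total_tokens": 21},
}

# On a real ModelResponse the equivalents are attribute accesses:
# response.choices[0].message.content, response.choices[0].finish_reason, response.usage.prompt_tokens
answer = response_json["choices"][0]["message"]["content"]
```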
Usage Examples
Basic Completion
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```
Streaming
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Async Completion
```python
import litellm
import asyncio

async def main():
    response = await litellm.acompletion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
Function Calling
```python
import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
}]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    print(response.choices[0].message.tool_calls[0].function.name)
```
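Completing the loop requires a second turn: append the assistant's tool call and a "tool" message carrying your tool's result, then call completion again. A sketch with illustrative IDs and values (normally the tool call is read from response.choices[0].message):

```python
import json

messages = [{"role": "user", "content": "What's the weather in Boston?"}]

# Pretend the model returned this tool call on the first turn.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'},
}
messages.append({"role": "assistant", "content": None, "tool_calls": [tool_call]})

# Run the tool locally, then report the result under the matching tool_call_id.
args = json.loads(tool_call["function"]["arguments"])
messages.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps({"location": args["location"], "temperature": "72F"}),
})

# final = litellm.completion(model="gpt-4", messages=messages, tools=tools)
```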
Multiple Providers
```python
import litellm

# OpenAI
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi"}]
)

# Anthropic
response = litellm.completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hi"}]
)

# AWS Bedrock
response = litellm.completion(
    model="bedrock/anthropic.claude-v2",
    messages=[{"role": "user", "content": "Hi"}]
)

# Azure OpenAI
response = litellm.completion(
    model="azure/gpt-4",
    messages=[{"role": "user", "content": "Hi"}],
    api_key="your-azure-key",
    api_base="https://your-endpoint.openai.azure.com/",
    api_version="2024-02-01"
)
```
Error Handling
```python
import litellm
from litellm import AuthenticationError, RateLimitError, Timeout

try:
    response = litellm.completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
```
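Beyond catching exceptions, the LiteLLM-specific num_retries and fallbacks kwargs (described above) handle transient failures automatically. A sketch with illustrative values:

```python
# Resilience settings: retry transient failures, then fall back to other models.
resilience = {
    "num_retries": 3,                           # retry attempts on failure
    "fallbacks": ["gpt-3.5-turbo", "claude-2"], # tried in order if gpt-4 fails
    "timeout": 30.0,                            # seconds per attempt
}

# response = litellm.completion(
#     model="gpt-4",
#     messages=[{"role": "user", "content": "Hello"}],
#     **resilience,
# )
```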