## Overview

The `completion()` function provides a unified interface to call 100+ LLM providers. It translates OpenAI-format requests to provider-specific formats and returns standardized responses.
## Basic Usage

```python
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```
## Function Signature

```python
def completion(
    model: str,
    messages: List[Dict[str, str]],
    # Optional OpenAI params
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop: Optional[Union[str, List[str]]] = None,
    max_tokens: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # OpenAI v1.0+ params
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
    # API configuration
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    extra_headers: Optional[dict] = None,
    # LiteLLM params
    custom_llm_provider: Optional[str] = None,
    **kwargs
) -> Union[ModelResponse, CustomStreamWrapper]
```
## Parameters

**`model`** (`str`, required)
The model to use for completion. Examples: `gpt-4`, `claude-3-5-sonnet-20241022`, `gemini-pro`.

**`messages`** (`List[Dict[str, str]]`, required)
List of messages in the conversation. Each message should have `role` and `content` fields.

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]
```
### Optional Parameters

**`temperature`** (`float`)
Controls randomness in the output (0.0 to 2.0). Lower values make output more focused and deterministic.

**`max_tokens`** (`int`)
Maximum number of tokens to generate in the completion.

**`stream`** (`bool`)
If `True`, returns a streaming response. Default: `False`.

**`response_format`** (`Union[dict, Type[BaseModel]]`)
Specify the output format. Can be a dict such as `{"type": "json_object"}` or a Pydantic model.

**`timeout`** (`Union[float, str, httpx.Timeout]`)
Request timeout in seconds. Default: 600 (10 minutes).

**`api_key`** (`str`)
API key for the provider. If not provided, it is read from environment variables (see the sketch after this list).

**`base_url`** (`str`)
Custom API base URL for the provider.
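Since `api_key` falls back to environment variables, a common pattern is to set the provider's key in the environment before calling. A minimal sketch for OpenAI (other providers read their own variables, e.g. `ANTHROPIC_API_KEY` for Anthropic):

```python
import os
from litellm import completion

# With no api_key argument, LiteLLM falls back to provider-specific
# environment variables (OPENAI_API_KEY for OpenAI models).
os.environ["OPENAI_API_KEY"] = "sk-..."

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)
```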
The function returns a `ModelResponse` object with the following structure:

```python
class ModelResponse:
    id: str
    choices: List[Choices]
    created: int
    model: str
    object: str
    system_fingerprint: Optional[str]
    usage: Usage

class Choices:
    finish_reason: str
    index: int
    message: Message

class Message:
    content: str
    role: str
    tool_calls: Optional[List[ChatCompletionMessageToolCall]]

class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
```
## Examples

### Basic Completion

```python
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    temperature=0.7,
    max_tokens=200
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
```
### Using Different Providers

LiteLLM routes requests based on the model name, so the same call shape works across OpenAI, Anthropic, Google, and Azure OpenAI.

**OpenAI**

```python
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="sk-..."
)
```
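Equivalent calls for the other providers follow as hedged sketches: the model identifiers, provider prefixes, and Azure values are illustrative assumptions, so verify current names against each provider's LiteLLM docs.

**Anthropic**

```python
response = completion(
    model="claude-3-5-sonnet-20241022",  # assumed Anthropic model id
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="sk-ant-..."
)
```

**Google**

```python
response = completion(
    model="gemini/gemini-pro",  # assumed Gemini id with provider prefix
    messages=[{"role": "user", "content": "Hello!"}]
)
```

**Azure OpenAI**

```python
response = completion(
    model="azure/my-gpt4-deployment",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello!"}],
    base_url="https://my-resource.openai.azure.com",  # placeholder resource URL
    api_version="2024-02-15-preview",  # example version; verify current value
    api_key="..."
)
```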
### Structured Output with JSON

```python
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Extract the name and age: 'John is 30 years old'"
    }],
    response_format={"type": "json_object"}
)
print(response.choices[0].message.content)  # {"name": "John", "age": 30}
```
### Structured Output with Pydantic

```python
from litellm import completion
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = completion(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Extract the name and age: 'John is 30 years old'"
    }],
    response_format=Person
)

person = Person.model_validate_json(response.choices[0].message.content)
print(person.name, person.age)  # John 30
```
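### Tool Calling

The `tools` and `tool_choice` parameters accept OpenAI-format tool definitions, and any resulting calls surface on `Message.tool_calls` (see the return structure above). A minimal sketch; the `get_weather` schema is a made-up example, not a LiteLLM built-in:

```python
from litellm import completion

# A made-up tool schema for illustration (OpenAI function-tool format).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model chose to call a tool, the call appears on the message.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```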
### System Messages and Context

```python
response = completion(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant. Be concise."},
        {"role": "user", "content": "Write a Python function to reverse a string"},
    ],
    temperature=0.3
)
```
### Setting Timeouts

```python
import httpx
from litellm import completion

# Simple timeout
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    timeout=30.0  # 30 seconds
)

# Advanced timeout with httpx.Timeout
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    timeout=httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0)
)
```
### Error Handling

```python
from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    Timeout,
)

try:
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except ContextWindowExceededError as e:
    print(f"Context too large: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
## Return Types

### Non-Streaming Response

Returns a `ModelResponse` object:

```python
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.id)                          # "chatcmpl-123"
print(response.model)                       # "gpt-4"
print(response.choices[0].message.content)  # "Hello! How can I help you?"
print(response.usage.total_tokens)          # 25
```
### Streaming Response

When `stream=True`, returns a `CustomStreamWrapper` object. See Streaming for details.
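A minimal sketch of consuming the stream; chunks follow the OpenAI delta format, with incremental text in `chunk.choices[0].delta.content`:

```python
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True
)

# Each chunk carries an incremental delta; content can be None on some
# chunks (e.g., the final one), so guard before printing.
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```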