Overview

The completion() function provides a unified interface to call 100+ LLM providers. It translates OpenAI-format requests to provider-specific formats and returns standardized responses.

Basic Usage

from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)
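The call above assumes an OpenAI key is already configured. When api_key is not passed, LiteLLM reads provider credentials from environment variables; a small sketch of that convention (the variable names for OpenAI, Anthropic, and Gemini are the standard ones, the helper function is illustrative):

```python
import os

# LiteLLM reads provider credentials from environment variables.
# A non-exhaustive mapping of provider to variable name:
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

def key_is_configured(provider):
    # True if the provider's key is present in the environment.
    return bool(os.environ.get(PROVIDER_KEY_VARS[provider]))
```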

Function Signature

def completion(
    model: str,
    messages: List[Dict[str, str]],
    # Optional OpenAI params
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop: Optional[Union[str, List[str]]] = None,
    max_tokens: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # OpenAI v1.0+ params
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
    # API configuration
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    extra_headers: Optional[dict] = None,
    # LiteLLM params
    custom_llm_provider: Optional[str] = None,
    **kwargs
) -> Union[ModelResponse, CustomStreamWrapper]

Parameters

model
string
required
The model to use for completion. Examples: gpt-4, claude-3-5-sonnet-20241022, gemini-pro
messages
List[Dict[str, str]]
required
List of messages in the conversation. Each message should have role and content fields.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

Optional Parameters

temperature
float
Controls randomness in the output (0.0 to 2.0). Lower values make output more focused and deterministic.
max_tokens
int
Maximum number of tokens to generate in the completion.
stream
bool
If True, returns a streaming response. Default: False
tools
List[Dict]
List of tools (functions) the model can call. See Function Calling for details.
response_format
Union[dict, Type[BaseModel]]
Specify the output format. Can be a dict with {"type": "json_object"} or a Pydantic model. Note that OpenAI's json_object mode requires the word "JSON" to appear somewhere in the messages.
timeout
float
Request timeout in seconds. Default: 600 (10 minutes)
api_key
str
API key for the provider. If not provided, reads from environment variables.
base_url
str
Custom API base URL for the provider.
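
The tools parameter accepts OpenAI-format tool definitions. A minimal sketch, in which the get_weather name and its schema are illustrative rather than a real function, and the live call is guarded so the snippet only hits the API when a key is set:

```python
import os

# OpenAI-format tool definition; "get_weather" and its schema are
# illustrative, not a real function.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# The call itself needs litellm and a provider key, so it is guarded here.
if os.environ.get("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
        tool_choice="auto",
    )
    print(response.choices[0].message.tool_calls)
```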

Response Format

The function returns a ModelResponse object with the following structure:
class ModelResponse:
    id: str
    choices: List[Choices]
    created: int
    model: str
    object: str
    system_fingerprint: Optional[str]
    usage: Usage

class Choices:
    finish_reason: str
    index: int
    message: Message

class Message:
    content: str
    role: str
    tool_calls: Optional[List[ChatCompletionMessageToolCall]]

class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

Examples

Basic Completion

from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Using Different Providers

# Anthropic: reads ANTHROPIC_API_KEY from the environment
response = completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Google Gemini
response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)

Structured Output with JSON

from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Extract the name and age as JSON: 'John is 30 years old'"
    }],
    response_format={"type": "json_object"}
)

print(response.choices[0].message.content)  # e.g. {"name": "John", "age": 30}

Structured Output with Pydantic

from litellm import completion
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = completion(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Extract the name and age: 'John is 30 years old'"
    }],
    response_format=Person
)

person = Person.model_validate_json(response.choices[0].message.content)
print(person.name, person.age)  # John 30

System Messages and Context

response = completion(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant. Be concise."},
        {"role": "user", "content": "Write a Python function to reverse a string"},
    ],
    temperature=0.3
)

Setting Timeouts

import httpx
from litellm import completion

# Simple timeout
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    timeout=30.0  # 30 seconds
)

# Advanced timeout with httpx.Timeout
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    timeout=httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0)
)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    Timeout
)

try:
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except ContextWindowExceededError as e:
    print(f"Context too large: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
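
RateLimitError and Timeout are usually transient, so callers often retry with backoff. A generic helper sketch, not part of LiteLLM (which also supports retries natively via a num_retries argument to completion()):

```python
import time

def with_retries(call, retryable, max_retries=3, base_delay=1.0):
    # Retry `call` with exponential backoff when it raises one of the
    # exception types in `retryable`; re-raise after the final attempt.
    for attempt in range(max_retries):
        try:
            return call()
        except retryable:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical usage with the exceptions above:
# response = with_retries(
#     lambda: completion(model="gpt-4", messages=[{"role": "user", "content": "Hello"}]),
#     retryable=(RateLimitError, Timeout),
# )
```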

Return Types

Non-Streaming Response

Returns a ModelResponse object:
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.id)  # "chatcmpl-123"
print(response.model)  # "gpt-4"
print(response.choices[0].message.content)  # "Hello! How can I help you?"
print(response.usage.total_tokens)  # 25

Streaming Response

Returns a CustomStreamWrapper object. See Streaming for details.
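
A minimal consumption sketch, assuming OpenAI-style chunks where each chunk carries its text in choices[0].delta.content (the stream_text helper is illustrative):

```python
import os

def stream_text(chunks):
    # Join the incremental text deltas from streamed chunks; a chunk's
    # delta.content may be None (e.g. the final chunk), hence the `or ""`.
    return "".join(chunk.choices[0].delta.content or "" for chunk in chunks)

# The live call needs litellm and a provider key, so it is guarded here.
if os.environ.get("OPENAI_API_KEY"):
    from litellm import completion

    stream = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    print(stream_text(stream))
```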
