
Overview

LiteLLM provides comprehensive support for Cohere’s models including Command R+, chat completions, embeddings, and reranking capabilities.

Quick Start

1. Install LiteLLM

pip install litellm

2. Set API Key

export COHERE_API_KEY="your-api-key"

3. Make Your First Call

from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Supported Models

Command R+ is Cohere's most capable model, suited to complex tasks.
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Analyze this data..."}],
    max_tokens=1000,
    temperature=0.7
)

Authentication

export COHERE_API_KEY="your-api-key"
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)

Function Calling

Cohere supports function calling with automatic tool translation.
from litellm import completion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools
)

# Check for tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
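
After running the tool yourself, append both the assistant turn and a role "tool" message to the history before calling completion() again, so the model can ground its final answer. A minimal sketch with a hand-built tool call in the OpenAI shape LiteLLM normalizes responses into (no API key needed):

```python
import json

# Suppose the model returned this tool call (OpenAI-format dict for illustration;
# in practice you read it off response.choices[0].message.tool_calls):
tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "New York, NY"}'},
}

# Parse the arguments and run your real tool; this result is a stand-in.
args = json.loads(tool_call["function"]["arguments"])
weather = {"location": args["location"], "temp_f": 68}

# Conversation history for the follow-up completion() call:
messages = [
    {"role": "user", "content": "What's the weather in NYC?"},
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(weather)},
]
```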

Streaming

from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Write a story..."}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
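
To keep the complete text alongside the live stream, accumulate the deltas as they arrive. A sketch using stand-in delta values, since a real stream needs an API key:

```python
# Stand-ins for the chunk.choices[0].delta.content values a streamed
# response yields; empty/None deltas are normal and should be skipped.
deltas = ["Once ", "upon ", None, "a time."]

parts = []
for delta in deltas:
    if delta:
        parts.append(delta)

full_text = "".join(parts)
print(full_text)  # Once upon a time.
```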

Embeddings

Cohere's v3.0 embedding models are the latest generation, with improved retrieval performance.
from litellm import embedding

response = embedding(
    model="cohere/embed-english-v3.0",
    input=["Text to embed", "Another text"]
)

embeddings = [data.embedding for data in response.data]
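
A common next step is comparing embeddings with cosine similarity, e.g. for semantic search. A self-contained sketch using toy vectors in place of real embedding output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for response.data[i].embedding values.
v1 = [0.1, 0.3, 0.5]
v2 = [0.1, 0.3, 0.5]
v3 = [0.9, -0.2, 0.0]

print(cosine_similarity(v1, v2))  # identical vectors score ~1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors score lower
```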

Reranking

Cohere’s rerank models improve search results.
from litellm import rerank

response = rerank(
    model="cohere/rerank-english-v3.0",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of England.",
        "Berlin is the capital of Germany."
    ],
    top_n=2
)

# Get ranked results
for result in response.results:
    print(f"Score: {result.relevance_score}")
    print(f"Document: {result.document}")
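
Each result also references the input list by index, so you can map scores back to your own documents. A sketch with hypothetical scores (a live rerank call needs an API key):

```python
documents = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany.",
]

# Hypothetical rerank results: each entry points at an input document by
# index and carries its relevance score, ordered best-first.
results = [
    {"index": 0, "relevance_score": 0.98},
    {"index": 2, "relevance_score": 0.07},
]

ranked = [(documents[r["index"]], r["relevance_score"]) for r in results]
print(ranked[0][0])  # the most relevant document
```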

Citations

Cohere automatically provides citations for grounded responses.
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Tell me about LiteLLM"}],
    documents=[
        {"text": "LiteLLM is a unified interface for LLMs."},
        {"text": "It supports 100+ LLM providers."}
    ]
)

# Access citations
if hasattr(response, 'citations'):
    for citation in response.citations:
        print(f"Cited document: {citation.document_ids}")
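
To show sources to end users, resolve the cited document_ids back to the documents you passed in. A sketch with a hypothetical citation payload (the field names here are assumptions; check the shape of your actual response object):

```python
# Documents keyed by id, mirroring what was passed to completion().
documents = [
    {"id": "doc_0", "text": "LiteLLM is a unified interface for LLMs."},
    {"id": "doc_1", "text": "It supports 100+ LLM providers."},
]

# Hypothetical citation entries: character spans in the answer plus the
# ids of the documents that ground that span.
citations = [{"start": 0, "end": 24, "document_ids": ["doc_0"]}]

by_id = {d["id"]: d["text"] for d in documents}
sources = [by_id[i] for c in citations for i in c["document_ids"]]
print(sources[0])
```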

Configuration

from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.8,
    max_tokens=500,
    top_p=0.9,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

Supported Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| temperature | float | Randomness (0-1) |
| max_tokens | int | Max output tokens |
| max_completion_tokens | int | Alternative to max_tokens |
| top_p | float | Nucleus sampling |
| frequency_penalty | float | Reduce repetition |
| presence_penalty | float | Encourage diversity |
| stop | list | Stop sequences |
| n | int | Number of completions |
| seed | int | Reproducibility |
| preamble | str | System message |
| k | int | Top-k sampling |
| documents | list | Documents for grounding |

Error Handling

from litellm import completion
from litellm.exceptions import APIError, RateLimitError

try:
    response = completion(
        model="cohere/command-r-plus",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
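
For transient failures such as rate limits, retrying with exponential backoff is a common pattern. A self-contained sketch with a stand-in for the completion() call (a real call needs an API key):

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff; re-raise after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Stand-in for completion() that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_retries(flaky))  # succeeds on the third attempt
```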

LiteLLM Proxy

Use Cohere through the LiteLLM proxy server. First, add the model to your proxy configuration:
model_list:
  - model_name: command-r-plus
    litellm_params:
      model: cohere/command-r-plus
      api_key: os.environ/COHERE_API_KEY
Then call the proxy with any OpenAI-compatible client:

import openai

client = openai.OpenAI(
    api_key="sk-1234",  # LiteLLM proxy key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)

Best Practices

  • Use max_completion_tokens instead of the deprecated max_tokens
  • Monitor token usage via response.usage; Cohere reports billed units, which reflect what you are actually charged
  • Use Command R for balanced performance and cost
  • Use Command R+ for complex reasoning
  • Enable streaming for faster perceived response times
  • LiteLLM automatically converts OpenAI-format requests to Cohere's format
  • Pass force_single_step=True when a tool call should resolve in a single step rather than multi-step tool use
  • Return tool results to the model as role "tool" messages in the conversation history
