## Overview
LiteLLM provides comprehensive support for Cohere’s models, including Command R+, chat completions, embeddings, and reranking.

## Quick Start
## Supported Models

- Command R+: most capable model for complex tasks
- Command R
- Command
## Authentication

Set your Cohere API key either way:

- Environment variable: `COHERE_API_KEY`
- Direct parameter: pass `api_key` per call
## Function Calling
Cohere supports function calling; LiteLLM automatically translates OpenAI-format tool definitions into Cohere’s tool schema.

## Streaming
## Embeddings

- v3 models: latest embedding models with improved performance
- v2 models: previous generation
- Input types: e.g. `search_document` vs. `search_query`
## Reranking
Cohere’s rerank models improve search result relevance.
## Citations

Cohere automatically provides citations for grounded responses.

## Configuration
- Basic config: standard OpenAI-compatible parameters
- Cohere-specific: parameters such as `preamble` and `documents`
## Supported Parameters

| Parameter | Type | Description |
|---|---|---|
| `temperature` | float | Randomness (0-1) |
| `max_tokens` | int | Max output tokens |
| `max_completion_tokens` | int | Alternative to `max_tokens` |
| `top_p` | float | Nucleus sampling |
| `frequency_penalty` | float | Reduce repetition |
| `presence_penalty` | float | Encourage diversity |
| `stop` | list | Stop sequences |
| `n` | int | Number of completions |
| `seed` | int | Reproducibility |
| `preamble` | str | System message (Cohere-specific) |
| `k` | int | Top-k sampling |
| `documents` | list | Documents for grounding |
## Error Handling
## LiteLLM Proxy
Use Cohere through the LiteLLM proxy server.
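A sketch of a proxy `config.yaml` exposing a Cohere model; the `os.environ/` prefix tells the proxy to read the key from the environment.

```yaml
# Hypothetical proxy config (config.yaml) exposing one Cohere model.
model_list:
  - model_name: command-r-plus
    litellm_params:
      model: cohere/command-r-plus
      api_key: os.environ/COHERE_API_KEY
```

Start the proxy with `litellm --config config.yaml`; clients then call it through its OpenAI-compatible endpoint.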
## Best Practices

### Token Management

- Use `max_completion_tokens` instead of the deprecated `max_tokens`
- Monitor token usage via `response.usage`
- Cohere reports billed units for accurate billing
### Performance

- Use Command R for balanced performance and cost
- Use Command R+ for complex reasoning
- Enable streaming for faster perceived response times
### Function Calling

- LiteLLM automatically converts OpenAI-format tool definitions to Cohere’s format
- Use `force_single_step=True` when needed
- Handle tool results properly in the conversation history