Overview

Cohere provides enterprise-grade language models specialized for business applications, including powerful chat models, best-in-class embeddings, and reranking capabilities. Portkey exposes them through an OpenAI-compatible interface for production use. Base URL: https://api.cohere.ai

Supported Features

  • ✅ Chat Completions (v2 API)
  • ✅ Streaming
  • ✅ Embeddings
  • ✅ Rerank (via Cohere API)
  • ✅ Tool Use (Function Calling)
  • ✅ Document Mode (RAG)
  • ✅ Citation Mode
  • ✅ Batch Embeddings
  • ❌ Image Generation
  • ❌ Vision

Quick Start

Chat Completions

from portkey_ai import Portkey

client = Portkey(
    provider="cohere",
    Authorization="***"  # Your Cohere API key
)

response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[
        {"role": "user", "content": "Explain RAG in simple terms"}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Write a haiku about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Available Models

Chat Models

| Model | Context | Description | Best For |
|-------|---------|-------------|----------|
| command-r-plus-08-2024 | 128K | Most capable | Complex tasks, RAG |
| command-r-08-2024 | 128K | Efficient | General purpose |
| command-r-plus | 128K | Previous generation | Legacy apps |
| command-r | 128K | Previous generation | Legacy apps |
| command | 4K | Legacy model | Simple tasks |
| command-light | 4K | Lightweight | Fast responses |

Embedding Models

| Model | Dimensions | Description |
|-------|------------|-------------|
| embed-english-v3.0 | 1024 | English embeddings |
| embed-multilingual-v3.0 | 1024 | 100+ languages |
| embed-english-light-v3.0 | 384 | Compact English |
| embed-multilingual-light-v3.0 | 384 | Compact multilingual |
| embed-english-v2.0 | 4096 | Legacy |

Cohere excels at:
  • Enterprise deployments with strong support
  • RAG applications with citation support
  • Multilingual tasks (100+ languages)
  • Semantic search with best-in-class embeddings
  • Document grounding for factual responses

Configuration Options

Headers

client = Portkey(
    provider="cohere",
    Authorization="***"  # Bearer token format: "Bearer co-***" or just "co-***"
)

| Header | Description | Required |
|--------|-------------|----------|
| Authorization | Cohere API key (Bearer token) | Yes |

Advanced Features

Tool Use (Function Calling)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search for products in the catalog",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "category": {
                        "type": "string",
                        "description": "Product category"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Find laptops under $1000"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
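After the model returns a tool call, you execute the function locally and send the result back as a `tool` message before re-issuing the request. A minimal dispatcher sketch — the `search_products` implementation and its catalog here are hypothetical stand-ins:

```python
import json

# Hypothetical local implementation backing the search_products tool.
def search_products(query, category=None):
    catalog = [
        {"name": "UltraBook 13", "category": "laptops", "price": 899},
        {"name": "ProBook 15", "category": "laptops", "price": 1299},
    ]
    results = [p for p in catalog
               if query.lower() in p["category"] or query.lower() in p["name"].lower()]
    if category:
        results = [p for p in results if p["category"] == category]
    return results

TOOL_REGISTRY = {"search_products": search_products}

def run_tool_call(tool_call_id, name, arguments_json):
    """Run a model-requested tool locally and build the `tool` message
    to append to `messages` for the follow-up request."""
    args = json.loads(arguments_json)  # arguments arrive as a JSON string
    result = TOOL_REGISTRY[name](**args)
    return {"role": "tool", "tool_call_id": tool_call_id,
            "content": json.dumps(result)}
```

You would call it as `run_tool_call(tool_call.id, tool_call.function.name, tool_call.function.arguments)`, append the returned message to `messages`, and call `chat.completions.create` again with the same `tools` so the model can compose a final answer.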

RAG with Document Grounding

Cohere excels at RAG with built-in citation support:
# Documents to ground the response
documents = [
    {
        "id": "doc1",
        "text": "Portkey is an AI Gateway that routes to 250+ LLMs."
    },
    {
        "id": "doc2",
        "text": "The gateway provides fallbacks, load balancing, and caching."
    }
]

response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "What features does Portkey offer?"}],
    # Pass documents via additional parameters
    documents=documents,
    citation_quality="accurate"
)

print(response.choices[0].message.content)

# Access citations if available
if hasattr(response.choices[0].message, 'citations'):
    print("Citations:", response.choices[0].message.citations)
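If you want inline source markers, the citation spans can be folded back into the response text. A sketch, assuming each citation carries `start`/`end` character offsets and a `document_ids` list — the general shape of Cohere's citation payloads, but verify against the current API response:

```python
def annotate_citations(text, citations):
    """Insert [docN] markers after each cited span of `text`.
    Assumes citations are dicts with 'start', 'end' (character offsets)
    and 'document_ids', and that spans do not overlap."""
    out, cursor = [], 0
    for c in sorted(citations, key=lambda c: c["start"]):
        out.append(text[cursor:c["end"]])          # text up to end of cited span
        out.append("[" + ",".join(c["document_ids"]) + "]")
        cursor = c["end"]
    out.append(text[cursor:])                      # trailing uncited text
    return "".join(out)
```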

Embeddings

response = client.embeddings.create(
    model="embed-english-v3.0",
    input="Cohere provides enterprise-grade NLP",
    input_type="search_document"  # or "search_query", "classification", "clustering"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
Batch embeddings:
response = client.embeddings.create(
    model="embed-english-v3.0",
    input=[
        "First document",
        "Second document",
        "Third document"
    ],
    input_type="search_document"
)

for i, item in enumerate(response.data):
    print(f"Document {i}: {len(item.embedding)} dimensions")
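Search-style embeddings are typically compared by cosine similarity: embed documents with `input_type="search_document"`, embed the query with `input_type="search_query"`, then rank. A dependency-free ranking sketch over the returned vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=3):
    """Indices of the k most similar document vectors, best first."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]
```

In practice `query_vec` would be `response.data[0].embedding` from a `search_query` request and `doc_vecs` the embeddings from the batch call above.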

Embedding Input Types

Optimize embeddings for your use case:

| Input Type | Use Case |
|------------|----------|
| search_document | Indexing documents for search |
| search_query | Search queries |
| classification | Text classification |
| clustering | Document clustering |

Legacy Completions API

For older command models:
response = client.completions.create(
    model="command",
    prompt="Write a tagline for an AI gateway:",
    max_tokens=50
)

print(response.choices[0].text)

Fallback Configuration

Fall back to GPT-4o if Cohere fails:
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "cohere",
            "api_key": "co-***",
            "override_params": {"model": "command-r-plus-08-2024"}
        },
        {
            "provider": "openai",
            "api_key": "sk-***",
            "override_params": {"model": "gpt-4o"}
        }
    ]
}

client = Portkey().with_options(config=config)

Load Balancing

Balance between Command R+ and Command R:
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "provider": "cohere",
            "api_key": "co-***",
            "override_params": {"model": "command-r-plus-08-2024"},
            "weight": 0.3
        },
        {
            "provider": "cohere",
            "api_key": "co-***",
            "override_params": {"model": "command-r-08-2024"},
            "weight": 0.7
        }
    ]
}

client = Portkey().with_options(config=config)

Error Handling

from portkey_ai.exceptions import (
    RateLimitError,
    APIError,
    AuthenticationError
)

try:
    response = client.chat.completions.create(
        model="command-r-plus-08-2024",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except APIError as e:
    print(f"API error: {e}")
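For transient failures such as rate limits, a common pattern is exponential backoff with jitter. A generic sketch — in practice you would pass `retry_on=(RateLimitError,)` and wrap the `create` call in a lambda:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Call `call()` and retry on `retry_on` exceptions, sleeping
    base_delay * 2**attempt plus a little jitter between attempts."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```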

Best Practices

  1. Use RAG mode - Leverage document grounding for factual accuracy
  2. Enable citations - Track sources for enterprise use
  3. Choose right embedding type - Use appropriate input_type for embeddings
  4. Use Command R+ - For complex tasks requiring reasoning
  5. Use Command R - For cost-effective general purpose tasks
  6. Batch embeddings - More efficient than individual requests
  7. Implement streaming - Better UX for long responses
  8. Handle tool calls - Multi-step reasoning with function calling
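On point 6, embed endpoints cap the number of texts per request (96 is the commonly cited limit for Cohere's embed API — confirm in the current docs), so large corpora should be split into batches. A minimal helper:

```python
def chunked(texts, size=96):
    """Split a list of texts into request-sized batches."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```

Each batch then becomes one `client.embeddings.create(...)` call, and the per-batch results are concatenated in order.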

Enterprise Features

  • Data privacy: Cohere doesn’t train on customer data
  • Regional deployment: Available in multiple regions
  • SOC 2 Type II: Enterprise compliance
  • Custom deployments: Private cloud options
  • SLA support: Enterprise support plans
  • Fine-tuning: Custom model training

Pricing

Cohere offers competitive pricing and a free trial; see the Cohere Pricing page for detailed per-model rates.

Related Guides

  • Embeddings Guide – working with embeddings
  • RAG Guide – building RAG applications
  • Function Calling – tool use and function calling
  • Fallbacks – fallback configurations
