
Overview

Together AI provides access to 100+ open-source AI models with fast inference, competitive pricing, and support for the latest community models. Perfect for developers who want to leverage open-source models at scale.

Base URL: https://api.together.xyz

Supported Features

  • ✅ Chat Completions
  • ✅ Completions
  • ✅ Streaming
  • ✅ Embeddings
  • ✅ Function Calling
  • ✅ Vision (select models)
  • ✅ Image Generation
  • ❌ Fine-tuning (available directly on the Together platform, not through this integration)

Quick Start

Chat Completions

from portkey_ai import Portkey

client = Portkey(
    provider="together-ai",
    Authorization="***"  # Your Together AI API key
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "Explain open-source AI models"}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a story about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
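The streaming loop above can be wrapped in a small helper that accumulates deltas into the full reply. The sketch below runs on mock chunks built with SimpleNamespace that mirror the shape the loop reads (`chunk.choices[0].delta.content`); with a real client you would pass the stream object directly.

```python
from types import SimpleNamespace as NS

def collect_stream(stream) -> str:
    """Accumulate streamed delta chunks into the full response text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)

# Mock chunks with the same shape the streaming loop above reads:
chunks = [
    NS(choices=[NS(delta=NS(content=text))])
    for text in ["Once", " upon", " a time", None]
]
print(collect_stream(chunks))  # Once upon a time
```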

Meta Llama

| Model | Context | Description |
|---|---|---|
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130K | Largest Llama 3.1 |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 130K | Efficient Llama 3.1 |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 130K | Fast, compact |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 130K | Latest Llama 3.3 |
| meta-llama/Llama-Vision-Free | 128K | Vision-enabled |

Mistral & Mixtral

| Model | Context | Description |
|---|---|---|
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 64K | Large MoE |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32K | Efficient MoE |
| mistralai/Mistral-7B-Instruct-v0.3 | 32K | Compact model |

Qwen

| Model | Context | Description |
|---|---|---|
| Qwen/Qwen2.5-72B-Instruct-Turbo | 32K | Latest Qwen |
| Qwen/Qwen2.5-7B-Instruct-Turbo | 32K | Fast inference |
| Qwen/QwQ-32B-Preview | 32K | Reasoning model |

Image Generation

| Model | Type | Description |
|---|---|---|
| black-forest-labs/FLUX.1-schnell | Image | Fast FLUX |
| stabilityai/stable-diffusion-xl-base-1.0 | Image | SDXL |

Embeddings

| Model | Dimensions | Description |
|---|---|---|
| togethercomputer/m2-bert-80M-8k-retrieval | 768 | Fast embeddings |
| BAAI/bge-large-en-v1.5 | 1024 | High quality |

Together AI excels at:
  • Open-source models - Access 100+ community models
  • Fast inference - Optimized infrastructure
  • Latest models - Quick addition of new releases
  • Cost-effective - Competitive pricing
  • Developer-friendly - Simple API, great docs

Configuration Options

client = Portkey(
    provider="together-ai",
    Authorization="***"  # Bearer token
)

| Header | Description | Required |
|---|---|---|
| Authorization | Together AI API key | Yes |

Advanced Features

Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock symbol"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the price of AAPL?"}],
    tools=tools
)
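When the model decides to call a tool, the response carries tool calls that your code must execute and feed back. A minimal dispatcher is sketched below on a plain dict with the same shape as a returned tool call; `get_stock_price` is the hypothetical function declared in the tools list above, implemented here with canned data for illustration.

```python
import json

def get_stock_price(symbol: str) -> float:
    # Hypothetical local implementation; replace with a real market-data source.
    prices = {"AAPL": 189.5}
    return prices.get(symbol, 0.0)

AVAILABLE_TOOLS = {"get_stock_price": get_stock_price}

def dispatch_tool_call(tool_call) -> str:
    """Run the function named in a tool call and return its result as JSON."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = AVAILABLE_TOOLS[name](**args)
    return json.dumps({"result": result})

# Example tool call in the shape the API returns (arguments arrive as a JSON string):
call = {"function": {"name": "get_stock_price", "arguments": '{"symbol": "AAPL"}'}}
print(dispatch_tool_call(call))  # {"result": 189.5}
```

The JSON string returned by the dispatcher is what you would append to the conversation as a `tool` role message before asking the model for its final answer.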

Vision Models

response = client.chat.completions.create(
    model="meta-llama/Llama-Vision-Free",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

Image Generation

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell",
    prompt="A futuristic city with flying cars",
    n=1,
    size="1024x1024"
)

image_url = response.data[0].url
print(f"Generated: {image_url}")

Embeddings

response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input="Together AI provides fast inference for open models"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
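Embedding vectors are typically compared with cosine similarity for semantic search. The sketch below uses only the standard library and toy vectors standing in for two embeddings returned by the API; in practice each vector would come from `response.data[0].embedding`.

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a document embedding and a query embedding:
doc = [0.1, 0.3, 0.5]
query = [0.2, 0.1, 0.4]
print(round(cosine_similarity(doc, query), 3))  # 0.922
```

Scores closer to 1.0 indicate more similar texts; ranking documents by this score against a query embedding is the core of a semantic search pipeline.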

Completions (Legacy)

response = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    prompt="Complete this sentence: Open source AI is",
    max_tokens=50
)

print(response.choices[0].text)

Fallback Configuration

Fallback to OpenAI:
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "together-ai",
            "api_key": "***",
            "override_params": {"model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"}
        },
        {
            "provider": "openai",
            "api_key": "sk-***",
            "override_params": {"model": "gpt-4o"}
        }
    ]
}

client = Portkey().with_options(config=config)

Load Balancing

Balance across different Llama models:
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "provider": "together-ai",
            "api_key": "***",
            "override_params": {"model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo"},
            "weight": 0.2
        },
        {
            "provider": "together-ai",
            "api_key": "***",
            "override_params": {"model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"},
            "weight": 0.8
        }
    ]
}

client = Portkey().with_options(config=config)

Error Handling

from portkey_ai.exceptions import (
    RateLimitError,
    APIError,
    AuthenticationError
)

try:
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit: {e}")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except APIError as e:
    print(f"API error: {e}")
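Rate-limit errors are usually transient, so a retry with exponential backoff pairs naturally with the handlers above. A generic sketch is shown below, demonstrated with a deliberately flaky local function; in practice you would wrap the chat completion call and pass `retry_on=(RateLimitError,)`.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn, retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Demo with a flaky function that fails twice, then succeeds:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```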

Best Practices

  1. Choose right model size - Balance cost vs capability
  2. Use Turbo models - Optimized for speed
  3. Enable streaming - Better user experience
  4. Leverage function calling - Available on many models
  5. Try vision models - For multimodal tasks
  6. Use embeddings - For semantic search
  7. Monitor costs - Different models have different pricing
  8. Test models - Performance varies by use case

Model Categories

By Size

  • Large (100B+): Best quality, higher cost
  • Medium (30-100B): Balanced performance
  • Small (7-30B): Fast, cost-effective

By Type

  • Chat/Instruct: Conversational models
  • Code: Specialized for coding
  • Vision: Multimodal capabilities
  • MoE: Mixture of Experts for efficiency

Pricing

Together AI offers competitive pricing for open models. See the Together AI Pricing page for detailed per-model rates.

Related

  • Anyscale - another open models platform
  • Groq - ultra-fast inference
  • Function Calling - advanced function calling
  • Load Balancing - balance across models