
Overview

Together AI provides access to 100+ open-source AI models with fast inference, competitive pricing, and support for the latest community models. Perfect for developers who want to leverage open-source models at scale.

Base URL: https://api.together.xyz

Supported Features

  • ✅ Chat Completions
  • ✅ Completions
  • ✅ Streaming
  • ✅ Embeddings
  • ✅ Function Calling
  • ✅ Vision (select models)
  • ✅ Image Generation
  • ❌ Fine-tuning (available directly on the Together platform, not through this integration)

Quick Start

Chat Completions

from portkey_ai import Portkey

client = Portkey(
    provider="together-ai",
    Authorization="***"  # Your Together AI API key
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "Explain open-source AI models"}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a story about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
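The streaming loop above can be wrapped in a small helper that accumulates deltas into the full reply. The sketch below runs on mock chunks built with SimpleNamespace that mirror the shape the loop reads (`chunk.choices[0].delta.content`); with a real client you would pass the stream object directly.

```python
from types import SimpleNamespace as NS

def collect_stream(stream) -> str:
    """Accumulate streamed delta chunks into the full response text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)

# Mock chunks with the same shape the streaming loop above reads:
chunks = [
    NS(choices=[NS(delta=NS(content=text))])
    for text in ["Once", " upon", " a time", None]
]
print(collect_stream(chunks))  # Once upon a time
```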

Meta Llama

| Model | Context | Description |
|---|---|---|
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130K | Largest Llama 3.1 |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 130K | Efficient Llama 3.1 |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 130K | Fast, compact |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 130K | Latest Llama 3.3 |
| meta-llama/Llama-Vision-Free | 128K | Vision-enabled |

Mistral & Mixtral

| Model | Context | Description |
|---|---|---|
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 64K | Large MoE |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32K | Efficient MoE |
| mistralai/Mistral-7B-Instruct-v0.3 | 32K | Compact model |

Qwen

| Model | Context | Description |
|---|---|---|
| Qwen/Qwen2.5-72B-Instruct-Turbo | 32K | Latest Qwen |
| Qwen/Qwen2.5-7B-Instruct-Turbo | 32K | Fast inference |
| Qwen/QwQ-32B-Preview | 32K | Reasoning model |

Image Generation

| Model | Type | Description |
|---|---|---|
| black-forest-labs/FLUX.1-schnell | Image | Fast FLUX |
| stabilityai/stable-diffusion-xl-base-1.0 | Image | SDXL |

Embeddings

| Model | Dimensions | Description |
|---|---|---|
| togethercomputer/m2-bert-80M-8k-retrieval | 768 | Fast embeddings |
| BAAI/bge-large-en-v1.5 | 1024 | High quality |

Together AI excels at:
  • Open-source models - Access 100+ community models
  • Fast inference - Optimized infrastructure
  • Latest models - Quick addition of new releases
  • Cost-effective - Competitive pricing
  • Developer-friendly - Simple API, great docs

Configuration Options

client = Portkey(
    provider="together-ai",
    Authorization="***"  # Bearer token
)

| Header | Description | Required |
|---|---|---|
| Authorization | Together AI API key | Yes |

Advanced Features

Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock symbol"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the price of AAPL?"}],
    tools=tools
)
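When the model decides to call a tool, the response carries tool calls that your code must execute and feed back. A minimal dispatcher is sketched below on a plain dict with the same shape as a returned tool call; `get_stock_price` is the hypothetical function declared in the tools list above, implemented here with canned data for illustration.

```python
import json

def get_stock_price(symbol: str) -> float:
    # Hypothetical local implementation; replace with a real market-data source.
    prices = {"AAPL": 189.5}
    return prices.get(symbol, 0.0)

AVAILABLE_TOOLS = {"get_stock_price": get_stock_price}

def dispatch_tool_call(tool_call) -> str:
    """Run the function named in a tool call and return its result as JSON."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = AVAILABLE_TOOLS[name](**args)
    return json.dumps({"result": result})

# Example tool call in the shape the API returns (arguments arrive as a JSON string):
call = {"function": {"name": "get_stock_price", "arguments": '{"symbol": "AAPL"}'}}
print(dispatch_tool_call(call))  # {"result": 189.5}
```

The JSON string returned by the dispatcher is what you would append to the conversation as a `tool` role message before asking the model for its final answer.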

Vision Models

response = client.chat.completions.create(
    model="meta-llama/Llama-Vision-Free",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

Image Generation

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell",
    prompt="A futuristic city with flying cars",
    n=1,
    size="1024x1024"
)

image_url = response.data[0].url
print(f"Generated: {image_url}")

Embeddings

response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input="Together AI provides fast inference for open models"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
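Embedding vectors are typically compared with cosine similarity for semantic search. The sketch below uses only the standard library and toy vectors standing in for two embeddings returned by the API; in practice each vector would come from `response.data[0].embedding`.

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a document embedding and a query embedding:
doc = [0.1, 0.3, 0.5]
query = [0.2, 0.1, 0.4]
print(round(cosine_similarity(doc, query), 3))  # 0.922
```

Scores closer to 1.0 indicate more similar texts; ranking documents by this score against a query embedding is the core of a semantic search pipeline.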

Completions (Legacy)

response = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    prompt="Complete this sentence: Open source AI is",
    max_tokens=50
)

print(response.choices[0].text)

Fallback Configuration

Fallback to OpenAI:
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "together-ai",
            "api_key": "***",
            "override_params": {"model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"}
        },
        {
            "provider": "openai",
            "api_key": "sk-***",
            "override_params": {"model": "gpt-4o"}
        }
    ]
}

client = Portkey().with_options(config=config)

Load Balancing

Balance across different Llama models:
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "provider": "together-ai",
            "api_key": "***",
            "override_params": {"model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo"},
            "weight": 0.2
        },
        {
            "provider": "together-ai",
            "api_key": "***",
            "override_params": {"model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"},
            "weight": 0.8
        }
    ]
}

client = Portkey().with_options(config=config)

Error Handling

from portkey_ai.exceptions import (
    RateLimitError,
    APIError,
    AuthenticationError
)

try:
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit: {e}")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except APIError as e:
    print(f"API error: {e}")
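Rate-limit errors are usually transient, so a retry with exponential backoff pairs naturally with the handlers above. A generic sketch is shown below, demonstrated with a deliberately flaky local function; in practice you would wrap the chat completion call and pass `retry_on=(RateLimitError,)`.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn, retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Demo with a flaky function that fails twice, then succeeds:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```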

Best Practices

  1. Choose right model size - Balance cost vs capability
  2. Use Turbo models - Optimized for speed
  3. Enable streaming - Better user experience
  4. Leverage function calling - Available on many models
  5. Try vision models - For multimodal tasks
  6. Use embeddings - For semantic search
  7. Monitor costs - Different models have different pricing
  8. Test models - Performance varies by use case

Model Categories

By Size

  • Large (100B+): Best quality, higher cost
  • Medium (30-100B): Balanced performance
  • Small (7-30B): Fast, cost-effective

By Type

  • Chat/Instruct: Conversational models
  • Code: Specialized for coding
  • Vision: Multimodal capabilities
  • MoE: Mixture of Experts for efficiency

Pricing

Together AI offers competitive pricing for open models. See the Together AI Pricing page for detailed per-model rates.

Related

  • Anyscale - another open models platform
  • Groq - ultra-fast inference
  • Function Calling - advanced function calling
  • Load Balancing - balance across models