Overview
Together AI provides access to 100+ open-source AI models with blazing-fast inference, competitive pricing, and support for the latest community models. Perfect for developers who want to leverage open-source models at scale. Base URL: https://api.together.xyz
Supported Features
- ✅ Chat Completions
- ✅ Completions
- ✅ Streaming
- ✅ Embeddings
- ✅ Function Calling
- ✅ Vision (select models)
- ✅ Image Generation
- ❌ Fine-tuning (available directly via the Together platform, not through this API surface)
Quick Start
Chat Completions
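A minimal sketch using only the Python standard library. The base URL comes from this page; the `/v1/chat/completions` path assumes Together's OpenAI-compatible API, and the `TOGETHER_API_KEY` environment variable name is an assumption:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.together.xyz"  # base URL from this page; /v1 path assumes OpenAI compatibility


def build_chat_request(model, messages):
    """Build a POST request for the chat-completions endpoint.

    Reads the API key from the TOGETHER_API_KEY environment variable
    (variable name assumed for this sketch).
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req):
    """Send the request and decode the JSON response (requires a valid key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a valid key set, the reply text would be read as `send(build_chat_request("meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", [{"role": "user", "content": "Hello"}]))["choices"][0]["message"]["content"]`.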
Streaming
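Streaming is enabled by adding `"stream": true` to the request body; chunks then arrive as server-sent events in the OpenAI-compatible `data: {...}` format. A parser sketch for those lines:

```python
import json


def iter_stream_content(lines):
    """Yield text deltas from OpenAI-style SSE lines ('data: {...}' / 'data: [DONE]')."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```

Joining the yielded deltas reconstructs the full assistant message as it streams in.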
Popular Models
Meta Llama
| Model | Context | Description |
|---|---|---|
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130K | Largest Llama 3.1 |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 130K | Efficient Llama 3.1 |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 130K | Fast, compact |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 130K | Latest Llama 3.3 |
| meta-llama/Llama-Vision-Free | 128K | Vision-enabled |
Mistral & Mixtral
| Model | Context | Description |
|---|---|---|
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 64K | Large MoE |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32K | Efficient MoE |
| mistralai/Mistral-7B-Instruct-v0.3 | 32K | Compact model |
Qwen
| Model | Context | Description |
|---|---|---|
| Qwen/Qwen2.5-72B-Instruct-Turbo | 32K | Latest Qwen |
| Qwen/Qwen2.5-7B-Instruct-Turbo | 32K | Fast inference |
| Qwen/QwQ-32B-Preview | 32K | Reasoning model |
Image Generation
| Model | Type | Description |
|---|---|---|
| black-forest-labs/FLUX.1-schnell | Image | Fast FLUX |
| stabilityai/stable-diffusion-xl-base-1.0 | Image | SDXL |
Embeddings
| Model | Dimensions | Description |
|---|---|---|
| togethercomputer/m2-bert-80M-8k-retrieval | 768 | Fast embeddings |
| BAAI/bge-large-en-v1.5 | 1024 | High quality |
Together AI excels at:
- Open-source models - Access 100+ community models
- Fast inference - Optimized infrastructure
- Latest models - Quick addition of new releases
- Cost-effective - Competitive pricing
- Developer-friendly - Simple API, great docs
Configuration Options
| Header | Description | Required |
|---|---|---|
| Authorization | Together AI API key, sent as a Bearer token | Yes |
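The header from the table above can be assembled like this; the `TOGETHER_API_KEY` environment variable name is an assumption for this sketch:

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header described in the table above.

    Falls back to the TOGETHER_API_KEY environment variable (name assumed)
    when no key is passed explicitly.
    """
    key = api_key or os.environ.get("TOGETHER_API_KEY", "")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```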
Advanced Features
Function Calling
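Function calling follows the OpenAI-compatible `tools` format. A sketch with a hypothetical `get_weather` tool (the tool name and schema are illustrative, not part of any Together API):

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_call_payload(model, messages, tools):
    """Chat-completions payload that offers the model a set of callable tools."""
    return {"model": model, "messages": messages, "tools": tools, "tool_choice": "auto"}


def parse_tool_calls(response):
    """Extract (name, arguments) pairs from a chat response's tool calls, if any."""
    calls = response["choices"][0]["message"].get("tool_calls", [])
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
```

After executing a returned call locally, the result is appended as a `tool` role message and the conversation is sent again.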
Vision Models
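Assuming the OpenAI-compatible content-parts format, a vision request mixes text and an image URL in one user message; a sketch pairing it with meta-llama/Llama-Vision-Free from the table above:

```python
def build_vision_message(prompt, image_url):
    """User message mixing text and an image, in the OpenAI-compatible content-parts format."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def build_vision_payload(prompt, image_url, model="meta-llama/Llama-Vision-Free"):
    """Full chat-completions payload for a vision-enabled model from the table above."""
    return {"model": model, "messages": [build_vision_message(prompt, image_url)]}
```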
Image Generation
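A payload sketch assuming an OpenAI-style `/v1/images/generations` path; model-specific parameters such as size or step count vary by model and are omitted here:

```python
def build_image_request(prompt, model="black-forest-labs/FLUX.1-schnell", n=1):
    """Payload for an OpenAI-style image-generation endpoint (path assumed).

    The default model is the fast FLUX variant from the table above.
    """
    return {"model": model, "prompt": prompt, "n": n}
```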
Embeddings
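A sketch assuming an OpenAI-style `/v1/embeddings` path, plus a plain-Python cosine similarity for comparing the returned vectors in a semantic-search setting:

```python
import math


def build_embedding_request(texts, model="togethercomputer/m2-bert-80M-8k-retrieval"):
    """Payload for an OpenAI-style embeddings call (path assumed).

    The default model is the fast retrieval model from the table above.
    """
    return {"model": model, "input": texts}


def cosine_similarity(a, b):
    """Compare two embedding vectors; 1.0 means identical direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```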
Completions (Legacy)
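The legacy completions format takes a raw prompt string instead of chat messages; a payload sketch, with the `/v1/completions` path assumed:

```python
def build_completion_request(prompt, model, max_tokens=64):
    """Legacy completions payload: a raw prompt string instead of chat messages."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}
```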
Fallback Configuration
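A provider-agnostic sketch of the fallback pattern: the two `call_*` arguments are hypothetical caller-supplied wrappers around each provider's chat endpoint, not part of any SDK:

```python
def complete_with_fallback(call_together, call_openai, payload):
    """Try Together AI first; fall back to OpenAI if the primary call raises.

    call_together / call_openai are hypothetical caller-supplied functions,
    e.g. thin wrappers over each provider's chat-completions endpoint.
    """
    try:
        return call_together(payload)
    except Exception:
        return call_openai(payload)
```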
Fallback to OpenAI when Together AI requests fail.
Load Balancing
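A client-side sketch of weighted selection across the Llama models listed above; the weights are illustrative:

```python
import random

# Illustrative weights: send most traffic to the cheaper 8B model.
LLAMA_POOL = [
    ("meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", 1),
    ("meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", 3),
]


def pick_model(weighted_models, rng=random):
    """Pick a model name according to its weight, spreading load across the pool."""
    names = [name for name, _ in weighted_models]
    weights = [weight for _, weight in weighted_models]
    return rng.choices(names, weights=weights, k=1)[0]
```

Each request then uses `pick_model(LLAMA_POOL)` as its `model` value.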
Balance requests across different Llama models.
Error Handling
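A sketch of retrying rate limits (HTTP 429) and transient server errors with exponential backoff; the `call` argument is a hypothetical wrapper that returns a `(status, body)` tuple:

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, retryable=(429, 500, 502, 503)):
    """Retry `call` (a hypothetical wrapper returning (status, body)) with backoff.

    Delays double after each retryable status: base_delay, 2*base_delay, ...
    """
    for attempt in range(attempts):
        status, body = call()
        if status not in retryable:
            return status, body
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body
```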
Best Practices
- Choose right model size - Balance cost vs capability
- Use Turbo models - Optimized for speed
- Enable streaming - Better user experience
- Leverage function calling - Available on many models
- Try vision models - For multimodal tasks
- Use embeddings - For semantic search
- Monitor costs - Different models have different pricing
- Test models - Performance varies by use case
Model Categories
By Size
- Large (100B+): Best quality, higher cost
- Medium (30-100B): Balanced performance
- Small (7-30B): Fast, cost-effective
By Type
- Chat/Instruct: Conversational models
- Code: Specialized for coding
- Vision: Multimodal capabilities
- MoE: Mixture of Experts for efficiency
Pricing
Together AI offers competitive pricing for open models.
Together AI Pricing
View detailed pricing for all Together AI models
Related Resources
Anyscale
Another open models platform
Groq
Ultra-fast inference
Function Calling
Advanced function calling
Load Balancing
Balance across models