Model Overview
48 Models
Comprehensive coverage of major AI providers
19 Providers
From OpenAI and Anthropic to specialized providers
Real Pricing
Actual costs per 1M tokens, updated regularly
OpenAI Models
OpenAI models use two primary encodings:
o200k_base (newer, more efficient) for the GPT-4o family, and cl100k_base for the GPT-4 and GPT-3.5 families.
GPT-4o Family
The family includes GPT-4o and GPT-4o Mini. Shared characteristics:
- Latest encoding technology (o200k_base)
- 128K token context window
- Balanced cost and performance
- Multimodal capabilities
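The encoding split above can be captured in a small lookup. This is an illustrative sketch (the model IDs and the `encodingFor` helper are assumptions for this example, not part of the tool's API):

```javascript
// Hypothetical model → encoding lookup based on the families listed above.
const ENCODINGS = {
  "gpt-4o": "o200k_base",
  "gpt-4o-mini": "o200k_base",
  "gpt-4-turbo": "cl100k_base",
  "gpt-4": "cl100k_base",
  "gpt-3.5-turbo": "cl100k_base",
};

function encodingFor(model) {
  const enc = ENCODINGS[model];
  if (!enc) throw new Error(`Unknown model: ${model}`);
  return enc;
}

console.log(encodingFor("gpt-4o")); // "o200k_base"
```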
GPT-4 & GPT-3.5 Families
| Model | Context Limit | Input Cost (per 1M) | Output Cost (per 1M) | Encoding |
|---|---|---|---|---|
| GPT-4 Turbo | 128,000 | $10.00 | $30.00 | cl100k_base |
| GPT-4 | 8,192 | $30.00 | $60.00 | cl100k_base |
| GPT-3.5 Turbo | 16,385 | $0.50 | $1.50 | cl100k_base |
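Since all prices on this page are quoted per 1M tokens, estimating a request's cost is a single formula. A minimal sketch (the `requestCost` helper is illustrative, not part of this tool):

```javascript
// Cost estimate: prices are per 1M tokens, so divide token counts by 1e6.
function requestCost(inputTokens, outputTokens, inputPrice, outputPrice) {
  return (inputTokens / 1e6) * inputPrice + (outputTokens / 1e6) * outputPrice;
}

// GPT-4 Turbo from the table above: $10.00 input / $30.00 output per 1M.
const cost = requestCost(50_000, 10_000, 10.0, 30.0);
console.log(cost.toFixed(2)); // "0.80"
```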
Anthropic Models
Claude 3.5 & Claude 3 Family
Claude 3.5 Sonnet
Specifications:
- Context: 200,000 tokens
- Input: $3.00 per 1M
- Output: $15.00 per 1M
- Token Ratio: 1.1x
Claude 3 Opus
Specifications:
- Context: 200,000 tokens
- Input: $15.00 per 1M
- Output: $75.00 per 1M
- Token Ratio: 1.1x
Claude 3 Sonnet
Specifications:
- Context: 200,000 tokens
- Input: $3.00 per 1M
- Output: $15.00 per 1M
- Token Ratio: 1.1x
Claude 3 Haiku
Specifications:
- Context: 200,000 tokens
- Input: $0.25 per 1M
- Output: $1.25 per 1M
- Token Ratio: 1.1x
Google Models
Gemini 1.5 Series
The series includes Gemini 1.5 Pro and Gemini 1.5 Flash. Ideal for:
- Processing entire codebases
- Long document analysis
- Multi-document reasoning
Meta Models
Llama 3.1 Series (Latest)
Llama 3.1 405B - Flagship Model
- Largest open-source model
- Competitive with GPT-4
- Token-efficient (5% fewer tokens)
Llama 3.1 70B - Sweet Spot
- Best balance of cost and capability
- 87% cheaper than GPT-4o
- 131K context window
Llama 3.1 8B - Ultra Efficient
- Lowest cost option
- Surprisingly capable
- Same 131K context as larger variants
Llama 3 Series (Previous Generation)
| Model | Context | Input Cost | Output Cost | Best For |
|---|---|---|---|---|
| Llama 3 70B | 8,192 | $0.70 | $0.80 | Legacy applications |
| Llama 3 8B | 8,192 | $0.05 | $0.05 | Budget workloads |
Llama 3.1 models offer significantly larger context (131K vs 8K) at similar or better pricing. Upgrade if possible.
Mistral AI Models
Mistral Large
128K Context | $6.00 per 1M
Premier model from Mistral AI
- European AI provider
- Strong multilingual support
- Token ratio: 1.02x
Mistral Nemo
128K Context | $0.15 per 1M
Fast and affordable
- Same pricing for input/output
- Large context window
- Token ratio: 1.02x
Mixtral 8x7B
32K Context | $0.24 per 1M
Mixture of Experts architecture
- Efficient sparse activation
- Good for diverse tasks
- Token ratio: 1.02x
Mixtral 8x22B
65K Context | $0.65 per 1M
Larger MoE model
- More parameters
- Better performance
- Token ratio: 1.02x
Cohere Models
Command R+ & Command R
- Context: 128,000 tokens
- Input: $2.50 per 1M
- Output: $10.00 per 1M
- Optimized for RAG (Retrieval Augmented Generation)
Specialized Providers
Alibaba (Qwen Models)
Qwen 2.5 & Qwen 2 Series
| Model | Context | Input | Output | Notes |
|---|---|---|---|---|
| Qwen2.5 72B | 131,072 | $0.35 | $0.40 | Latest version |
| Qwen2 72B | 131,072 | $0.35 | $0.40 | Stable release |
- Strong multilingual (especially Chinese)
- Token efficient
- Competitive pricing
DeepSeek
DeepSeek V2.5 & V2
- Context: 131,072 tokens
- Input: $0.14 per 1M tokens
- Output: $0.28 per 1M tokens
- Excellent value proposition
- Chinese AI research lab
01.AI (Yi Models)
Yi Large & Yi 1.5 34B
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Yi Large | 32,768 | $0.60 | $0.60 | 0.97 |
| Yi 1.5 34B | 32,768 | $0.30 | $0.30 | 0.97 |
- Founded by Kai-Fu Lee
- Competitive performance
- Mid-tier pricing
Microsoft (Phi Models)
Phi-3.5 & Phi-3 Series
Small but capable models optimized for efficiency:
| Model | Context | Input | Output |
|---|---|---|---|
| Phi-3.5 Mini | 131,072 | $0.15 | $0.60 |
| Phi-3 Medium | 131,072 | $1.00 | $1.00 |
| Phi-3 Mini | 131,072 | $0.15 | $0.60 |
- Small model size
- Large context window
- Good for edge deployment
AI21 Labs (Jamba Models)
Jamba 1.5 Large & Mini
Hybrid SSM-Transformer architecture:
Standout Feature: 256K token context window at competitive pricing!
| Model | Context | Input | Output |
|---|---|---|---|
| Jamba 1.5 Large | 262,144 | $0.50 | $0.70 |
| Jamba 1.5 Mini | 262,144 | $0.10 | $0.10 |
xAI (Grok Models)
Grok-2 & Grok-2 Mini
From Elon Musk’s xAI:
| Model | Context | Input | Output |
|---|---|---|---|
| Grok-2 | 131,072 | $2.00 | $10.00 |
| Grok-2 Mini | 131,072 | $0.15 | $0.60 |
Other Providers
Reka (Reka Core & Flash)
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Reka Core | 131,072 | $10.00 | $25.00 | 0.99 |
| Reka Flash | 131,072 | $0.15 | $0.60 | 0.99 |
Amazon (Titan Text)
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Titan Text Premier | 32,000 | $0.50 | $1.50 | 1.04 |
| Titan Text Express | 8,000 | $0.13 | $0.17 | 1.04 |
Perplexity (Llama Sonar)
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Llama 3.1 Sonar Large | 131,072 | $1.00 | $1.00 | 0.95 |
| Llama 3.1 Sonar Small | 131,072 | $0.20 | $0.20 | 0.95 |
IBM (Granite)
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Granite 3 8B | 131,072 | $0.055 | $0.055 | 0.96 |
| Granite 3 2B | 131,072 | $0.025 | $0.025 | 0.96 |
Nous Research (Hermes)
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Hermes 3 405B | 131,072 | $2.70 | $2.70 | 0.95 |
| Hermes 3 70B | 131,072 | $0.35 | $0.40 | 0.95 |
Snowflake (Arctic)
- Context: 4,096 tokens
- Input: $0.24 per 1M
- Output: $0.24 per 1M
- Token Ratio: 1.06
NVIDIA (Nemotron)
| Model | Context | Input | Output | Token Ratio |
|---|---|---|---|---|
| Nemotron 70B | 131,072 | $0.35 | $0.40 | 0.98 |
| Nemotron Mini | 131,072 | $0.15 | $0.60 | 0.98 |
Model Selection Guide
High Accuracy Tasks:
- GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus
- Gemini 1.5 Pro, Llama 3.1 405B
Cost-Effective Tasks:
- GPT-4o Mini, Gemini 1.5 Flash
- Llama 3.1 8B, DeepSeek V2.5
- Granite 3 2B (lowest cost)
Long-Context Tasks:
- Gemini 1.5 Pro (2M tokens)
- Jamba 1.5 (256K tokens)
- Claude 3 family (200K tokens)
Speed-Critical Tasks:
- GPT-3.5 Turbo, Claude 3 Haiku
- Mistral Nemo, Gemini 1.5 Flash
Pricing Comparison
Budget Options (< $0.20 per 1M input tokens)
- Granite 3 2B ($0.025), Llama 3 8B ($0.05), Granite 3 8B ($0.055), Jamba 1.5 Mini ($0.10)
- Titan Text Express ($0.13), DeepSeek V2.5 ($0.14)
- Phi-3.5 Mini, Mistral Nemo, Grok-2 Mini, Reka Flash ($0.15 each)
Premium Options (> $5.00 per 1M input tokens)
- GPT-4 ($30.00), Claude 3 Opus ($15.00)
- GPT-4 Turbo ($10.00), Reka Core ($10.00)
Token Ratio Reference
Token ratio indicates how many tokens a model uses compared to GPT (baseline 1.0). Lower is more efficient.
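Applying the ratio to a GPT-baseline count is a one-line estimate. A minimal sketch (the `estimateTokens` helper is illustrative, not part of this tool):

```javascript
// Estimate a model's token count from a GPT-baseline count and its ratio.
// E.g. Claude models above list a 1.1x ratio: text that tokenizes to
// 1,000 tokens under GPT's tokenizer is estimated at ~1,100 tokens.
function estimateTokens(gptTokens, tokenRatio) {
  return Math.round(gptTokens * tokenRatio);
}

console.log(estimateTokens(1000, 1.1));  // 1100 (Claude family, 1.1x)
console.log(estimateTokens(1000, 0.95)); // 950  (Hermes 3, 0.95x)
```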
Model Data Structure
All model data comes from models-config.js:
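For reference, an entry might look like the sketch below, built from the fields shown throughout this page. The field names here are assumptions for illustration, not the actual keys used in models-config.js:

```javascript
// Illustrative entry shape only — key names are hypothetical.
// Values come from the Claude 3.5 Sonnet specs listed above.
const exampleEntry = {
  id: "claude-3-5-sonnet",
  provider: "Anthropic",
  contextLimit: 200000,     // tokens
  inputCostPer1M: 3.0,      // USD per 1M input tokens
  outputCostPer1M: 15.0,    // USD per 1M output tokens
  tokenRatio: 1.1,          // tokens used relative to GPT baseline
};

console.log(`${exampleEntry.id}: ${exampleEntry.contextLimit} tokens`);
```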
External Resources
Artificial Analysis
Independent benchmarks and detailed model comparisons
OpenAI Tokenizer
Official OpenAI tokenization playground
Tiktoken Library
Open source tokenization library used by this tool
Model Pricing Updates
Track pricing changes across providers
Next Steps
How to Use
Learn how to analyze tokens with Tokenizador
Understanding Tokenization
Deep dive into tokenization concepts