Cerebras provides free API access to various open-source models optimized for their specialized AI hardware.
Overview
Cerebras offers free access to multiple open-source models running on their custom AI accelerators, providing extremely fast inference speeds.

Rate Limits

Each model has specific rate limits:

| Model Name | Requests/Min | Tokens/Min | Requests/Hour | Tokens/Hour | Requests/Day | Tokens/Day |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 30 | 60,000 | 900 | 1,000,000 | 14,400 | 1,000,000 |
| Qwen 3 235B A22B Instruct | 30 | 60,000 | 900 | 1,000,000 | 14,400 | 1,000,000 |
| Llama 3.3 70B | 30 | 64,000 | 900 | 1,000,000 | 14,400 | 1,000,000 |
| Qwen 3 32B | 30 | 64,000 | 900 | 1,000,000 | 14,400 | 1,000,000 |
| Llama 3.1 8B | 30 | 60,000 | 900 | 1,000,000 | 14,400 | 1,000,000 |
| Z.ai GLM-4.6 | 10 | 60,000 | 100 | 100,000 | 100 | 1,000,000 |
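Given the per-minute request caps above, a client may want to throttle itself rather than rely on server-side 429 responses. The following is a minimal sliding-window limiter sketch; it is a hypothetical helper, not part of any Cerebras SDK:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side limiter for a requests-per-minute cap (e.g. 30 req/min)."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock          # injectable for testing
        self.timestamps = deque()   # send times of recent requests

    def acquire(self):
        """Block until a request slot is free within the window, then claim it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Wait until the oldest recorded request leaves the window.
            time.sleep(self.window - (now - self.timestamps[0]))
            return self.acquire()
        self.timestamps.append(now)
```

Calling `limiter.acquire()` before each API request keeps a client under, say, the 30 requests/minute cap; per-minute token budgets would need a similar accounting of tokens rather than calls.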
Available Models

- gpt-oss-120b: 120B parameter open-source model
- Qwen 3 235B A22B: Qwen’s largest instruction-tuned model
- Llama 3.3 70B: Meta’s latest 70B model
- Qwen 3 32B: Efficient 32B parameter model
- Llama 3.1 8B: Fast 8B parameter model
- Z.ai GLM-4.6: GLM-4 generation model
API Usage

Getting Started

To get started, create an account at cloud.cerebras.ai.
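Since the API is OpenAI-compatible (see Key Features), requests can be issued with plain HTTP using the standard chat-completions shape. The sketch below assumes a base URL of `https://api.cerebras.ai/v1`, an API key in the `CEREBRAS_API_KEY` environment variable, and an illustrative model ID; confirm all three against the official documentation:

```python
import json
import os
import urllib.request

API_BASE = "https://api.cerebras.ai/v1"  # assumed base URL; confirm in the docs

def build_request(model, messages, api_key):
    """Assemble an OpenAI-style chat-completions HTTP request."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(model, messages):
    """Send the request and return the assistant's reply text."""
    req = build_request(model, messages, os.environ["CEREBRAS_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A call would then look like `chat("llama3.1-8b", [{"role": "user", "content": "Hello"}])`, where the model ID is a guess at the platform's naming and should be checked against the model list.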
Key Features
- Ultra-fast inference powered by Cerebras hardware
- OpenAI-compatible API
- Generous rate limits on free tier
- Access to large models (up to 235B parameters)
- High token throughput
Performance

- Fast inference: specialized hardware for ultra-fast generation
- Large models: support for models up to 235B parameters
- High throughput: up to 64,000 tokens per minute
- Consistent speed: low latency across all model sizes
Additional Resources

- Cerebras Cloud: access the platform
- Documentation: API documentation
