Groq provides free API access to various open-source models with extremely fast inference speeds powered by their LPU (Language Processing Unit) technology.
Overview
Groq offers free access to multiple language models optimized for their custom LPU hardware, delivering industry-leading inference speeds.

Rate Limits

Each model has specific rate limits:

| Model Name | Requests/Day | Tokens/Minute |
|---|---|---|
| Allam 2 7B | 7,000 | 6,000 |
| Llama 3.1 8B | 14,400 | 6,000 |
| Llama 3.3 70B | 1,000 | 12,000 |
| Llama 4 Maverick 17B 128E Instruct | 1,000 | 6,000 |
| Llama 4 Scout Instruct | 1,000 | 30,000 |
| Whisper Large v3 | 2,000 | 7,200 audio-seconds/min |
| Whisper Large v3 Turbo | 2,000 | 7,200 audio-seconds/min |
| canopylabs/orpheus-arabic-saudi | - | - |
| canopylabs/orpheus-v1-english | - | - |
| groq/compound | 250 | 70,000 |
| groq/compound-mini | 250 | 70,000 |
| meta-llama/llama-guard-4-12b | 14,400 | 15,000 |
| meta-llama/llama-prompt-guard-2-22m | - | - |
| meta-llama/llama-prompt-guard-2-86m | - | - |
| moonshotai/kimi-k2-instruct | 1,000 | 10,000 |
| moonshotai/kimi-k2-instruct-0905 | 1,000 | 10,000 |
| openai/gpt-oss-120b | 1,000 | 8,000 |
| openai/gpt-oss-20b | 1,000 | 8,000 |
| openai/gpt-oss-safeguard-20b | 1,000 | 8,000 |
| qwen/qwen3-32b | 1,000 | 6,000 |
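When a per-day or per-minute limit above is exceeded, the API refuses the request, so clients typically retry with exponential backoff. A minimal sketch of that pattern; the function names and delay values are illustrative, not part of Groq's documentation:

```python
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff delay in seconds, doubling per attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(make_request, max_attempts: int = 5):
    """Retry make_request(), which returns None when rate-limited."""
    for attempt in range(max_attempts):
        result = make_request()
        if result is not None:
            return result
        time.sleep(backoff_delay(attempt))
    raise RuntimeError("still rate limited after retries")
```

In practice the server's response may include a `Retry-After` header, which should take precedence over a computed delay.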
Available Models
Text Generation
- Llama 4 Scout - latest Llama model, 30,000 tokens/min
- Llama 3.3 70B - powerful 70B-parameter model
- Groq Compound - Groq's proprietary model, 70,000 tokens/min
- Qwen 3 32B - efficient multilingual model
Speech Recognition
- Whisper Large v3 - High-accuracy speech recognition
- Whisper Large v3 Turbo - Faster speech recognition
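Because the API is OpenAI-compatible, the Whisper models above can be reached through the standard audio-transcription endpoint. A minimal sketch, assuming the `openai` Python package, a `GROQ_API_KEY` environment variable, and an illustrative model name:

```python
import os

GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def transcribe(path: str, model: str = "whisper-large-v3") -> str:
    """Send an audio file to the transcription endpoint and return its text."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(api_key=os.environ["GROQ_API_KEY"], base_url=GROQ_BASE_URL)
    with open(path, "rb") as f:
        resp = client.audio.transcriptions.create(model=model, file=f)
    return resp.text
```

Note the Whisper rate limits in the table above are measured in audio-seconds per minute rather than tokens.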
Safety Models
- Llama Guard 4 12B - Content moderation
- Llama Prompt Guard - Prompt injection detection
API Usage

Getting Started

- Create an account: sign up at console.groq.com
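After signing up, requests can be issued with the standard OpenAI Python client pointed at Groq's base URL, since the API is OpenAI-compatible. A minimal sketch; the model name, environment variable, and helper function are illustrative assumptions:

```python
import os

GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for a chat-completion request."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask(prompt: str, model: str = "llama-3.3-70b-versatile") -> str:
    """Send a single-turn chat request and return the model's reply text."""
    from openai import OpenAI  # pip install openai

    # Requires an API key created in the Groq Console, exported as GROQ_API_KEY.
    client = OpenAI(api_key=os.environ["GROQ_API_KEY"], base_url=GROQ_BASE_URL)
    resp = client.chat.completions.create(**build_chat_request(model, prompt))
    return resp.choices[0].message.content
```

Existing OpenAI-based code can usually be repointed at Groq by changing only the base URL and API key.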
Key Features
- Fastest inference speeds in the industry
- OpenAI-compatible API
- Multiple model options including Llama, Qwen, Kimi, GPT-OSS, and Groq models
- Audio transcription with Whisper
- Content moderation models
- Generous free tier limits
Performance Highlights

- Ultra-Fast - industry-leading inference speeds
- High Throughput - up to 70,000 tokens/minute
- Low Latency - millisecond response times
- LPU Technology - custom hardware for LLM inference
- Multiple Models - wide selection of open models
- Audio Support - Whisper for speech recognition
Additional Resources

- Groq Console - access the platform
- Documentation - API documentation
