Overview
- Type: Cloud provider
- Cost: Free tier available, pay-per-use for higher usage (see pricing)
- API Key Required: Yes
- Installation Required: No
- Official Website: https://groq.com/
Prerequisites
Create a Groq account
Sign up at console.groq.com using your email or GitHub account.
Generate an API key
Navigate to API Keys and create a new API key. Copy it immediately as you won’t be able to see it again.
Groq offers a generous free tier suitable for development and moderate personal use.
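Once you have a key, you can sanity-check it by listing the available models. This is a minimal stdlib-only sketch; it assumes the key is stored in a `GROQ_API_KEY` environment variable and uses Groq's OpenAI-compatible base URL (`https://api.groq.com/openai/v1`). The function name is illustrative, not part of any SDK.

```python
import os
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"  # Groq's OpenAI-compatible endpoint

def build_models_request(api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for the model list."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

# Only hit the network when a key is actually configured.
key = os.environ.get("GROQ_API_KEY")
if key:
    with urllib.request.urlopen(build_models_request(key)) as resp:
        print(resp.read().decode())
```

A 200 response with a JSON model list confirms the key is valid; a 401 means the key was copied incorrectly or revoked.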
Setup in AI Providers
Select Groq provider
In the AI Providers settings, click Create AI provider and select Groq as the provider type.
Enter API key
Paste your API key from the API Keys page into the API key field.
Select model
Click the refresh button to fetch available models, then select your preferred model (e.g., llama-3.3-70b-versatile).
Recommended Models
| Model | Context Window | Description | Best For |
|---|---|---|---|
| llama-3.3-70b-versatile | 128K tokens | Latest Llama 3.3, excellent quality | Most tasks, best balance |
| llama-3.1-70b-versatile | 128K tokens | High-quality general purpose | Complex reasoning |
| llama-3.1-8b-instant | 128K tokens | Ultra-fast, smaller model | Quick responses |
| mixtral-8x7b-32768 | 32K tokens | Mixture-of-experts model | Diverse tasks |
| gemma2-9b-it | 8K tokens | Google’s Gemma model | Efficient performance |
All models on Groq run exceptionally fast thanks to their custom LPU hardware, often delivering responses 10-20x faster than standard GPU inference.
Key Features
Ultra-Fast Inference
Groq’s LPU technology provides:
- Throughput: 500-1000+ tokens per second
- Low latency: Near-instant response start
- Consistent speed: Predictable performance
- Real-time feel: Responses feel instantaneous
OpenAI-Compatible API
Groq uses an OpenAI-compatible API, making it:
- Easy to integrate
- Familiar for developers
- Simple to switch from OpenAI
- Compatible with OpenAI-based tools
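Because the API is OpenAI-compatible, any OpenAI-style client works by pointing its base URL at Groq. As a dependency-free sketch, here is a chat completion call using only the standard library; the helper names are illustrative, and the network call is skipped unless `GROQ_API_KEY` is set.

```python
import json
import os
import urllib.request

API_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_payload(model: str, user_message: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(api_key: str, model: str, user_message: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_payload(model, user_message)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if os.environ.get("GROQ_API_KEY"):
    print(chat(os.environ["GROQ_API_KEY"], "llama-3.3-70b-versatile", "Hello!"))
```

Switching from OpenAI is typically just a matter of changing the base URL, the key, and the model name.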
Free Tier
Groq’s free tier includes:
- 14,400 requests per day
- 7,000 tokens per minute (limits vary by model; check the console for current values)
- Suitable for development and personal use
Supported Models
Groq specializes in:
- Open-source models (Llama, Mixtral, Gemma)
- Optimized for their LPU hardware
- Regular model updates
- Multiple size options
Troubleshooting
Rate Limits
If you hit rate limits:
- Wait for the limit window to reset before retrying
- Reduce request frequency or batch your work
- Consider a paid tier for higher limits
API Key Issues
If your API key isn’t working:
- Verify you copied the entire key from the console
- Check that the key hasn’t been revoked
- Ensure you’re using the correct endpoint URL
Model Not Available
If a model doesn’t appear:
- Click the refresh button in AI Providers settings
- Check the Groq documentation for current model availability
- Some models may be temporarily unavailable during maintenance
Context Length Errors
If you exceed the context limit:
- Most Llama 3.1 models support up to 128K tokens
- Older models have 8K-32K limits
- Break large inputs into smaller chunks if needed
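Chunking can be sketched with a rough character-based heuristic. Assuming roughly 4 characters per token (real tokenizers vary, so leave headroom below the advertised limit), a minimal splitter might look like:

```python
def chunk_text(text: str, max_tokens: int = 120_000, chars_per_token: int = 4) -> list[str]:
    """Split text into pieces that fit a model's context window.

    Uses a rough ~4 characters/token heuristic; actual token counts
    vary by tokenizer, so stay below the advertised 128K limit.
    """
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]
```

For production use, splitting on paragraph or sentence boundaries (rather than fixed character offsets) preserves more context per chunk.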
Pricing Considerations
Free Tier:
- Excellent for development
- Suitable for personal projects
- Good rate limits for moderate use
Paid Tier:
- Competitive pricing per token
- Higher rate limits
- Priority access
- Better for production use
Cost Optimization:
- Use smaller models (8B) for simple tasks
- Use larger models (70B) for complex reasoning
- Monitor usage in the console
- Stay within free tier limits when possible
Advanced Configuration
Model Parameters
Customize model behavior:
- temperature: Control randomness (0.0-2.0)
- max_tokens: Maximum response length
- top_p: Nucleus sampling parameter
- stop: Stop sequences
- frequency_penalty: Reduce repetition
- presence_penalty: Encourage topic diversity
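These parameters go directly into the OpenAI-style request body. A sketch with illustrative values (the function name is mine, and the specific numbers are starting points, not recommendations):

```python
def build_request_body(model: str, messages: list[dict]) -> dict:
    """OpenAI-style request body with common tuning parameters."""
    return {
        "model": model,
        "messages": messages,
        "temperature": 0.7,        # randomness: 0.0 (deterministic) to 2.0
        "max_tokens": 1024,        # cap on response length
        "top_p": 0.9,              # nucleus sampling
        "stop": ["\n\n"],          # stop generating at these sequences
        "frequency_penalty": 0.0,  # >0 reduces repetition
        "presence_penalty": 0.0,   # >0 encourages topic diversity
    }
```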
Streaming
Groq excels at streaming responses:
- Ultra-low latency
- Smooth token delivery
- Real-time user experience
- Enabled by default in AI Providers
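With `"stream": true`, OpenAI-compatible APIs return server-sent events, each `data: {...}` line carrying a content delta. A parser sketch (the sample lines below are canned examples of the shape such events typically take, not captured Groq output):

```python
import json
from typing import Iterable, Iterator

def iter_stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines ('data: {...}')."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        data = line[len("data: "):]
        if data == "[DONE]":  # sentinel that ends the stream
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Canned SSE lines in the shape the API emits:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
```

Rendering each delta as it arrives is what makes Groq's low time-to-first-token visible to the user.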
Response Format
Control output format:
- JSON mode for structured output
- Standard text responses
- Custom stop sequences
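JSON mode is requested through the OpenAI-style `response_format` field. A sketch (the helper name is illustrative; note that OpenAI-compatible JSON mode typically requires the word "JSON" to appear somewhere in the prompt):

```python
def build_json_mode_body(model: str, prompt: str) -> dict:
    """Request structured JSON output via the response_format field."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }
```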
Best Practices
- Leverage the speed: Design UX that takes advantage of fast responses
- Use appropriate models: 8B for speed, 70B for quality
- Monitor rate limits: Check the console for usage stats
- Enable streaming: Get the full benefit of Groq’s speed
- Stay in free tier: Great for development and personal use
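For the rate-limit monitoring mentioned above, a retry-with-exponential-backoff wrapper handles occasional 429 responses gracefully. This sketch signals rate limiting with a `RuntimeError("rate_limited")`; adapt the exception type to whatever your HTTP client raises:

```python
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a callable on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError as exc:
            # Re-raise anything that isn't a rate limit, or the final failure.
            if "rate_limited" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

With the free tier's per-minute limits, a backoff of a few seconds is usually enough for the window to reset.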
Use Cases
Perfect for:
- Real-time chat applications
- Quick document analysis
- Rapid iteration during development
- Interactive AI experiences
- High-volume simple tasks
Not ideal for:
- Tasks requiring the absolute latest models
- Specialized proprietary models
- Extreme context lengths (>128K tokens)
Performance Comparison
Groq typically delivers:
- 10-20x faster than standard GPU inference
- 3-5x faster than other optimized cloud providers
- 500+ tokens/second for most models
- Less than 100ms time to first token
Groq’s speed advantage is most noticeable with longer responses. Short prompts benefit less dramatically but still see excellent performance.
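You can verify these numbers yourself by timing a streamed response. A small measurement helper that works on any iterator of tokens (the function name is mine; feed it the streaming iterator from your client):

```python
import time
from typing import Iterable, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[float, int]:
    """Return (time_to_first_token_seconds, token_count) for a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in tokens:
        if ttft is None:
            ttft = time.perf_counter() - start  # latency to first token
        count += 1
    return (ttft or 0.0, count)
```

Dividing the token count by total elapsed time gives the tokens-per-second figure to compare against the numbers above.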
Advantages of Groq
- Speed: Industry-leading inference speed
- Free tier: Generous limits for development
- Low latency: Near-instant response start
- OpenAI compatible: Easy integration
- Reliability: Consistent, predictable performance
- Open models: Access to latest open-source models
If speed is a priority and you’re okay with open-source models (not GPT-4 or Claude), Groq is an excellent choice.