Groq provides free API access to various open-source models with extremely fast inference speeds powered by their LPU (Language Processing Unit) technology.

Overview

Groq offers free access to multiple language models optimized for their custom LPU hardware, delivering industry-leading inference speeds.

Rate Limits

Each model has specific rate limits:
| Model Name | Requests/Day | Tokens/Minute |
| --- | --- | --- |
| Allam 2 7B | 7,000 | 6,000 |
| Llama 3.1 8B | 14,400 | 6,000 |
| Llama 3.3 70B | 1,000 | 12,000 |
| Llama 4 Maverick 17B 128E Instruct | 1,000 | 6,000 |
| Llama 4 Scout Instruct | 1,000 | 30,000 |
| Whisper Large v3 | 2,000 | 7,200 audio-seconds/min |
| Whisper Large v3 Turbo | 2,000 | 7,200 audio-seconds/min |
| canopylabs/orpheus-arabic-saudi | - | - |
| canopylabs/orpheus-v1-english | - | - |
| groq/compound | 250 | 70,000 |
| groq/compound-mini | 250 | 70,000 |
| meta-llama/llama-guard-4-12b | 14,400 | 15,000 |
| meta-llama/llama-prompt-guard-2-22m | - | - |
| meta-llama/llama-prompt-guard-2-86m | - | - |
| moonshotai/kimi-k2-instruct | 1,000 | 10,000 |
| moonshotai/kimi-k2-instruct-0905 | 1,000 | 10,000 |
| openai/gpt-oss-120b | 1,000 | 8,000 |
| openai/gpt-oss-20b | 1,000 | 8,000 |
| openai/gpt-oss-safeguard-20b | 1,000 | 8,000 |
| qwen/qwen3-32b | 1,000 | 6,000 |
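Requests that exceed these limits are rejected with an HTTP 429 error. A minimal retry-with-backoff sketch (the `call` parameter and the delay schedule are illustrative assumptions, not part of Groq's SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error raised by the client library."""

def call_with_backoff(call, max_retries=3, base_delay=1.0):
    """Retry `call` with exponential backoff when the rate limit is hit."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Wait base_delay, 2x, 4x, ... before retrying.
            time.sleep(base_delay * (2 ** attempt))
```

In practice you would catch the client library's own rate-limit exception (for the OpenAI SDK, `openai.RateLimitError`) instead of the placeholder class above.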

Available Models

Text Generation

  • Llama 4 Scout - Latest Llama model with 30K tokens/min
  • Llama 3.3 70B - Powerful 70B-parameter model
  • Groq Compound - Groq's proprietary model with 70K tokens/min
  • Qwen 3 32B - Efficient multilingual model

Speech Recognition

  • Whisper Large v3 - High-accuracy speech recognition
  • Whisper Large v3 Turbo - Faster speech recognition

Safety Models

  • Llama Guard 4 12B - Content moderation
  • Llama Prompt Guard - Prompt injection detection
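Llama Guard replies with a short verdict string: `safe`, or `unsafe` followed by hazard-category codes (e.g. `S1`) on the next line. A small parser sketch, assuming that output format:

```python
def parse_guard_verdict(text):
    """Split a Llama Guard reply into (is_safe, [category codes])."""
    lines = text.strip().splitlines()
    is_safe = lines[0].strip().lower() == "safe"
    # Codes such as "S1,S9" appear on the second line when unsafe.
    categories = []
    if not is_safe and len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return is_safe, categories
```

Feed it the `message.content` of a chat completion made against `meta-llama/llama-guard-4-12b`.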

API Usage

import openai

client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

Getting Started

1. Create Account - Sign up at console.groq.com
2. Generate API Key - Create an API key from your dashboard
3. Start Building - Use the OpenAI-compatible API for inference
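Rather than hardcoding the key as in the example above, read it from an environment variable. A small sketch (`GROQ_API_KEY` is the conventional variable name; adjust if your setup differs):

```python
import os

def get_groq_api_key(env_var="GROQ_API_KEY"):
    """Fetch the API key from the environment, failing loudly if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# client = openai.OpenAI(
#     base_url="https://api.groq.com/openai/v1",
#     api_key=get_groq_api_key(),
# )
```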

Key Features

  • Fastest inference speeds in the industry
  • OpenAI-compatible API
  • Multiple model options including Llama, Qwen, Kimi, GPT-OSS, and Groq models
  • Audio transcription with Whisper
  • Content moderation models
  • Generous free tier limits

Performance Highlights

  • Ultra-Fast - Industry-leading inference speeds
  • High Throughput - Up to 70,000 tokens/minute
  • Low Latency - Millisecond response times
  • LPU Technology - Custom hardware for LLM inference
  • Multiple Models - Wide selection of open models
  • Audio Support - Whisper for speech recognition

Additional Resources

  • Groq Console - Access the platform
  • Documentation - API documentation
