Groq provides free API access to various open-source models with extremely fast inference speeds powered by their LPU (Language Processing Unit) technology.

Overview

Groq offers free access to multiple language models optimized for their custom LPU hardware, delivering industry-leading inference speeds.

Rate Limits

Each model has specific rate limits:
| Model Name | Requests/Day | Tokens/Minute |
| --- | --- | --- |
| Allam 2 7B | 7,000 | 6,000 |
| Llama 3.1 8B | 14,400 | 6,000 |
| Llama 3.3 70B | 1,000 | 12,000 |
| Llama 4 Maverick 17B 128E Instruct | 1,000 | 6,000 |
| Llama 4 Scout Instruct | 1,000 | 30,000 |
| Whisper Large v3 | 2,000 | 7,200 audio-seconds/min |
| Whisper Large v3 Turbo | 2,000 | 7,200 audio-seconds/min |
| canopylabs/orpheus-arabic-saudi | - | - |
| canopylabs/orpheus-v1-english | - | - |
| groq/compound | 250 | 70,000 |
| groq/compound-mini | 250 | 70,000 |
| meta-llama/llama-guard-4-12b | 14,400 | 15,000 |
| meta-llama/llama-prompt-guard-2-22m | - | - |
| meta-llama/llama-prompt-guard-2-86m | - | - |
| moonshotai/kimi-k2-instruct | 1,000 | 10,000 |
| moonshotai/kimi-k2-instruct-0905 | 1,000 | 10,000 |
| openai/gpt-oss-120b | 1,000 | 8,000 |
| openai/gpt-oss-20b | 1,000 | 8,000 |
| openai/gpt-oss-safeguard-20b | 1,000 | 8,000 |
| qwen/qwen3-32b | 1,000 | 6,000 |
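Requests that exceed these limits are rejected with an HTTP 429 error. A minimal retry-with-backoff sketch (the `call` parameter and the delay schedule are illustrative assumptions, not part of Groq's SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error raised by the client library."""

def call_with_backoff(call, max_retries=3, base_delay=1.0):
    """Retry `call` with exponential backoff when the rate limit is hit."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Wait base_delay, 2x, 4x, ... before retrying.
            time.sleep(base_delay * (2 ** attempt))
```

In practice you would catch the client library's own rate-limit exception (for the OpenAI SDK, `openai.RateLimitError`) instead of the placeholder class above.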

Available Models

Text Generation

  • Llama 4 Scout - Latest Llama model with 30K tokens/min
  • Llama 3.3 70B - Powerful 70B-parameter model
  • Groq Compound - Groq's proprietary model with 70K tokens/min
  • Qwen 3 32B - Efficient multilingual model

Speech Recognition

  • Whisper Large v3 - High-accuracy speech recognition
  • Whisper Large v3 Turbo - Faster speech recognition

Safety Models

  • Llama Guard 4 12B - Content moderation
  • Llama Prompt Guard - Prompt injection detection
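Llama Guard replies with a short verdict string: `safe`, or `unsafe` followed by hazard-category codes (e.g. `S1`) on the next line. A small parser sketch, assuming that output format:

```python
def parse_guard_verdict(text):
    """Split a Llama Guard reply into (is_safe, [category codes])."""
    lines = text.strip().splitlines()
    is_safe = lines[0].strip().lower() == "safe"
    # Codes such as "S1,S9" appear on the second line when unsafe.
    categories = []
    if not is_safe and len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return is_safe, categories
```

Feed it the `message.content` of a chat completion made against `meta-llama/llama-guard-4-12b`.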

API Usage

import openai

client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

Getting Started

1. Create Account - Sign up at console.groq.com
2. Generate API Key - Create an API key from your dashboard
3. Start Building - Use the OpenAI-compatible API for inference
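Rather than hardcoding the key as in the example above, read it from an environment variable. A small sketch (`GROQ_API_KEY` is the conventional variable name; adjust if your setup differs):

```python
import os

def get_groq_api_key(env_var="GROQ_API_KEY"):
    """Fetch the API key from the environment, failing loudly if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# client = openai.OpenAI(
#     base_url="https://api.groq.com/openai/v1",
#     api_key=get_groq_api_key(),
# )
```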

Key Features

  • Fastest inference speeds in the industry
  • OpenAI-compatible API
  • Multiple model options including Llama, Qwen, Kimi, GPT-OSS, and Groq models
  • Audio transcription with Whisper
  • Content moderation models
  • Generous free tier limits

Performance Highlights

  • Ultra-Fast - Industry-leading inference speeds
  • High Throughput - Up to 70,000 tokens/minute
  • Low Latency - Millisecond response times
  • LPU Technology - Custom hardware for LLM inference
  • Multiple Models - Wide selection of open models
  • Audio Support - Whisper for speech recognition

Additional Resources

  • Groq Console - Access the platform
  • Documentation - API documentation
