Installation
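The integration is published on PyPI as `langchain-groq`:

```shell
pip install langchain-groq
```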
Setup
Set your Groq API key in the `GROQ_API_KEY` environment variable before use.

Usage
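A minimal sketch of basic usage, assuming `langchain-groq` is installed and `GROQ_API_KEY` is set (the prompt here is illustrative):

```python
from langchain_groq import ChatGroq

# Reads the API key from the GROQ_API_KEY environment variable.
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.7)

# invoke() accepts a string or a list of messages and returns an AIMessage.
response = llm.invoke("Explain the difference between a list and a tuple in Python.")
print(response.content)
```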
Streaming
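Streaming can be sketched as follows (same assumptions as above): `stream()` yields message chunks whose `content` holds the incremental text.

```python
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")

# stream() returns an iterator of message chunks as tokens arrive.
for chunk in llm.stream("Write a haiku about fast inference."):
    print(chunk.content, end="", flush=True)
print()
```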
API Reference
ChatGroq
- `model`: Name of the Groq model to use (e.g., `llama-3.3-70b-versatile`, `mixtral-8x7b-32768`).
- `temperature`: Sampling temperature between 0.0 and 1.0. Lower values make output more focused and deterministic.
- `max_tokens`: Maximum number of tokens to generate.
- `reasoning_format`: Format for reasoning output (for supported models):
  - `parsed`: separates reasoning into `additional_kwargs.reasoning_content`
  - `raw`: includes reasoning within think tags
  - `hidden`: returns only the final answer (the model still performs reasoning)
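For example, a sketch using a reasoning model with `reasoning_format="parsed"` (the model name `deepseek-r1-distill-llama-70b` is an assumption; substitute any reasoning-capable model available on Groq):

```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="deepseek-r1-distill-llama-70b",
    reasoning_format="parsed",  # reasoning is separated out of the answer
)

response = llm.invoke("What is 17 * 23?")
print(response.content)  # final answer only
print(response.additional_kwargs.get("reasoning_content"))  # the model's reasoning
```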
- `timeout`: Timeout for requests in seconds.
- `max_retries`: Maximum number of retries for failed requests.
- `api_key`: Groq API key. If not provided, it is read from the `GROQ_API_KEY` environment variable.
- `base_url`: Base URL for API requests. Leave blank unless using a proxy or service emulator.
- `model_kwargs`: Additional parameters valid for the create call that are not explicitly listed above.
Supported Models
- Llama 3.3 70B: Meta’s latest model with strong performance
- Llama 3.1 series: 8B, 70B, and 405B variants
- Mixtral 8x7B: Mixture-of-experts model
- Gemma 2 9B: Google’s efficient model
- DeepSeek-R1: Reasoning model with extended thinking
Features
- Ultra-fast inference with Groq’s LPU technology
- Function/tool calling
- Vision support (select models)
- JSON mode
- Streaming
- Async support
- Reasoning mode for compatible models
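Tool calling, for instance, follows the standard LangChain `bind_tools` pattern; a sketch (the tool and prompt are illustrative):

```python
from langchain_core.tools import tool
from langchain_groq import ChatGroq


@tool
def get_weather(city: str) -> str:
    """Return a short weather report for a city."""
    return f"It is always sunny in {city}."


llm = ChatGroq(model="llama-3.3-70b-versatile")
llm_with_tools = llm.bind_tools([get_weather])

# The model decides whether to call the tool; tool_calls holds its requests.
response = llm_with_tools.invoke("What's the weather in Oslo?")
print(response.tool_calls)
```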
Groq is known for extremely fast inference speeds, often 10x faster than traditional GPU inference. This makes it ideal for interactive applications and high-throughput workloads.