Installation
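The integration is published on PyPI as `langchain-groq`:

```shell
pip install langchain-groq
```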
Setup
Set your Groq API key in the `GROQ_API_KEY` environment variable before use.

Usage
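A minimal sketch of basic usage, assuming `langchain-groq` is installed and `GROQ_API_KEY` is set (the prompt here is illustrative):

```python
from langchain_groq import ChatGroq

# Reads the API key from the GROQ_API_KEY environment variable.
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.7)

# invoke() accepts a string or a list of messages and returns an AIMessage.
response = llm.invoke("Explain the difference between a list and a tuple in Python.")
print(response.content)
```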
Streaming
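Streaming can be sketched as follows (same assumptions as above): `stream()` yields message chunks whose `content` holds the incremental text.

```python
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")

# stream() returns an iterator of message chunks as tokens arrive.
for chunk in llm.stream("Write a haiku about fast inference."):
    print(chunk.content, end="", flush=True)
print()
```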
API Reference
ChatGroq
- `model`: Name of the Groq model to use (e.g., `llama-3.3-70b-versatile`, `mixtral-8x7b-32768`).
- `temperature`: Sampling temperature between 0.0 and 1.0. Lower values make output more focused and deterministic.
- `max_tokens`: Maximum number of tokens to generate.
- `reasoning_format`: Format for reasoning output (for supported models):
  - `parsed`: separates reasoning into `additional_kwargs.reasoning_content`
  - `raw`: includes reasoning within think tags
  - `hidden`: returns only the final answer (the model still performs reasoning)
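For example, a sketch using a reasoning model with `reasoning_format="parsed"` (the model name `deepseek-r1-distill-llama-70b` is an assumption; substitute any reasoning-capable model available on Groq):

```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="deepseek-r1-distill-llama-70b",
    reasoning_format="parsed",  # reasoning is separated out of the answer
)

response = llm.invoke("What is 17 * 23?")
print(response.content)  # final answer only
print(response.additional_kwargs.get("reasoning_content"))  # the model's reasoning
```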
- `timeout`: Timeout for requests in seconds.
- `max_retries`: Maximum number of retries for failed requests.
- `api_key`: Groq API key. If not provided, it is read from the `GROQ_API_KEY` environment variable.
- `base_url`: Base URL for API requests. Leave blank unless using a proxy or service emulator.
- `model_kwargs`: Additional parameters valid for the create call that are not explicitly listed above.
Supported Models
- Llama 3.3 70B: Meta’s latest model with strong performance
- Llama 3.1 series: 8B, 70B, and 405B variants
- Mixtral 8x7B: Mixture-of-experts model
- Gemma 2 9B: Google’s efficient model
- DeepSeek-R1: Reasoning model with extended thinking
Features
- Ultra-fast inference with Groq’s LPU technology
- Function/tool calling
- Vision support (select models)
- JSON mode
- Streaming
- Async support
- Reasoning mode for compatible models
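Tool calling, for instance, follows the standard LangChain `bind_tools` pattern; a sketch (the tool and prompt are illustrative):

```python
from langchain_core.tools import tool
from langchain_groq import ChatGroq


@tool
def get_weather(city: str) -> str:
    """Return a short weather report for a city."""
    return f"It is always sunny in {city}."


llm = ChatGroq(model="llama-3.3-70b-versatile")
llm_with_tools = llm.bind_tools([get_weather])

# The model decides whether to call the tool; tool_calls holds its requests.
response = llm_with_tools.invoke("What's the weather in Oslo?")
print(response.tool_calls)
```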
Groq is known for extremely fast inference speeds, often 10x faster than traditional GPU inference. This makes it ideal for interactive applications and high-throughput workloads.