
Overview

Groq provides ultra-fast inference for open-source LLMs such as Llama, Mixtral, and Gemma, at speeds of 500+ tokens/second.

Installation

npm install @llamaindex/groq

Basic Usage

import { Groq } from "@llamaindex/groq";

const llm = new Groq({
  model: "llama-3.1-70b-versatile",
  apiKey: process.env.GROQ_API_KEY
});

const response = await llm.chat({
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ]
});

console.log(response.message.content);

Constructor Options

  • model (string, required): Groq model name
  • apiKey (string): Groq API key (defaults to the GROQ_API_KEY environment variable)
  • temperature (number): Sampling temperature
  • maxTokens (number): Maximum number of tokens in the response
  • topP (number, default: 1): Nucleus sampling parameter
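
Taken together, the options above form a single configuration object. The sketch below mirrors the documented fields using a local interface (GroqOptions here is illustrative, not a type exported by @llamaindex/groq):

```typescript
// Illustrative shape of the documented constructor options.
// GroqOptions is a local interface for this sketch, not the
// package's exported type.
interface GroqOptions {
  model: string;        // required: Groq model name
  apiKey?: string;      // defaults to the GROQ_API_KEY env variable
  temperature?: number; // sampling temperature
  maxTokens?: number;   // maximum tokens in the response
  topP?: number;        // nucleus sampling parameter, default 1
}

const options: GroqOptions = {
  model: "llama-3.1-70b-versatile",
  apiKey: process.env.GROQ_API_KEY,
  temperature: 0.2,
  maxTokens: 1024,
  topP: 1,
};

console.log(options.model); // llama-3.1-70b-versatile
```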

Supported Models

Llama 3.1

  • llama-3.1-405b-reasoning: Most capable
  • llama-3.1-70b-versatile: Balanced performance
  • llama-3.1-8b-instant: Fastest

Llama 3

  • llama3-70b-8192: 70B parameter model
  • llama3-8b-8192: 8B parameter model

Mixtral

  • mixtral-8x7b-32768: Mixtral MoE model

Gemma

  • gemma-7b-it: Google Gemma 7B
  • gemma2-9b-it: Gemma 2 9B

Streaming

const stream = await llm.chat({
  messages: [{ role: "user", content: "Write a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}

Function Calling

import { tool } from "@llamaindex/core/tools";
import { z } from "zod";

const weatherTool = tool({
  name: "get_weather",
  description: "Get weather for a location",
  parameters: z.object({
    location: z.string()
  }),
  execute: async ({ location }) => {
    return `Weather in ${location}: 72°F`;
  }
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Weather in NYC?" }],
  tools: [weatherTool]
});

Structured Output

import { z } from "zod";

const schema = z.object({
  summary: z.string(),
  sentiment: z.enum(["positive", "negative", "neutral"]),
  topics: z.array(z.string())
});

const result = await llm.exec({
  messages: [{ role: "user", content: "Analyze: Great product, fast shipping!" }],
  responseFormat: schema
});

Configuration

Environment Variables

GROQ_API_KEY=gsk_...

Global Settings

import { Settings } from "llamaindex";
import { Groq } from "@llamaindex/groq";

Settings.llm = new Groq({
  model: "llama-3.1-70b-versatile"
});

Performance

Groq’s LPU (Language Processing Unit) delivers exceptional speed. You can measure throughput directly:

const startTime = Date.now();

const response = await llm.chat({
  messages: [{ role: "user", content: "Explain AI" }]
});

const duration = Date.now() - startTime;
console.log(`Response time: ${duration}ms`);
console.log(`Tokens/sec: ${response.raw.usage.completion_tokens / (duration / 1000)}`);

Typical speeds: 300-500 tokens/second.

With LlamaIndex

import { Settings, VectorStoreIndex } from "llamaindex";
import { Groq } from "@llamaindex/groq";

Settings.llm = new Groq({ model: "llama-3.1-70b-versatile" });

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What is the main topic?"
});

Model Selection Guide

  • Complex reasoning: llama-3.1-405b-reasoning (best quality)
  • General purpose: llama-3.1-70b-versatile (balanced)
  • Speed critical: llama-3.1-8b-instant (fastest)
  • Long context: mixtral-8x7b-32768 (32K context)
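
The guide above can be encoded as a small helper when model choice needs to vary at runtime. The function name and use-case keys below are illustrative, not part of any package:

```typescript
// Illustrative mapping of the use cases above to Groq model names.
type UseCase = "complex-reasoning" | "general" | "speed" | "long-context";

const MODEL_FOR_USE_CASE: Record<UseCase, string> = {
  "complex-reasoning": "llama-3.1-405b-reasoning", // best quality
  "general": "llama-3.1-70b-versatile",            // balanced
  "speed": "llama-3.1-8b-instant",                 // fastest
  "long-context": "mixtral-8x7b-32768",            // 32K context
};

function pickModel(useCase: UseCase): string {
  return MODEL_FOR_USE_CASE[useCase];
}

console.log(pickModel("general")); // llama-3.1-70b-versatile
```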

Rate Limits

Groq has generous free tier limits:
  • Free: 30 requests/minute
  • Paid: Higher limits based on plan
Handle rate limits:

try {
  const response = await llm.chat({ messages });
} catch (error) {
  if (error.status === 429) {
    console.log("Rate limit hit, waiting...");
    await new Promise(resolve => setTimeout(resolve, 2000));
    // Retry
  }
}
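
The retry above can be generalized into a small helper with exponential backoff. withRetry is a sketch, not part of @llamaindex/groq; it retries only on 429-style errors:

```typescript
// Sketch of a generic retry helper with exponential backoff.
// Retries only on errors carrying status 429; everything else
// is rethrown immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status !== 429 || attempt >= maxRetries) throw error;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage: `const response = await withRetry(() => llm.chat({ messages }));`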

Best Practices

  1. Use for production: Groq’s speed is excellent for real-time applications
  2. Choose the right model: Balance speed against capability
  3. Monitor usage: Track API calls and costs
  4. Stream responses: Streaming makes Groq’s speed even more apparent in the UX
  5. Handle rate limits: Implement retry logic
