Overview
Groq provides lightning-fast LLM inference using specialized hardware, delivering:
- Ultra-Fast: 10x faster inference than traditional providers
- Cost-Effective: Competitive pricing
- Open Models: Access to Llama, Mixtral, and Gemma
- High Throughput: Process more requests per second
- Low Latency: ~100ms response times
Perfect for high-volume scraping and real-time applications.
Prerequisites
Get API Key
- Sign up at console.groq.com
- Navigate to API Keys
- Click “Create API Key”
- Copy your API key
Install ScrapeGraphAI
pip install scrapegraphai
playwright install
Set Environment Variable
export GROQ_API_KEY="gsk_..."
Or create a .env file:
GROQ_API_KEY="gsk_..."
Basic Configuration
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description",
    source="https://perinim.github.io/projects/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
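`run()` returns a JSON-serializable Python dict, so the extraction can be persisted like any other object. A minimal sketch (the field names below are illustrative; the actual keys depend on your prompt and the page):

```python
import json

# Stand-in for the dict returned by smart_scraper_graph.run();
# real keys depend on your prompt and the page content.
result = {"projects": [{"title": "Project A", "description": "..."}]}

# Persist the extraction to disk for later processing.
with open("projects.json", "w") as f:
    json.dump(result, f, indent=2)

with open("projects.json") as f:
    restored = json.load(f)

print(restored == result)  # True
```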
This example is based on: examples/extras/undected_playwright.py and examples/extras/cond_smartscraper_usage.py
Available Models
Recommended
All Models
Comparison
Llama 3.3 70B (Best Quality)
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
    },
}
- Context: 128K tokens
- Speed: Very fast
- Best for: High accuracy scraping
Llama 3.1 8B Instant (Fastest)
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.1-8b-instant",
        "temperature": 0,
    },
}
- Context: 128K tokens
- Speed: Ultra fast
- Best for: Speed-critical applications
Complete list of Groq models:

| Model | Context | Speed | Best For |
|---|---|---|---|
| llama-3.3-70b-versatile | 128K | Very Fast | Best quality |
| llama-3.1-8b-instant | 128K | Ultra Fast | Speed priority |
| llama3-70b-8192 | 8K | Fast | Legacy |
| llama3-8b-8192 | 8K | Very Fast | Legacy |
| mixtral-8x7b-32768 | 32K | Fast | Good balance |
| gemma2-9b-it | 8K | Very Fast | Google model |
| gemma-7b-it | 8K | Very Fast | Lightweight |
Use llama-3.3-70b-versatile or llama-3.1-8b-instant for best results.
Speed comparison (approximate):

| Model | Tokens/sec |
|---|---|
| llama-3.1-8b-instant | ~800 |
| llama-3.3-70b-versatile | ~300 |
| gemma2-9b-it | ~700 |
| mixtral-8x7b-32768 | ~500 |
All Groq models are significantly faster than traditional cloud providers.
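To turn those throughput figures into wall-clock estimates, a back-of-the-envelope helper (the tokens/sec values are the approximations from the table above, not guarantees):

```python
def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    # Time to generate `tokens` of output at a given throughput.
    return tokens / tokens_per_sec

# A 2,000-token extraction at the approximate speeds above:
print(round(generation_seconds(2000, 800), 2))  # llama-3.1-8b-instant: 2.5
print(round(generation_seconds(2000, 300), 2))  # llama-3.3-70b-versatile: 6.67
```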
Configuration Options
Temperature
Control output randomness:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,  # Deterministic (recommended for scraping)
    },
}

- 0: Deterministic, consistent
- 0.5: Balanced
- 1.0: Creative, varied
Always use temperature: 0 for web scraping to ensure consistent results.
Max Tokens
Limit response length:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "max_tokens": 4000,
    },
}
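When picking a max_tokens budget, a rough heuristic (an assumption of ~4 characters per English token, not an exact tokenizer) helps sanity-check whether the expected output fits:

```python
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

expected_output = "x" * 10_000  # ~10,000 characters of extracted JSON
print(estimate_tokens(expected_output))          # 2500
print(estimate_tokens(expected_output) <= 4000)  # True: fits the budget above
```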
Top P (Nucleus Sampling)
Control diversity:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
        "top_p": 1.0,  # Default
    },
}
Complete Examples
import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
    },
    "verbose": True,
    "headless": True,
}

smart_scraper = SmartScraperGraph(
    prompt="Extract all article titles and summaries",
    source="https://www.wired.com",
    config=graph_config,
)

result = smart_scraper.run()
print(json.dumps(result, indent=4))

graph_exec_info = smart_scraper.get_execution_info()
print(prettify_exec_info(graph_exec_info))
For maximum throughput:
"model": "groq/llama-3.1-8b-instant"  # Ultra fast

Browser runs faster in background:
"headless": True  # 20-30% faster

Process multiple URLs concurrently:
from concurrent.futures import ThreadPoolExecutor

def scrape_url(url):
    scraper = SmartScraperGraph(
        prompt="Extract data",
        source=url,
        config=graph_config,
    )
    return scraper.run()

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(scrape_url, urls))

Limit output for faster responses:
"max_tokens": 2000  # Smaller = faster
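If you combine a thread pool with a requests-per-minute cap, one simple approach is to process URLs in fixed-size batches. A sketch (the batch size of 30 mirrors the free-tier limit mentioned in the next section; the URLs are hypothetical):

```python
def batches(items, size):
    # Yield successive fixed-size chunks of a list.
    for i in range(0, len(items), size):
        yield items[i:i + size]

urls = [f"https://example.com/page/{n}" for n in range(75)]  # hypothetical URLs
groups = list(batches(urls, 30))
print([len(g) for g in groups])  # [30, 30, 15]
```

Each batch can then be submitted to the executor, with a pause between batches to stay under the limit.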
Rate Limits
Groq has generous rate limits:
- Free Tier: 30 requests/minute
- Paid Tier: Higher limits available
Groq’s high inference speed means you can process more data even with rate limits.
Implement rate limiting:
from ratelimit import limits, sleep_and_retry

@sleep_and_retry
@limits(calls=30, period=60)  # 30 requests per minute
def scrape_with_rate_limit(url):
    scraper = SmartScraperGraph(
        prompt="Extract data",
        source=url,
        config=graph_config,
    )
    return scraper.run()
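The ratelimit package is a third-party dependency; if you prefer the standard library only, a minimal sliding-window throttle (a sketch, not something ScrapeGraphAI or Groq ship) could look like:

```python
import time

class Throttle:
    """Allow at most `calls` invocations per `period` seconds."""

    def __init__(self, calls: int, period: float):
        self.calls = calls
        self.period = period
        self.stamps = []

    def wait(self):
        now = time.monotonic()
        # Keep only timestamps inside the sliding window.
        self.stamps = [t for t in self.stamps if now - t < self.period]
        if len(self.stamps) >= self.calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self.stamps[0]))
        self.stamps.append(time.monotonic())

throttle = Throttle(calls=30, period=60)  # match the free-tier limit
# Call throttle.wait() before each scraper.run()
```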
Troubleshooting
Error: AuthenticationError: Invalid API key

Solution:
- Verify API key at console.groq.com
- Ensure it starts with gsk_
- Check environment variable: echo $GROQ_API_KEY
Error: 429 Rate limit exceeded

Solution: Implement rate limiting:
import time

for url in urls:
    scraper = SmartScraperGraph(prompt="Extract data", source=url, config=graph_config)
    result = scraper.run()
    time.sleep(2)  # Wait 2 seconds between requests
Error: Request timeout

Solution: Increase timeout:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "request_timeout": 60,  # 60 seconds
    },
}
Error: Context length exceeded

Solution: Use a model with a larger context window:
# Use Llama 3.3 or Llama 3.1 (128K tokens)
"model": "groq/llama-3.3-70b-versatile"  # 128K context
Advantages of Groq
Ultra Fast
10x faster inference than traditional providers - perfect for high-volume scraping.
Cost-Effective
Competitive pricing with generous free tier for testing.
Open Models
Access to latest Llama, Mixtral, and Gemma models.
Low Latency
~100ms response times for real-time applications.
Use Cases
High-Volume Scraping

Groq excels at scraping many pages quickly:
# Scrape 100+ pages efficiently
for url in large_url_list:
    scraper = SmartScraperGraph(prompt="Extract data", source=url, config=graph_config)
    result = scraper.run()
    # Each page is processed in seconds

Real-Time Applications

Low latency for live data:
import time

# Update dashboards in real-time
while True:
    data = scraper.run()
    update_dashboard(data)
    time.sleep(60)

API Endpoints

Fast enough for API responses:
from fastapi import FastAPI

app = FastAPI()

@app.get("/scrape")
async def scrape_endpoint(url: str):
    scraper = SmartScraperGraph(prompt="Extract data", source=url, config=graph_config)
    result = scraper.run()  # Fast response
    return result
Best Practices
Use Latest Models
Llama 3.3 and 3.1 offer the best performance:
"model": "groq/llama-3.3-70b-versatile"
Implement Retries
Handle transient errors:
from tenacity import retry, stop_after_attempt

@retry(stop=stop_after_attempt(3))
def scrape():
    ...
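A standard-library alternative to the tenacity decorator, for when you want no extra dependency (a sketch with exponential backoff; the delay values are illustrative):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Call fn(), retrying on any exception with exponential backoff.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage: result = with_retries(scraper.run)
```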
Monitor Usage
Track API usage and costs in Groq console.
Speed Comparison
Approximate scraping times per page:
| Provider | Time |
|---|---|
| Groq (llama-3.1-8b) | ~2-3s |
| Groq (llama-3.3-70b) | ~3-5s |
| OpenAI (gpt-4o-mini) | ~8-12s |
| OpenAI (gpt-4o) | ~10-15s |
| Anthropic (claude) | ~8-12s |
| Ollama (local) | ~5-20s |
Times vary based on page complexity and prompt. Groq is consistently the fastest among the cloud providers listed.
Next Steps
Advanced Configuration
Learn about proxy rotation and browser settings
OpenAI
Compare with OpenAI models