Overview
Groq provides lightning-fast LLM inference using specialized hardware, delivering:
- Ultra-Fast: 10x faster inference than traditional providers
- Cost-Effective: Competitive pricing
- Open Models: Access to Llama, Mixtral, and Gemma
- High Throughput: Process more requests per second
- Low Latency: ~100ms response times
Perfect for high-volume scraping and real-time applications.
Prerequisites
Get API Key
- Sign up at console.groq.com
- Navigate to API Keys
- Click “Create API Key”
- Copy your API key
Install ScrapeGraphAI
pip install scrapegraphai
playwright install
Set Environment Variable
export GROQ_API_KEY="gsk_..."
Or create a .env file:
GROQ_API_KEY="gsk_..."
Basic Configuration
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description",
    source="https://perinim.github.io/projects/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
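`run()` returns a JSON-serializable Python dict, so the extraction can be persisted like any other object. A minimal sketch (the field names below are illustrative; the actual keys depend on your prompt and the page):

```python
import json

# Stand-in for the dict returned by smart_scraper_graph.run();
# real keys depend on your prompt and the page content.
result = {"projects": [{"title": "Project A", "description": "..."}]}

# Persist the extraction to disk for later processing.
with open("projects.json", "w") as f:
    json.dump(result, f, indent=2)

with open("projects.json") as f:
    restored = json.load(f)

print(restored == result)  # True
```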
This example is based on: examples/extras/undected_playwright.py and examples/extras/cond_smartscraper_usage.py
Available Models
Recommended
All Models
Comparison
Llama 3.3 70B (Best Quality)
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
    },
}
- Context: 128K tokens
- Speed: Very fast
- Best for: High accuracy scraping
Llama 3.1 8B Instant (Fastest)
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.1-8b-instant",
        "temperature": 0,
    },
}
- Context: 128K tokens
- Speed: Ultra fast
- Best for: Speed-critical applications
Complete list of Groq models:

| Model | Context | Speed | Best For |
|---|---|---|---|
| llama-3.3-70b-versatile | 128K | Very Fast | Best quality |
| llama-3.1-8b-instant | 128K | Ultra Fast | Speed priority |
| llama3-70b-8192 | 8K | Fast | Legacy |
| llama3-8b-8192 | 8K | Very Fast | Legacy |
| mixtral-8x7b-32768 | 32K | Fast | Good balance |
| gemma2-9b-it | 8K | Very Fast | Google model |
| gemma-7b-it | 8K | Very Fast | Lightweight |
Use llama-3.3-70b-versatile or llama-3.1-8b-instant for best results.
Speed comparison (approximate):

| Model | Tokens/sec |
|---|---|
| llama-3.1-8b-instant | ~800 |
| llama-3.3-70b-versatile | ~300 |
| gemma2-9b-it | ~700 |
| mixtral-8x7b-32768 | ~500 |
All Groq models are significantly faster than traditional cloud providers.
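To turn those throughput figures into wall-clock estimates, a back-of-the-envelope helper (the tokens/sec values are the approximations from the table above, not guarantees):

```python
def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    # Time to generate `tokens` of output at a given throughput.
    return tokens / tokens_per_sec

# A 2,000-token extraction at the approximate speeds above:
print(round(generation_seconds(2000, 800), 2))  # llama-3.1-8b-instant: 2.5
print(round(generation_seconds(2000, 300), 2))  # llama-3.3-70b-versatile: 6.67
```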
Configuration Options
Temperature
Control output randomness:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,  # Deterministic (recommended for scraping)
    },
}

- 0: Deterministic, consistent
- 0.5: Balanced
- 1.0: Creative, varied
Always use temperature: 0 for web scraping to ensure consistent results.
Max Tokens
Limit response length:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "max_tokens": 4000,
    },
}
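When picking a max_tokens budget, a rough heuristic (an assumption of ~4 characters per English token, not an exact tokenizer) helps sanity-check whether the expected output fits:

```python
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

expected_output = "x" * 10_000  # ~10,000 characters of extracted JSON
print(estimate_tokens(expected_output))          # 2500
print(estimate_tokens(expected_output) <= 4000)  # True: fits the budget above
```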
Top P (Nucleus Sampling)
Control diversity:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
        "top_p": 1.0,  # Default
    },
}
Complete Examples
import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "temperature": 0,
    },
    "verbose": True,
    "headless": True,
}

smart_scraper = SmartScraperGraph(
    prompt="Extract all article titles and summaries",
    source="https://www.wired.com",
    config=graph_config,
)

result = smart_scraper.run()
print(json.dumps(result, indent=4))

graph_exec_info = smart_scraper.get_execution_info()
print(prettify_exec_info(graph_exec_info))
For maximum throughput:
"model": "groq/llama-3.1-8b-instant"  # Ultra fast

Browser runs faster in background:
"headless": True  # 20-30% faster

Process multiple URLs concurrently:
from concurrent.futures import ThreadPoolExecutor

def scrape_url(url):
    scraper = SmartScraperGraph(
        prompt="Extract data",
        source=url,
        config=graph_config,
    )
    return scraper.run()

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(scrape_url, urls))

Limit output for faster responses:
"max_tokens": 2000  # Smaller = faster
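If you combine a thread pool with a requests-per-minute cap, one simple approach is to process URLs in fixed-size batches. A sketch (the batch size of 30 mirrors the free-tier limit mentioned in the next section; the URLs are hypothetical):

```python
def batches(items, size):
    # Yield successive fixed-size chunks of a list.
    for i in range(0, len(items), size):
        yield items[i:i + size]

urls = [f"https://example.com/page/{n}" for n in range(75)]  # hypothetical URLs
groups = list(batches(urls, 30))
print([len(g) for g in groups])  # [30, 30, 15]
```

Each batch can then be submitted to the executor, with a pause between batches to stay under the limit.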
Rate Limits
Groq has generous rate limits:
- Free Tier: 30 requests/minute
- Paid Tier: Higher limits available
Groq’s high inference speed means you can process more data even with rate limits.
Implement rate limiting:
from ratelimit import limits, sleep_and_retry

@sleep_and_retry
@limits(calls=30, period=60)  # 30 requests per minute
def scrape_with_rate_limit(url):
    scraper = SmartScraperGraph(
        prompt="Extract data",
        source=url,
        config=graph_config,
    )
    return scraper.run()
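The ratelimit package is a third-party dependency; if you prefer the standard library only, a minimal sliding-window throttle (a sketch, not something ScrapeGraphAI or Groq ship) could look like:

```python
import time

class Throttle:
    """Allow at most `calls` invocations per `period` seconds."""

    def __init__(self, calls: int, period: float):
        self.calls = calls
        self.period = period
        self.stamps = []

    def wait(self):
        now = time.monotonic()
        # Keep only timestamps inside the sliding window.
        self.stamps = [t for t in self.stamps if now - t < self.period]
        if len(self.stamps) >= self.calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self.stamps[0]))
        self.stamps.append(time.monotonic())

throttle = Throttle(calls=30, period=60)  # match the free-tier limit
# Call throttle.wait() before each scraper.run()
```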
Troubleshooting
Error: AuthenticationError: Invalid API key

Solution:
- Verify API key at console.groq.com
- Ensure it starts with gsk_
- Check environment variable: echo $GROQ_API_KEY
Error: 429 Rate limit exceeded

Solution: Implement rate limiting:
import time

for url in urls:
    scraper = SmartScraperGraph(prompt="Extract data", source=url, config=graph_config)
    result = scraper.run()
    time.sleep(2)  # Wait 2 seconds between requests
Error: Request timeout

Solution: Increase timeout:
graph_config = {
    "llm": {
        "api_key": os.getenv("GROQ_API_KEY"),
        "model": "groq/llama-3.3-70b-versatile",
        "request_timeout": 60,  # 60 seconds
    },
}
Error: Context length exceeded

Solution: Use a model with a larger context window:
# Use Llama 3.3 or Llama 3.1 (128K tokens)
"model": "groq/llama-3.3-70b-versatile"  # 128K context
Advantages of Groq
Ultra Fast
10x faster inference than traditional providers - perfect for high-volume scraping.
Cost-Effective
Competitive pricing with generous free tier for testing.
Open Models
Access to latest Llama, Mixtral, and Gemma models.
Low Latency
~100ms response times for real-time applications.
Use Cases
High-Volume Scraping

Groq excels at scraping many pages quickly:
# Scrape 100+ pages efficiently
for url in large_url_list:
    scraper = SmartScraperGraph(prompt="Extract data", source=url, config=graph_config)
    result = scraper.run()
    # Each page is processed in seconds

Real-Time Applications

Low latency for live data:
import time

# Update dashboards in real-time
while True:
    data = scraper.run()
    update_dashboard(data)
    time.sleep(60)

API Endpoints

Fast enough for API responses:
from fastapi import FastAPI

app = FastAPI()

@app.get("/scrape")
async def scrape_endpoint(url: str):
    scraper = SmartScraperGraph(prompt="Extract data", source=url, config=graph_config)
    result = scraper.run()  # Fast response
    return result
Best Practices
Use Latest Models
Llama 3.3 and 3.1 offer the best performance:
"model": "groq/llama-3.3-70b-versatile"
Implement Retries
Handle transient errors:
from tenacity import retry, stop_after_attempt

@retry(stop=stop_after_attempt(3))
def scrape():
    ...
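A standard-library alternative to the tenacity decorator, for when you want no extra dependency (a sketch with exponential backoff; the delay values are illustrative):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Call fn(), retrying on any exception with exponential backoff.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage: result = with_retries(scraper.run)
```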
Monitor Usage
Track API usage and costs in Groq console.
Speed Comparison
Approximate scraping times per page:
| Provider | Time |
|---|---|
| Groq (llama-3.1-8b) | ~2-3s |
| Groq (llama-3.3-70b) | ~3-5s |
| OpenAI (gpt-4o-mini) | ~8-12s |
| OpenAI (gpt-4o) | ~10-15s |
| Anthropic (claude) | ~8-12s |
| Ollama (local) | ~5-20s |
Times vary based on page complexity and prompt. Groq is consistently the fastest among the cloud providers listed.
Next Steps
Advanced Configuration
Learn about proxy rotation and browser settings
OpenAI
Compare with OpenAI models