Overview
Ollama allows you to run large language models locally on your machine. This means:
- Free: No API costs
- Private: Your data never leaves your computer
- Fast: No network latency
- Offline: Works without internet connection
Perfect for development, testing, or privacy-sensitive projects.
Prerequisites
Install Ollama
Download and install Ollama from ollama.ai:

```bash
# Download from ollama.ai or use Homebrew
brew install ollama
```
Start Ollama Server

```bash
ollama serve
```

The server will start on http://localhost:11434.

Pull a Model

Download a model (e.g., Llama 3.2):

```bash
ollama pull llama3.2
```

First-time download may take a few minutes depending on model size.
Install ScrapeGraphAI
```bash
pip install scrapegraphai
playwright install
```
Basic Configuration
```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about the founders.",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
```
This example is from: examples/smart_scraper_graph/ollama/smart_scraper_ollama.py
Recommended Models
Llama 3.2 (Best for Most Tasks)
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```

```bash
# Pull the model
ollama pull llama3.2
```
- Size: 3B parameters
- Context: 128K tokens
- RAM: ~8GB
- Best for: General scraping tasks
Llama 3.3 70B (Highest Quality)
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.3",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```
- Size: 70B parameters
- Context: 128K tokens
- RAM: ~40GB
- Best for: Complex scraping, highest accuracy
Llama 3.2 1B (Ultra Fast)
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2:1b",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```
- RAM: ~2GB
- Best for: Simple scraping, low-resource systems
Gemma 2 (Google)
```python
graph_config = {
    "llm": {
        "model": "ollama/gemma2",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```
- RAM: ~6GB
- Best for: Balanced performance
Mistral (European)
```python
graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",
        "model_tokens": 128000,
    },
}
```
- RAM: ~8GB
- Best for: JSON output, European languages
Qwen 2.5 (Chinese)
```python
graph_config = {
    "llm": {
        "model": "ollama/qwen:14b",
        "temperature": 0,
    },
}
```
- Best for: Chinese content, multilingual
Configuration Options
Custom Base URL
If Ollama is running on a different host or port:
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "base_url": "http://192.168.1.100:11434",  # Remote Ollama server
        "temperature": 0,
    },
}
```
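Before running a graph against a remote server, it can help to confirm the server is actually reachable. Ollama exposes a model-listing endpoint at `/api/tags`; the sketch below (names are illustrative, not part of ScrapeGraphAI) uses it as a simple health check:

```python
import json
import urllib.error
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL for Ollama's model-listing endpoint (/api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def check_ollama(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
            models = [m["name"] for m in json.load(resp).get("models", [])]
            print("Available models:", models)
            return True
    except (urllib.error.URLError, OSError):
        return False


# Example: check_ollama("http://192.168.1.100:11434")
```

If this returns False, fix connectivity (or start `ollama serve`) before pointing `base_url` at the host.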
Force JSON output (required for some models):
```python
graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",  # Ollama needs format specified
    },
}
```
Some Ollama models require `"format": "json"` for structured output. If you get parsing errors, add this option.
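If a model still wraps its JSON in extra prose, a defensive parsing step can often salvage the payload. This is an illustrative helper, not part of ScrapeGraphAI's API:

```python
import json
import re


def extract_json(text: str):
    """Parse a model response as JSON, falling back to the first {...} block."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise


print(extract_json('Sure! Here it is: {"founders": ["Ada", "Grace"]}'))
```

Prefer fixing the config with `"format": "json"` first; a fallback like this only papers over occasional stray text.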
Embeddings Configuration
Use local embeddings for better RAG performance:
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
    },
}
```

```bash
# Pull the embedding model
ollama pull nomic-embed-text
```
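Under the hood, RAG retrieval ranks page chunks by the similarity of their embedding vectors to the query's. The sketch below illustrates that ranking step with tiny hand-made vectors (real `nomic-embed-text` output has hundreds of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy vectors standing in for real embedding output
query = [0.9, 0.1, 0.0]
chunks = {
    "pricing page": [0.8, 0.2, 0.1],
    "blog post": [0.1, 0.9, 0.3],
}

# Retrieve the chunk whose embedding is closest to the query's
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)
```

Running embeddings locally keeps this retrieval step fast and private, for the same reasons the LLM itself benefits from running locally.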
Complete Examples
```python
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about the founders.",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

# Get execution info
graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```
Available Models
View locally installed models:

```bash
ollama list
```
Popular models for scraping:
| Model | Size | Context | RAM Needed |
|---|---|---|---|
| llama3.2:1b | 1B | 128K | ~2GB |
| llama3.2 | 3B | 128K | ~8GB |
| llama3.3:70b | 70B | 128K | ~40GB |
| mistral | 7B | 128K | ~8GB |
| gemma2 | 9B | 128K | ~6GB |
| qwen:14b | 14B | 32K | ~10GB |
| codellama | 7B | 16K | ~8GB |
| nomic-embed-text | - | 8K | ~1GB |
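The RAM column above can drive a simple model-selection helper. This is an illustrative sketch (the dictionary and function are not part of ScrapeGraphAI) that picks the largest model fitting a given RAM budget:

```python
# Approximate RAM requirements (in GB), taken from the table above
MODEL_RAM_GB = {
    "ollama/llama3.2:1b": 2,
    "ollama/llama3.2": 8,
    "ollama/mistral": 8,
    "ollama/gemma2": 6,
    "ollama/qwen:14b": 10,
    "ollama/llama3.3:70b": 40,
}


def largest_model_for(ram_gb: float):
    """Return the largest listed model that fits the given RAM budget, or None."""
    fitting = {m: r for m, r in MODEL_RAM_GB.items() if r <= ram_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_model_for(12))  # a ~10GB model fits in a 12GB budget
```

Larger models generally extract more accurately, so defaulting to the biggest one that fits is a reasonable rule of thumb.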
Pull any model:

```bash
ollama pull <model-name>
```
GPU Acceleration

Ollama automatically uses the GPU if one is available. Verify that a loaded model is running on the GPU:

```bash
ollama ps
```

For NVIDIA GPUs, ensure CUDA is installed. For Apple Silicon, Metal is used automatically.
Increase Context for Long Documents

For long documents, increase model_tokens:

```python
"llm": {
    "model": "ollama/llama3.2",
    "model_tokens": 128000,  # Maximum context
}
```
Keep Models Loaded

Ollama keeps models in memory for 5 minutes by default. Increase this:

```bash
# Set to 1 hour
export OLLAMA_KEEP_ALIVE=1h
ollama serve
```
Use Lighter Models for Simple Tasks
For basic scraping, use smaller models:

```python
"model": "ollama/llama3.2:1b"  # Fast and efficient
```
Troubleshooting
Error: Connection refused to http://localhost:11434

Solution: Ensure Ollama is running:

```bash
ollama serve
```
Error: model 'llama3.2' not found

Solution: Pull the model first:

```bash
ollama pull llama3.2
```
Error: System runs out of RAM

Solution: Use a smaller model:

```python
"model": "ollama/llama3.2:1b"  # Only ~2GB RAM
```
Error: Failed to parse JSON response

Solution: Add the format parameter:

```python
"llm": {
    "model": "ollama/mistral",
    "format": "json",  # Force JSON output
}
```
Advantages of Ollama
- Free: No API costs; run unlimited scraping jobs
- Private: Your data never leaves your machine
- Fast: No network latency, especially with a GPU
- Offline: Works without an internet connection
Next Steps
- OpenAI: Compare with cloud-based OpenAI models
- Advanced Config: Learn about proxy rotation and browser settings