Overview
Ollama allows you to run large language models locally on your machine. This means:
- Free: No API costs
- Private: Your data never leaves your computer
- Fast: No network latency
- Offline: Works without internet connection
Perfect for development, testing, or privacy-sensitive projects.
Prerequisites
Install Ollama
Download and install Ollama from ollama.ai:

```bash
# Download from ollama.ai or use Homebrew
brew install ollama
```
Start Ollama Server

```bash
ollama serve
```

The server will start on http://localhost:11434.

Pull a Model

Download a model (e.g., Llama 3.2):

```bash
ollama pull llama3.2
```

First-time download may take a few minutes depending on model size.
Install ScrapeGraphAI
```bash
pip install scrapegraphai
playwright install
```
Basic Configuration
```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about the founders.",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
```
This example is from: examples/smart_scraper_graph/ollama/smart_scraper_ollama.py
Recommended Models
Llama 3.2 (Best for Most Tasks)
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```

```bash
# Pull the model
ollama pull llama3.2
```
- Size: 3B parameters
- Context: 128K tokens
- RAM: ~8GB
- Best for: General scraping tasks
Llama 3.3 70B (Highest Quality)
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.3",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```
- Size: 70B parameters
- Context: 128K tokens
- RAM: ~40GB
- Best for: Complex scraping, highest accuracy
Llama 3.2 1B (Ultra Fast)
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2:1b",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```
- RAM: ~2GB
- Best for: Simple scraping, low-resource systems
Gemma 2 (Google)
```python
graph_config = {
    "llm": {
        "model": "ollama/gemma2",
        "temperature": 0,
        "model_tokens": 128000,
    },
}
```
- RAM: ~6GB
- Best for: Balanced performance
Mistral (European)
```python
graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",
        "model_tokens": 128000,
    },
}
```
- RAM: ~8GB
- Best for: JSON output, European languages
Qwen 2.5 (Chinese)
```python
graph_config = {
    "llm": {
        "model": "ollama/qwen:14b",
        "temperature": 0,
    },
}
```
- Best for: Chinese content, multilingual
Configuration Options
Custom Base URL
If Ollama is running on a different host or port:
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "base_url": "http://192.168.1.100:11434",  # Remote Ollama server
        "temperature": 0,
    },
}
```
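Before running a graph against a remote server, it can help to confirm the server is actually reachable. Ollama exposes a model-listing endpoint at `/api/tags`; the sketch below (names are illustrative, not part of ScrapeGraphAI) uses it as a simple health check:

```python
import json
import urllib.error
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL for Ollama's model-listing endpoint (/api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def check_ollama(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
            models = [m["name"] for m in json.load(resp).get("models", [])]
            print("Available models:", models)
            return True
    except (urllib.error.URLError, OSError):
        return False


# Example: check_ollama("http://192.168.1.100:11434")
```

If this returns False, fix connectivity (or start `ollama serve`) before pointing `base_url` at the host.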
Force JSON output (required for some models):
```python
graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",  # Ollama needs format specified
    },
}
```
Some Ollama models require `"format": "json"` for structured output. If you get parsing errors, add this option.
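If a model still wraps its JSON in extra prose, a defensive parsing step can often salvage the payload. This is an illustrative helper, not part of ScrapeGraphAI's API:

```python
import json
import re


def extract_json(text: str):
    """Parse a model response as JSON, falling back to the first {...} block."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise


print(extract_json('Sure! Here it is: {"founders": ["Ada", "Grace"]}'))
```

Prefer fixing the config with `"format": "json"` first; a fallback like this only papers over occasional stray text.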
Embeddings Configuration
Use local embeddings for better RAG performance:
```python
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
    },
}
```

```bash
# Pull the embedding model
ollama pull nomic-embed-text
```
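Under the hood, RAG retrieval ranks page chunks by the similarity of their embedding vectors to the query's. The sketch below illustrates that ranking step with tiny hand-made vectors (real `nomic-embed-text` output has hundreds of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy vectors standing in for real embedding output
query = [0.9, 0.1, 0.0]
chunks = {
    "pricing page": [0.8, 0.2, 0.1],
    "blog post": [0.1, 0.9, 0.3],
}

# Retrieve the chunk whose embedding is closest to the query's
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)
```

Running embeddings locally keeps this retrieval step fast and private, for the same reasons the LLM itself benefits from running locally.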
Complete Examples
```python
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about the founders.",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

# Get execution info
graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```
Available Models
View locally installed models:

```bash
ollama list
```
Popular models for scraping:
| Model | Size | Context | RAM Needed |
|---|---|---|---|
| llama3.2:1b | 1B | 128K | ~2GB |
| llama3.2 | 3B | 128K | ~8GB |
| llama3.3:70b | 70B | 128K | ~40GB |
| mistral | 7B | 128K | ~8GB |
| gemma2 | 9B | 128K | ~6GB |
| qwen:14b | 14B | 32K | ~10GB |
| codellama | 7B | 16K | ~8GB |
| nomic-embed-text | - | 8K | ~1GB |
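The RAM column above can drive a simple model-selection helper. This is an illustrative sketch (the dictionary and function are not part of ScrapeGraphAI) that picks the largest model fitting a given RAM budget:

```python
# Approximate RAM requirements (in GB), taken from the table above
MODEL_RAM_GB = {
    "ollama/llama3.2:1b": 2,
    "ollama/llama3.2": 8,
    "ollama/mistral": 8,
    "ollama/gemma2": 6,
    "ollama/qwen:14b": 10,
    "ollama/llama3.3:70b": 40,
}


def largest_model_for(ram_gb: float):
    """Return the largest listed model that fits the given RAM budget, or None."""
    fitting = {m: r for m, r in MODEL_RAM_GB.items() if r <= ram_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_model_for(12))  # a ~10GB model fits in a 12GB budget
```

Larger models generally extract more accurately, so defaulting to the biggest one that fits is a reasonable rule of thumb.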
Pull any model:

```bash
ollama pull <model-name>
```
GPU Acceleration

Ollama automatically uses the GPU if one is available. Verify that a loaded model is running on the GPU:

```bash
ollama ps
```

For NVIDIA GPUs, ensure CUDA is installed. For Apple Silicon, Metal is used automatically.
Increase Context for Long Documents

For long documents, increase model_tokens:

```python
"llm": {
    "model": "ollama/llama3.2",
    "model_tokens": 128000,  # Maximum context
}
```
Keep Models Loaded

Ollama keeps models in memory for 5 minutes by default. Increase this:

```bash
# Set to 1 hour
export OLLAMA_KEEP_ALIVE=1h
ollama serve
```
Use Lighter Models for Simple Tasks
For basic scraping, use smaller models:

```python
"model": "ollama/llama3.2:1b"  # Fast and efficient
```
Troubleshooting
Error: Connection refused to http://localhost:11434

Solution: Ensure Ollama is running:

```bash
ollama serve
```
Error: model 'llama3.2' not found

Solution: Pull the model first:

```bash
ollama pull llama3.2
```
Error: System runs out of RAM

Solution: Use a smaller model:

```python
"model": "ollama/llama3.2:1b"  # Only ~2GB RAM
```
Error: Failed to parse JSON response

Solution: Add the format parameter:

```python
"llm": {
    "model": "ollama/mistral",
    "format": "json",  # Force JSON output
}
```
Advantages of Ollama
- Free: No API costs; run unlimited scraping jobs
- Private: Your data never leaves your machine
- Fast: No network latency, especially with a GPU
- Offline: Works without an internet connection
Next Steps
- OpenAI: Compare with cloud-based OpenAI models
- Advanced Config: Learn about proxy rotation and browser settings