
Installation

Get ScrapeGraphAI up and running in your Python environment. This guide covers installing with pip, setting up a virtual environment, and post-installation configuration.

Requirements

Before installing ScrapeGraphAI, ensure you have:
  • Python 3.10 through 3.12
  • pip package manager
  • Internet connection for downloading dependencies
It is strongly recommended to install ScrapeGraphAI in a virtual environment to avoid conflicts with other libraries.
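A minimal sketch for checking the running interpreter against the documented range before installing. The bounds below simply encode the 3.10 to 3.12 range stated above, and `python_supported` is an illustrative helper name:

```python
import sys

# Documented supported range: Python 3.10 through 3.12 (inclusive).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if major.minor of version_info falls within the documented range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    if python_supported():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside 3.10-3.12; consider another interpreter.")
```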

Installation Steps

Step 1: Create a Virtual Environment (Recommended)

Create an isolated Python environment for your project:
# Using venv (built-in)
python -m venv scrapegraph-env

# Activate the environment
# On macOS/Linux:
source scrapegraph-env/bin/activate

# On Windows:
scrapegraph-env\Scripts\activate
Alternatively, use conda if you prefer:
conda create -n scrapegraph-env python=3.10
conda activate scrapegraph-env
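After activating, you can confirm from Python that the interpreter you are running belongs to the isolated environment. This is a sketch under two assumptions: a venv is detected because `sys.prefix` differs from `sys.base_prefix`, and an activated conda environment is detected through the `CONDA_PREFIX` variable that `conda activate` sets. `active_environment` is a hypothetical helper name:

```python
import os
import sys
from typing import Optional


def active_environment() -> Optional[str]:
    """Describe the active isolated environment, or return None if none is detected."""
    # venv/virtualenv: sys.prefix points at the environment, base_prefix at the base install.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # conda: environments are full installs, so the prefix check above does not apply;
    # an activated conda environment exposes its path through CONDA_PREFIX instead.
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        return f"conda env at {conda_prefix}"
    return None


if __name__ == "__main__":
    env = active_environment()
    print(env or "No virtual environment detected; activate one before running pip install.")
```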

Step 2: Install ScrapeGraphAI

Install the library using pip:
pip install scrapegraphai
This installs ScrapeGraphAI along with its core dependencies, including:
  • langchain and related packages
  • beautifulsoup4 for HTML parsing
  • playwright for browser automation
  • pydantic for data validation
  • Other required dependencies
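To confirm the core dependencies landed in the active environment, a sketch that looks each one up with `importlib.util.find_spec` without importing it. The import names are assumptions based on each package's usual top-level module (for example, beautifulsoup4 is imported as `bs4`):

```python
import importlib.util
from typing import Dict, List

# Distribution name -> top-level import name (assumed; adjust if a package differs).
CORE_MODULES: Dict[str, str] = {
    "langchain": "langchain",
    "beautifulsoup4": "bs4",
    "playwright": "playwright",
    "pydantic": "pydantic",
}


def missing_modules(modules: Dict[str, str]) -> List[str]:
    """Return the distribution names whose top-level module cannot be found."""
    return [dist for dist, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(CORE_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```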

Step 3: Install Playwright Browsers

This step is critical for fetching website content. Install Playwright browser binaries:
playwright install
This downloads Chromium, Firefox, and WebKit browsers needed for scraping dynamic websites.
Skipping this step will cause errors when trying to scrape websites. Always run playwright install after installing ScrapeGraphAI.
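If you are unsure whether the browsers were downloaded, a sketch that inspects Playwright's browser cache. It assumes Playwright's documented default cache locations (`~/.cache/ms-playwright` on Linux, `~/Library/Caches/ms-playwright` on macOS, `%USERPROFILE%\AppData\Local\ms-playwright` on Windows) and the `PLAYWRIGHT_BROWSERS_PATH` override; the function names are illustrative:

```python
import os
import sys
from pathlib import Path
from typing import List, Optional


def playwright_cache_dir() -> Optional[Path]:
    """Return the directory Playwright downloads browsers into (assumed defaults)."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    if override == "0":
        # "0" keeps browsers inside the installed playwright package; not handled here.
        return None
    if override:
        return Path(override)
    home = Path.home()
    if sys.platform.startswith("win"):
        return home / "AppData" / "Local" / "ms-playwright"
    if sys.platform == "darwin":
        return home / "Library" / "Caches" / "ms-playwright"
    return home / ".cache" / "ms-playwright"


def installed_browsers(cache_dir: Optional[Path]) -> List[str]:
    """List subfolders of the cache (Playwright names them like chromium-<build>)."""
    if cache_dir is None or not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    cache = playwright_cache_dir()
    browsers = installed_browsers(cache)
    print(f"Cache: {cache}")
    print("Installed:", ", ".join(browsers) if browsers else "none found; run playwright install")
```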

Step 4: Verify Installation

Verify that ScrapeGraphAI is installed correctly:
import scrapegraphai
print(scrapegraphai.__version__)
You should see the version number printed (e.g., 1.73.1).
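The check above relies on the package exposing `__version__`. If that attribute is missing in your installed release, or you want to check without importing the library, a sketch that reads the version from installed package metadata with the standard library's `importlib.metadata` (`installed_version` is an illustrative name):

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "scrapegraphai") -> Optional[str]:
    """Return the installed version from package metadata, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version or "scrapegraphai is not installed in this environment")
```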

Optional Dependencies

ScrapeGraphAI offers optional features that require additional packages:

Burr Integration

For advanced workflow visualization and debugging:
pip install "scrapegraphai[burr]"

NVIDIA AI Integration

For using NVIDIA AI endpoints:
pip install "scrapegraphai[nvidia]"

OCR Support

For extracting text from images and PDFs:
pip install "scrapegraphai[ocr]"
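The set of extras can change between releases. A sketch that lists the extras your installed release actually declares, by reading the `Provides-Extra` fields from its package metadata; `declared_extras` is a hypothetical helper name:

```python
from importlib import metadata
from typing import List


def declared_extras(distribution: str = "scrapegraphai") -> List[str]:
    """Return the optional extras declared in the distribution's metadata."""
    try:
        meta = metadata.metadata(distribution)
    except metadata.PackageNotFoundError:
        return []
    # Each extra appears as a separate Provides-Extra header in the metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    extras = declared_extras()
    print("Declared extras:", ", ".join(extras) if extras else "none (or not installed)")
```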

LLM Provider Setup

ScrapeGraphAI works with various LLM providers. You’ll need to set up at least one:

OpenAI

  1. Get an API key from the OpenAI Platform
  2. Set it as an environment variable:
export OPENAI_API_KEY="your-api-key-here"
Or use a .env file:
OPENAI_API_KEY=your-api-key-here

Ollama (Local Models)

  1. Install Ollama from ollama.com
  2. Download a model:
ollama pull llama3.2
  3. Ensure Ollama is running:
ollama serve
Ollama runs locally and doesn’t require an API key, making it great for development and privacy-sensitive applications.
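To confirm the server is up and the model was pulled, a sketch that queries Ollama's local REST API. It assumes the default address `http://localhost:11434` and the `/api/tags` endpoint, which lists locally available models; `local_ollama_models` is a hypothetical helper name:

```python
import json
import urllib.request
from typing import List, Optional

# Ollama's default local address; change it if you run the server elsewhere.
OLLAMA_URL = "http://localhost:11434"


def local_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[List[str]]:
    """Return names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError/timeouts are OSError subclasses; bad JSON raises ValueError.
        return None
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    models = local_ollama_models()
    if models is None:
        print("Ollama is not reachable; start it with ollama serve.")
    else:
        # Names usually carry a tag suffix, e.g. "llama3.2:latest", so match the prefix.
        print("llama3.2 available:", any(m.startswith("llama3.2") for m in models))
```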

Other Providers

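For other providers, set the corresponding API key as an environment variable. For example, for Anthropic:
export ANTHROPIC_API_KEY="your-api-key-here"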
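Since at least one provider must be configured, a small sketch that reports which of the API key variables named in this guide are set in the current environment. The provider-to-variable mapping is an assumption to extend for other providers; Ollama needs no key and is not listed:

```python
import os
from typing import Dict, List, Mapping

# Variables named in this guide; add entries for any other provider you use.
PROVIDER_KEYS: Dict[str, str] = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return providers whose API key variable is set to a non-blank value."""
    return [name for name, var in PROVIDER_KEYS.items() if environ.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured:", ", ".join(found) if found else "none (fine if you only use Ollama)")
```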

Environment Configuration

Using python-dotenv

Install python-dotenv to manage environment variables:
pip install python-dotenv
Create a .env file in your project root:
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Add other API keys as needed
Load environment variables in your script:
from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
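To fail early with a clear message instead of passing an empty key to a provider, a sketch that loads `.env` and then checks for required variables. `missing_env_vars` is a hypothetical helper, and the import is guarded so the sketch still runs without python-dotenv. By default, `load_dotenv()` does not override variables already set in your shell:

```python
import os
from typing import Iterable, List

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # keep the sketch runnable when python-dotenv is absent
    load_dotenv = None


def missing_env_vars(names: Iterable[str]) -> List[str]:
    """Return the variable names that are unset or empty in the current environment."""
    return [name for name in names if not os.getenv(name)]


if __name__ == "__main__":
    if load_dotenv is not None:
        load_dotenv()  # reads .env from the current directory or a parent
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("All required variables are set.")
```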

Troubleshooting

Import Errors

If you encounter import errors:
# Upgrade pip
pip install --upgrade pip

# Reinstall ScrapeGraphAI
pip install --upgrade --force-reinstall scrapegraphai
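A common cause of import errors is installing with a `pip` that belongs to a different interpreter than the one running your script. A sketch that prints the running interpreter and where it resolves the package; if it reports "not found", running `python -m pip install scrapegraphai` with that same interpreter ties the install to it. `diagnose` is an illustrative name:

```python
import importlib.util
import sys


def diagnose(module: str = "scrapegraphai") -> str:
    """Report the running interpreter and where (if anywhere) it finds the module."""
    spec = importlib.util.find_spec(module)
    location = spec.origin if spec is not None else "not found"
    return f"Interpreter: {sys.executable}\n{module}: {location}"


if __name__ == "__main__":
    print(diagnose())
```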

Playwright Issues

If Playwright browsers are not found:
# Install system dependencies (Debian/Ubuntu)
sudo apt-get install libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2

# Reinstall Playwright browsers
playwright install --with-deps

Version Conflicts

If you have dependency conflicts:
# Create a fresh virtual environment
python -m venv fresh-env
source fresh-env/bin/activate  # or fresh-env\Scripts\activate on Windows
pip install scrapegraphai
playwright install
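Before recreating an environment, you can ask pip to report broken requirements in the current one. A sketch that runs `pip check` through the current interpreter (`python -m pip`), so it inspects the same environment your script uses; `pip check` exits with a non-zero status when it finds incompatible or missing dependencies. `pip_check` is a hypothetical helper name:

```python
import subprocess
import sys
from typing import Tuple


def pip_check() -> Tuple[bool, str]:
    """Run pip check for this interpreter; return (no_conflicts, combined_output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    ok, output = pip_check()
    print("No dependency conflicts." if ok else f"Conflicts found:\n{output}")
```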

Telemetry

ScrapeGraphAI collects anonymous usage metrics to improve the library. To opt out:
export SCRAPEGRAPHAI_TELEMETRY_ENABLED=false
Or in your .env file:
SCRAPEGRAPHAI_TELEMETRY_ENABLED=false
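The same opt-out can be set from Python at the top of a script. This sketch assumes the library reads `SCRAPEGRAPHAI_TELEMETRY_ENABLED` from the process environment, as the shell export above implies, so it sets the variable before importing `scrapegraphai`:

```python
import os

# Assumption: ScrapeGraphAI reads this variable from the process environment.
# Setting it before the import ensures it is in place whenever the library checks it.
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"

# import scrapegraphai  # import the library only after the variable is set
```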

Verification Script

Run this script to verify your installation is complete:
import sys

try:
    import scrapegraphai
    print(f"✓ ScrapeGraphAI {scrapegraphai.__version__} installed")
except ImportError:
    print("✗ ScrapeGraphAI not installed")
    sys.exit(1)

try:
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        browser.close()
    print("✓ Playwright browsers installed")
except Exception as e:
    print(f"✗ Playwright issue: {e}")
    sys.exit(1)

print("\n✓ All checks passed! You're ready to use ScrapeGraphAI.")

Next Steps

Quick Start

Now that you have ScrapeGraphAI installed, build your first scraper!
