You Only Scrape Once — Let AI do the heavy lifting while you focus on extracting the data you need.

What is ScrapeGraphAI?

ScrapeGraphAI is a Python web scraping library that uses Large Language Models (LLMs) and graph-based logic to create intelligent scraping pipelines. Instead of writing complex scraping logic, you simply describe what information you want to extract, and ScrapeGraphAI handles the rest.

Quick Start

Get started with ScrapeGraphAI in under 5 minutes

Installation

Install ScrapeGraphAI and its dependencies

Graph Types

Explore different graph types for various scraping scenarios

API Reference

Complete API documentation for all graphs and nodes

Key Features

AI-Powered Extraction

Use natural language to describe what data you want to extract from any website

20+ LLM Providers

Support for OpenAI, Ollama, Azure, Gemini, Groq, and many more LLM providers

Multiple Graph Types

SmartScraperGraph, SearchGraph, SpeechGraph, and more for different use cases

Document Scraping

Extract data from CSV, JSON, XML, HTML, and Markdown files

Multi-Page Scraping

Scrape multiple pages in parallel with optimized performance

Schema Validation

Define structured output schemas using Pydantic models
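
As a rough illustration of schema validation, the sketch below defines a Pydantic model and validates a dictionary against it. The `Product` model, its fields, the sample data, and the `is_valid_product` helper are all hypothetical; passing the model to a graph through a `schema` argument is an assumption to check against the API Reference for your installed version.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    """Hypothetical output schema for a product page."""
    name: str
    price: float
    description: Optional[str] = None


def is_valid_product(data: dict) -> bool:
    """Return True if `data` satisfies the Product schema."""
    try:
        Product(**data)
        return True
    except ValidationError:
        return False


# Made-up data standing in for a scraper result; no network call is made here.
raw = {"name": "Example Widget", "price": "19.99", "description": "A sample item"}

# Pydantic coerces the numeric string into a float during validation.
product = Product(**raw)
print(product.name, product.price)

# Assumption: the graph constructor accepts the model class, e.g.
# SmartScraperGraph(prompt=..., source=..., config=..., schema=Product)
```

Validating against a model like this surfaces missing or mistyped fields immediately instead of letting them propagate into downstream code.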

Use Cases

  • Extract product information, prices, reviews, and specifications from e-commerce websites without writing complex selectors.
  • Gather information from multiple sources, search results, and research papers for analysis and machine learning projects.
  • Monitor competitor websites, pricing, and content changes automatically using AI-powered scraping.
  • Collect and aggregate content from multiple websites and transform it into structured formats like JSON or CSV.
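
To make the last step concrete, here is a minimal sketch of turning extracted records into JSON and CSV with the Python standard library only. The `records` list is made-up sample data standing in for scraper output, and `to_json` and `to_csv` are hypothetical helpers, not part of ScrapeGraphAI.

```python
import csv
import io
import json

# Made-up records standing in for results collected from several pages.
records = [
    {"name": "Widget A", "price": 19.99},
    {"name": "Widget B", "price": 24.5},
]


def to_json(rows: list) -> str:
    """Serialize the records as pretty-printed JSON."""
    return json.dumps(rows, indent=2)


def to_csv(rows: list) -> str:
    """Serialize the records as CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_json(records))
print(to_csv(records))
```

This version assumes every record has the same keys; records with differing fields would need a merged header before writing.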

Why ScrapeGraphAI?

Traditional web scraping requires you to:
  • Write complex CSS selectors or XPath expressions
  • Handle different website structures manually
  • Maintain brittle code that breaks when websites change
  • Parse and structure data yourself
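ScrapeGraphAI simplifies this:

from scrapegraphai.graphs import SmartScraperGraph

# LLM settings: a local model served by Ollama, so no API key is needed
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,  # keep the output as deterministic as possible
    },
}

# Describe the data you want in plain language instead of writing selectors
smart_scraper = SmartScraperGraph(
    prompt="Extract the product name, price, and description",
    source="https://example.com/product",
    config=graph_config
)

# Run the pipeline and print the extracted data
result = smart_scraper.run()
print(result)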
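
Switching providers should mostly be a matter of changing the config. The sketch below builds a config dict shaped like `graph_config` above; `make_graph_config` is a hypothetical helper, and the `api_key` key, the `openai/gpt-4o-mini` model string, and the `OPENAI_API_KEY` variable name are assumptions to verify on the Configuration page.

```python
import os
from typing import Optional


def make_graph_config(model: str, api_key_env: Optional[str] = None) -> dict:
    """Build a config dict shaped like graph_config (hypothetical helper)."""
    llm = {"model": model, "temperature": 0}
    if api_key_env is not None:
        # Fail early with a clear message rather than sending an empty key.
        api_key = os.environ.get(api_key_env)
        if not api_key:
            raise RuntimeError(f"Set {api_key_env} before running the scraper")
        llm["api_key"] = api_key
    return {"llm": llm}


# Local model: no key required.
local_config = make_graph_config("ollama/llama3.2")
print(local_config)

# Hosted model (assumed names), reading the key from the environment:
# hosted_config = make_graph_config("openai/gpt-4o-mini", api_key_env="OPENAI_API_KEY")
```

Reading the key from an environment variable keeps credentials out of source files.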

Examples

View real-world examples and use cases

Configuration

Configure LLM providers and advanced settings

Custom Graphs

Build custom scraping pipelines
