Welcome to ScrapeGraphAI
ScrapeGraphAI is a Python web scraping library that uses Large Language Models (LLMs) and direct graph logic to create scraping pipelines for websites and local documents. Instead of writing complex XPath selectors or CSS queries, you simply describe what information you want to extract in natural language.
What is ScrapeGraphAI?
ScrapeGraphAI transforms the traditional web scraping paradigm by leveraging AI to understand and extract data from web pages. Just tell the library which information you want to extract, and it will do it for you automatically.
You Only Scrape Once: ScrapeGraphAI’s motto reflects its intelligent approach to web scraping, where AI agents understand the structure and extract exactly what you need.
Key Features
Natural Language Prompts
Extract data using simple text descriptions instead of complex selectors. No need to inspect HTML or write XPath queries.
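As a rough illustration of prompt-driven extraction, the sketch below passes a natural-language prompt to a `SmartScraperGraph`. The model identifier, the `verbose` setting, and the helper names `make_config` and `build_scraper` are assumptions chosen for illustration; confirm the exact configuration keys against the version you have installed.

```python
from typing import Any, Dict


def make_config(api_key: str, model: str = "openai/gpt-4o-mini") -> Dict[str, Any]:
    """Build a graph configuration dict (helper name and default model are illustrative)."""
    return {
        "llm": {"api_key": api_key, "model": model},
        "verbose": False,
    }


def build_scraper(prompt: str, source: str, api_key: str):
    """Create a single-page scraper driven by a plain-English prompt."""
    # Imported inside the function so make_config works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    return SmartScraperGraph(
        prompt=prompt,   # what to extract, in natural language
        source=source,   # a URL or a local document
        config=make_config(api_key),
    )
```

With a valid key, a call such as `build_scraper("List every article title and its author.", "https://example.com/blog", api_key).run()` would be expected to return the extracted data as a Python dictionary. The URL here is a placeholder, and no selectors appear anywhere in the call.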
Multiple LLM Support
Works with OpenAI, Anthropic, Groq, Azure, Gemini, and local models via Ollama. Choose the model that fits your needs and budget.
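Switching providers is mostly a matter of changing the `llm` section of the configuration. The following is a minimal sketch, assuming the `provider/model` naming style and the `api_key` / `base_url` keys; the helper `llm_config` and the model names are hypothetical and only show the shape of the idea.

```python
from typing import Any, Dict, Optional


def llm_config(provider: str, model: str, api_key: Optional[str] = None,
               base_url: Optional[str] = None) -> Dict[str, Any]:
    """Return a graph config whose "llm" section targets the given provider."""
    llm: Dict[str, Any] = {"model": f"{provider}/{model}"}
    if api_key is not None:
        llm["api_key"] = api_key      # hosted providers need credentials
    if base_url is not None:
        llm["base_url"] = base_url    # local servers such as Ollama need an endpoint
    return {"llm": llm}


# Hosted model (model name is an assumption).
hosted = llm_config("openai", "gpt-4o-mini", api_key="YOUR_OPENAI_KEY")

# Local model served by Ollama on its default port (model name is an assumption).
local = llm_config("ollama", "llama3.2", base_url="http://localhost:11434")
```

Keeping the provider choice in one small dictionary means the prompt and the rest of the pipeline can stay unchanged when trading cost against quality.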
Multiple Data Sources
Scrape from websites, HTML, XML, JSON, CSV, Markdown, and other document formats with the same simple interface.
Built-in Graph Pipelines
Pre-built scraping pipelines for single pages, multi-page scraping, search results, and more complex scenarios.
Why Choose ScrapeGraphAI?
AI-Powered Intelligence
Traditional web scrapers break when websites change their structure. ScrapeGraphAI uses LLMs to understand content semantically, making your scrapers more resilient to layout changes.
Developer-Friendly
No need to:
- Inspect element structures
- Write complex CSS selectors or XPath queries
- Handle pagination logic manually
- Parse and structure data manually
Flexible Architecture
Built on LangChain, ScrapeGraphAI provides:
- Modular graph nodes for customizable pipelines
- Schema validation using Pydantic models
- Multi-source scraping from a single prompt
- Parallel execution for improved performance
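To make the schema-validation point concrete, here is a small sketch of how a Pydantic model can constrain the shape of extracted data. The `Product` and `ProductList` models and the sample dictionary are invented for illustration; the dictionary is a hand-written stand-in, not real scraper output.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    name: str
    price: float


class ProductList(BaseModel):
    products: List[Product]


# Hand-written stand-in for scraper output; not real results.
raw = {"products": [{"name": "Desk lamp", "price": 24.5}]}

validated = ProductList(**raw)          # raises ValidationError on a bad shape
first_price = validated.products[0].price

# A record missing the required "price" field is rejected.
try:
    ProductList(**{"products": [{"name": "No price"}]})
    rejected = False
except ValidationError:
    rejected = True
```

Recent versions of the library appear to accept such a class through a `schema` argument on the graph constructor; treat that parameter name as something to confirm against your installed version.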
Available Pipelines
ScrapeGraphAI comes with multiple pre-built graph pipelines:
| Pipeline | Description |
|---|---|
| SmartScraperGraph | Single-page scraper with a user prompt and input source |
| SearchGraph | Multi-page scraper that extracts information from top search results |
| SpeechGraph | Extracts information and generates an audio file |
| ScriptCreatorGraph | Generates a Python script for scraping |
| SmartScraperMultiGraph | Multi-page scraper with a single prompt and multiple sources |
| ScriptCreatorMultiGraph | Generates Python scripts for multiple pages |
Each graph has a multi version that makes parallel LLM calls for improved performance.
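The idea behind the multi versions can be pictured as a fan-out over sources. The sketch below is a conceptual illustration only, not the library's internal implementation: `scrape_many` and `fake_extract` are hypothetical names, and `fake_extract` stands in for a per-page LLM call so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def scrape_many(sources: List[str], extract: Callable[[str], Dict],
                max_workers: int = 4) -> List[Dict]:
    """Run `extract` on every source concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `sources`.
        return list(pool.map(extract, sources))


def fake_extract(source: str) -> Dict:
    """Stand-in for a per-page LLM call; returns a deterministic dummy record."""
    return {"source": source, "length": len(source)}


results = scrape_many(["https://a.example", "https://b.example/page"], fake_extract)
```

Because each page's extraction is independent and dominated by waiting on the model, running the calls concurrently tends to reduce total wall-clock time roughly in proportion to the number of workers, up to provider rate limits.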
Supported LLM Providers
ScrapeGraphAI integrates with major LLM providers:
- OpenAI (GPT-4, GPT-3.5, GPT-4o)
- Anthropic Claude
- Google Gemini
- Groq
- Azure OpenAI
- Ollama (local models like Llama, Mistral, Phi)
Use Cases
Data Collection
Gather product information, pricing data, or market research from multiple websites automatically.
Content Aggregation
Extract articles, blog posts, or news from various sources into a structured format.
Lead Generation
Collect contact information, company details, and social media links from business websites.
AI Agent Integration
Provide clean, structured data to AI agents through integrations with LangChain, LlamaIndex, and Crew.ai.
Integration Ecosystem
ScrapeGraphAI seamlessly integrates with popular frameworks:
- LLM Frameworks: LangChain, LlamaIndex, Crew.ai, Agno, CamelAI
- Low-code Platforms: Pipedream, Bubble, Zapier, n8n, Dify
- MCP Server: Available on Smithery
- SDKs: Python and Node.js SDKs for the hosted API
Performance
According to the Firecrawl benchmark, ScrapeGraphAI is the best fetcher on the market for accurate data extraction.
Next Steps
Installation
Get started by installing ScrapeGraphAI and its dependencies.
Quick Start
Build your first scraper in under 5 minutes.
Community and Support
Join the ScrapeGraphAI community:
- GitHub: ScrapeGraphAI/Scrapegraph-ai
- Discord: Join our Discord server
- Documentation: Official docs
- PyPI: scrapegraphai package
