Overview
The webScraper tool fetches and parses webpage content using the Firecrawl API. It extracts clean, readable text in Markdown format, optimized for AI analysis.
Firecrawl handles JavaScript rendering, anti-bot measures, and content extraction automatically.
Function Signatures
As Genkit Tool
As Standalone Function
src/ai/tools/web-scraper.ts:89
Input Schema
The URL of the webpage to scrape. Must be a valid URL format.
Input Type
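As a minimal sketch of the input described above, assuming a single url field: the names WebScraperInput and isValidUrl are hypothetical, and the restriction to http(s) is an assumption rather than something stated here. The check relies on the built-in URL constructor, which throws on malformed input.

```typescript
// Hypothetical input shape: one `url` field, as described in the input schema.
interface WebScraperInput {
  url: string;
}

// Returns true when `value` parses as an absolute http(s) URL.
// Limiting the protocol to http/https is an assumption made for this sketch.
function isValidUrl(value: string): boolean {
  try {
    const parsed = new URL(value);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false; // new URL() throws a TypeError on malformed input
  }
}

const exampleInput: WebScraperInput = { url: "https://example.com/article" };
console.log(isValidUrl(exampleInput.url));
```

Validating before calling the scraper avoids spending an API request on input that cannot succeed.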
Output Schema
Single Scrape (Tool)
The extracted textual content of the webpage in Markdown format (up to 20,000 characters).
Batch Scrape (Function)
Array of scrape results. Each result contains:
- url: The scraped URL
- content: Extracted Markdown content (up to 20,000 characters)
- source: Always "Firecrawl"
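The result shape above can be written down as a type. This is a sketch only: the interface name ScrapeResult and the guard isScrapeResult are hypothetical, and only the three fields and the 20,000-character cap come from the description.

```typescript
// Hypothetical name for the batch result shape described above.
interface ScrapeResult {
  url: string;          // the scraped URL
  content: string;      // extracted Markdown, at most 20,000 characters
  source: "Firecrawl";  // always the literal "Firecrawl"
}

// Runtime check that an unknown value matches the documented shape.
function isScrapeResult(value: unknown): value is ScrapeResult {
  if (typeof value !== "object" || value === null) return false;
  const { url, content, source } = value as Record<string, unknown>;
  return (
    typeof url === "string" &&
    typeof content === "string" &&
    content.length <= 20_000 &&
    source === "Firecrawl"
  );
}
```

A guard like this is useful when results cross a serialization boundary and static types are no longer available.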
Batch Scraping
The batchScrapeParallel function scrapes multiple URLs concurrently, so total time is bounded by the slowest page rather than the sum of all pages.
Parallel Processing
- Scrapes all URLs simultaneously using Promise.all
- Filters out failed scrapes automatically
- Returns only successful results
- Logs progress and success rate
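The behaviour listed above can be sketched as follows. This is not the actual batchScrapeParallel source: the function name batchScrapeSketch and the injected scrapeOne parameter are hypothetical, introduced so the pattern can run without a network call. Each URL's failure is caught individually so that one rejection does not reject the whole Promise.all.

```typescript
interface ScrapeResult {
  url: string;
  content: string;
  source: "Firecrawl";
}

// Hypothetical single-URL scraper: resolves to Markdown, null, or rejects.
type ScrapeFn = (url: string) => Promise<string | null>;

async function batchScrapeSketch(
  urls: string[],
  scrapeOne: ScrapeFn
): Promise<ScrapeResult[]> {
  // Start every scrape at once; convert each failure into null.
  const attempts = await Promise.all(
    urls.map(async (url): Promise<ScrapeResult | null> => {
      try {
        const content = await scrapeOne(url);
        return content === null ? null : { url, content, source: "Firecrawl" };
      } catch {
        return null; // a failed scrape is skipped, not propagated
      }
    })
  );

  // Keep only the successful results and report the success rate.
  const results = attempts.filter((r): r is ScrapeResult => r !== null);
  console.log(`Scraped ${results.length}/${urls.length} URLs successfully`);
  return results;
}
```

Because errors are handled per URL, the returned array can be shorter than the input, and is empty when every scrape fails.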
How It Works
- API Request: Sends scrape request to Firecrawl with Markdown format
- Content Extraction: Firecrawl renders JavaScript and extracts clean text
- Validation: Checks content length (must be > 100 characters)
- Truncation: Limits content to 20,000 characters to fit the model's context window
- Return: Returns Markdown-formatted text
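The request and post-processing steps above can be sketched as two pure functions. The names buildRequestBody and finalizeContent are hypothetical. The body shape (url plus formats: ["markdown"]) follows Firecrawl's public v1 scrape API, but whether the tool sends exactly this body is an assumption.

```typescript
const MIN_CONTENT_LENGTH = 100;    // content must be longer than this
const MAX_CONTENT_LENGTH = 20_000; // content is cut off at this length

// API Request step: JSON body asking Firecrawl for Markdown output.
// Assumed shape, based on the public v1 scrape API.
function buildRequestBody(url: string): string {
  return JSON.stringify({ url, formats: ["markdown"] });
}

// Validation, Truncation, and Return steps:
// reject content of 100 characters or fewer, then cap at 20,000.
function finalizeContent(markdown: string): string | null {
  if (markdown.length <= MIN_CONTENT_LENGTH) {
    return null; // too short to be a real article
  }
  return markdown.slice(0, MAX_CONTENT_LENGTH);
}
```

Keeping validation and truncation separate from the network call makes those rules easy to unit test.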
Example Usage
Single Scrape (Tool)
Single Scrape (Direct)
Batch Scrape
With AI Flow
Configuration
Firecrawl API Settings:
- Formats: Markdown
- Timeout: 30 seconds
- Max Content: 20,000 characters
- Min Content: 100 characters (validation)
- Endpoint: https://api.firecrawl.dev/v1/scrape
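For reference, the settings above could be collected in a single constant. This is an illustrative sketch: the object name SCRAPER_CONFIG and its key names are hypothetical, and how the tool applies the timeout (client side or as a request parameter) is not specified here.

```typescript
// Hypothetical grouping of the documented settings; values come from the list above.
const SCRAPER_CONFIG = {
  endpoint: "https://api.firecrawl.dev/v1/scrape",
  formats: ["markdown"],
  timeoutMs: 30_000,      // 30 seconds
  maxContentChars: 20_000,
  minContentChars: 100,   // content must exceed this to pass validation
} as const;

console.log(`Timeout: ${SCRAPER_CONFIG.timeoutMs / 1000} seconds`);
```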
Environment Variables
Your Firecrawl API key. Get one at firecrawl.dev.
Error Handling
Single Scrape
- Throws an error if the scrape fails
- Returns null if the content is too short
- Logs warnings for debugging
Batch Scrape
- Silently skips failed URLs
- Returns only successful scrapes
- Logs success/failure counts
- Never throws (returns empty array if all fail)
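Because a single scrape can either throw or return null, callers need to handle both outcomes. The sketch below shows one way to do that; scrapeOrFallback and the injected scrapeOne parameter are hypothetical names, and the fallback-string strategy is an illustration rather than the tool's own behaviour.

```typescript
// Hypothetical single-URL scraper with the documented contract:
// rejects on failure, resolves to null when content is too short.
type ScrapeFn = (url: string) => Promise<string | null>;

async function scrapeOrFallback(
  url: string,
  scrapeOne: ScrapeFn,
  fallback: string
): Promise<string> {
  try {
    const content = await scrapeOne(url);
    if (content === null) {
      console.warn(`Content too short for ${url}`);
      return fallback;
    }
    return content;
  } catch (err) {
    // The scrape itself failed (network error, API error, and so on).
    const message = err instanceof Error ? err.message : String(err);
    console.warn(`Scrape failed for ${url}: ${message}`);
    return fallback;
  }
}
```

Batch callers do not need the try/catch, since the batch function never throws; checking for an empty array is enough there.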
Performance
- Single Scrape: 1-5 seconds depending on page
- Batch Scrape: URLs are processed in parallel, so total time is roughly that of the slowest URL
- Success Rate: ~90-95% on standard news sites
- Content Quality: High - JavaScript rendered, clean extraction
Use Cases
- Argument Analysis: Extract article content for blueprint generation
- Research: Gather information from multiple sources
- Fact Checking: Retrieve full context of claims
- Content Summarization: Get clean text for AI summarization
- Source Verification: Read original sources cited in arguments
