
Firecrawl Projects

A comprehensive collection of Python automation tools built on top of Firecrawl CLI, designed for web scraping, competitive research, documentation extraction, and content monitoring.

Overview

Firecrawl Projects provides five command-line tools that use the Firecrawl CLI to automate common web research and monitoring tasks. Each tool can run on its own or be chained with the others for more complex workflows.

Deep Research

Search and synthesize information from multiple web sources

Competitor Analysis

Compare and analyze competitor websites automatically

Documentation Scraper

Extract entire documentation sites into local markdown

Lead Extractor

Find contact information from company websites

Content Monitor

Track website changes over time with diff detection

Prerequisites

1. Python 3.10+

Ensure you have Python 3.10 or higher installed:
python --version

2. Firecrawl CLI

Install Firecrawl CLI:
pip install firecrawl-cli
Or call the executable in the project's virtual environment directly (adjust the path for your machine):
C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts\firecrawl.exe

3. Configure API Key

Set up your Firecrawl API key:
firecrawl config set-api-key YOUR_API_KEY

4. Verify Setup

firecrawl --status

Tools

1. Deep Research

Comprehensive topic research that searches the web and synthesizes findings.
Searches the web for your topic, scrapes the top results, and generates a detailed markdown research report.
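The search, scrape, and summarize flow described above can be sketched as follows. This is an illustration only, not the code in `deep_research.py`: `run_research`, `first_paragraph`, and the injected `search`/`scrape` callables are hypothetical, and the assumption that search results are dicts with `url` and `title` keys mirrors the chaining example later on this page. The "summary" is a naive first-paragraph heuristic.

```python
from typing import Callable

# Hypothetical shapes: search(topic, limit=...) -> list of {"url", "title"} dicts,
# scrape(url) -> page markdown. Both are injected so the flow can run offline.
SearchFn = Callable[..., list[dict]]
ScrapeFn = Callable[[str], str]


def first_paragraph(markdown: str, max_chars: int = 300) -> str:
    """Return the first non-heading paragraph, truncated to max_chars."""
    for block in markdown.split("\n\n"):
        text = block.strip()
        if text and not text.startswith("#"):
            return text[:max_chars]
    return ""


def run_research(topic: str, search: SearchFn, scrape: ScrapeFn, limit: int = 5) -> list[dict]:
    """Search for a topic, scrape the top results, and summarize each page."""
    findings = []
    for result in search(topic, limit=limit)[:limit]:
        try:
            page = scrape(result["url"])
        except Exception as exc:  # skip pages that fail to scrape, keep going
            print(f"skipped {result['url']}: {exc}")
            continue
        findings.append({
            "title": result.get("title", result["url"]),
            "url": result["url"],
            "summary": first_paragraph(page),
        })
    return findings
```

Injecting the two callables keeps the orchestration separate from the Firecrawl calls, so the same loop works with a real CLI wrapper or with stubs in tests.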

2. Competitor Analysis

Analyze and compare competitor websites automatically.
  • Maps website structure of each competitor
  • Categorizes pages (pricing, features, about, blog, docs)
  • Scrapes key pages for content analysis
  • Generates comparison report
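The page-categorization step could work roughly like this: group the URLs returned by `firecrawl map` by matching path segments against keyword lists. This is a minimal sketch; the keyword lists, `categorize_url`, and `group_pages` are hypothetical and not necessarily how `competitor_analysis` classifies pages.

```python
from urllib.parse import urlparse

# Hypothetical keyword lists; tune them for the sites you analyze.
CATEGORY_KEYWORDS = {
    "pricing": ("pricing", "plans", "price"),
    "features": ("features", "product", "solutions"),
    "about": ("about", "company", "team"),
    "blog": ("blog", "news", "articles"),
    "docs": ("docs", "documentation", "guide", "api"),
}


def categorize_url(url: str) -> str:
    """Return the first category with a keyword equal to one of the URL's path segments."""
    segments = [s.lower() for s in urlparse(url).path.split("/") if s]
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(seg in keywords for seg in segments):
            return category
    return "other"


def group_pages(urls: list[str]) -> dict[str, list[str]]:
    """Group mapped URLs (e.g. output of `firecrawl map`) by category."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(categorize_url(url), []).append(url)
    return groups
```

Matching whole path segments (rather than substrings) avoids false hits such as `/aboutface` being filed under "about", at the cost of missing slugs like `/about-us`.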

3. Documentation Scraper

Extract and organize documentation from any website into local markdown files.
  • Discovers all documentation pages
  • Filters to relevant URLs
  • Converts to clean markdown
  • Creates organized index

4. Lead Extractor

Extract business contact information from company websites.
  • Email addresses
  • Phone numbers
  • Social media links (LinkedIn, Twitter, Facebook)
  • Company name and description

5. Content Monitor

Track changes on websites over time with automatic diff detection.
  • Monitor pages for content changes
  • Save snapshots over time
  • Show diffs when changes detected
  • Maintain complete history

Installation

1. Clone or Download

cd "C:\Users\ulilj\Firecrawl Projects\tools"

2. Verify Firecrawl CLI

firecrawl --status
Or, in PowerShell, use the full path:
& 'C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts\firecrawl.exe' --status

3. Test a Tool

python deep_research.py "test topic" --limit 2

Common Patterns

Chaining Tools

Each tool is a Python module, so you can import its functions and combine them:
# Search for a topic, then monitor every page the search returned
from deep_research import search_topic
from content_monitor import add_page

results = search_topic("AI startups 2025", limit=10)
for result in results:
    add_page(result["url"], result["title"])

Scheduled Monitoring

Windows Task Scheduler:
  1. Open Task Scheduler
  2. Create Basic Task
  3. Set trigger (e.g., daily at 9 AM)
  4. Action: Start a program
    • Program: python
    • Arguments: content_monitor.py check
    • Start in: C:\Users\ulilj\Firecrawl Projects\tools
Linux Cron:
# Run every day at 9 AM
0 9 * * * cd /path/to/tools && python content_monitor.py check
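If neither Task Scheduler nor cron is available, a long-running Python loop can trigger the same `content_monitor.py check` command. This is a minimal sketch that assumes the script sits in the current working directory; `seconds_until` and `run_daily` are hypothetical names. A system scheduler remains the more robust choice because it survives reboots and logouts.

```python
import subprocess
import sys
import time
from datetime import datetime, timedelta


def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:00 (naive local time)."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(hour: int = 9) -> None:
    """Run `content_monitor.py check` once a day at the given hour; blocks forever."""
    while True:
        time.sleep(seconds_until(hour, datetime.now()))
        # sys.executable is the interpreter running this loop, so PATH doesn't matter.
        subprocess.run([sys.executable, "content_monitor.py", "check"], check=False)
```

Start it with `run_daily(9)`. Recomputing the delay on every iteration avoids drift from the time each check takes.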

Batch Lead Extraction

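# Read one URL per line from companies.txt; write one JSON file per company
while IFS= read -r url; do
  [ -z "$url" ] && continue  # skip blank lines
  slug=$(printf '%s' "$url" | sed -E 's#^https?://##; s#[^A-Za-z0-9]+#_#g; s#^_+|_+$##g')
  python lead_extractor.py "$url" --format json --output "leads_${slug}_$(date +%Y%m%d).json"
done < companies.txt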
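The loop above only runs in bash. For PowerShell or cross-platform use, the same batch can be sketched in Python. It passes only the `--format json --output` flags shown above; the `slugify` naming scheme and the `lead_commands`/`run_batch` helpers are hypothetical.

```python
import re
import subprocess
import sys
from datetime import date
from pathlib import Path


def slugify(url: str) -> str:
    """Turn a URL into a filesystem-safe name, e.g. https://acme.com/ -> acme_com."""
    name = re.sub(r"^https?://", "", url.strip())
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def lead_commands(urls: list[str], day: date) -> list[list[str]]:
    """Build one lead_extractor.py command per non-blank URL."""
    stamp = day.strftime("%Y%m%d")
    return [
        [sys.executable, "lead_extractor.py", url.strip(),
         "--format", "json", "--output", f"leads_{slugify(url)}_{stamp}.json"]
        for url in urls if url.strip()
    ]


def run_batch(list_file: str = "companies.txt") -> None:
    """Run every command, continuing past sites that fail."""
    urls = Path(list_file).read_text(encoding="utf-8").splitlines()
    for cmd in lead_commands(urls, date.today()):
        subprocess.run(cmd, check=False)
```

Giving each company its own output file means a later URL can never overwrite an earlier result.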

Output Formats

Deep Research saves its report as markdown in this format:

# Research Report: AI Trends 2025

Generated: 2026-03-05 10:30:00
Sources: 5

## Key Findings

1. **Trend Name**
   Source: [Example.com](https://example.com)
   
   Summary of the trend...

## Sources

- [Title](URL)
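A report in this layout can be rendered from a list of findings with plain string formatting. This is a sketch, not the tool's own renderer: `render_report` is hypothetical, each finding is assumed to be a dict with `title`, `url`, and `summary`, and the source link text is taken from the URL's host name.

```python
from datetime import datetime
from urllib.parse import urlparse


def render_report(topic: str, findings: list[dict], generated: datetime) -> str:
    """Render findings ({title, url, summary}) in the report layout shown above."""
    lines = [
        f"# Research Report: {topic}",
        "",
        f"Generated: {generated:%Y-%m-%d %H:%M:%S}",
        f"Sources: {len(findings)}",
        "",
        "## Key Findings",
        "",
    ]
    for i, item in enumerate(findings, start=1):
        host = urlparse(item["url"]).netloc  # e.g. "example.com"
        lines += [
            f"{i}. **{item['title']}**",
            f"   Source: [{host}]({item['url']})",
            "",
            f"   {item['summary']}",
            "",
        ]
    lines += ["## Sources", ""]
    lines += [f"- [{item['title']}]({item['url']})" for item in findings]
    return "\n".join(lines) + "\n"
```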

Firecrawl CLI Integration

All tools use the Firecrawl CLI with these key features:

--only-main-content

Extracts only the main content, removing navigation and ads

--format markdown

Returns clean markdown instead of HTML

--wait-for

Waits for JavaScript to render before scraping

Rate Limiting

Respects API rate limits automatically

Example CLI Calls

# Scrape single page
firecrawl scrape https://example.com --only-main-content --format markdown

# Map entire site
firecrawl map https://example.com --limit 100

# Crawl multiple pages
firecrawl crawl https://example.com --limit 50 --only-main-content
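From Python, these calls can be wrapped with `subprocess`. This is a minimal sketch: the `FirecrawlCLI` class is hypothetical, it reuses only the flags shown in the examples above, and it assumes the CLI writes its result to stdout. The `min_interval` spacing is client-side politeness on top of the CLI's own rate-limit handling. On Windows, pass the full `firecrawl.exe` path as `exe` if the command is not on PATH.

```python
import subprocess
import time


class FirecrawlCLI:
    """Thin wrapper around the firecrawl executable that spaces calls apart."""

    def __init__(self, exe: str = "firecrawl", min_interval: float = 1.0):
        self.exe = exe
        self.min_interval = min_interval  # seconds between consecutive calls
        self._last_call = float("-inf")

    def build_scrape(self, url: str) -> list[str]:
        # Same flags as the "Scrape single page" example.
        return [self.exe, "scrape", url, "--only-main-content", "--format", "markdown"]

    def build_map(self, url: str, limit: int = 100) -> list[str]:
        # Same flags as the "Map entire site" example.
        return [self.exe, "map", url, "--limit", str(limit)]

    def run(self, args: list[str]) -> str:
        """Wait until min_interval has passed since the last call, then return stdout."""
        wait = self._last_call + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        return result.stdout
```

Usage: `cli = FirecrawlCLI(); markdown = cli.run(cli.build_scrape("https://example.com"))`. With `check=True`, a failing scrape raises `subprocess.CalledProcessError` instead of returning empty output silently.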

Tips & Best Practices

1. Start with Small Limits

Test with --limit 5 or --limit 10 to verify output before scaling up

2. Use Descriptive Filenames

Always specify --output with meaningful names for easier organization

3. Check _manifest Files

Tools that scrape multiple pages create a _manifest.json file; review it for errors.

4. Respect Rate Limits

Space out large scraping jobs. The CLI handles API rate limiting, but be considerate of the sites you scrape.

5. Organize Output

Create dated directories:
mkdir -p reports/$(date +%Y-%m-%d)
python deep_research.py "topic" --output reports/$(date +%Y-%m-%d)/research.md

Always respect robots.txt and terms of service. Some sites prohibit automated scraping.

Troubleshooting

Firecrawl command not found

Use the full path (PowerShell):
$FIRECRAWL = "C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts\firecrawl.exe"
& $FIRECRAWL --status
Or add to PATH in Windows:
  1. System Properties → Environment Variables
  2. Edit PATH
  3. Add: C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts

API key errors

Reconfigure the key:
firecrawl config set-api-key YOUR_API_KEY
Verify:
firecrawl config show

Scrapes fail or return empty content

  • Check that the URL is accessible
  • Try scraping manually first: firecrawl scrape URL
  • Increase the --wait-for time for JavaScript-heavy sites
  • Verify that your API quota hasn’t been exceeded

Unexpected Content Monitor results

  • Ensure the page content actually changed
  • Some sites use dynamic timestamps that make every check show a change
  • Check the .content_monitor/ directory for stored snapshots

Use Cases

Market Research

Track competitor pricing, features, and messaging changes over time

Sales Intelligence

Extract contact information for lead generation and outreach

Documentation Archive

Backup documentation before major version updates

Content Strategy

Analyze competitor content and identify gaps in your own

SEO Research

Research what content ranks well for your target keywords

Due Diligence

Gather comprehensive information about companies for investment research

Get Started with Firecrawl

Sign up for Firecrawl API access to use these tools
