Firecrawl Projects

A comprehensive collection of Python automation tools built on top of Firecrawl CLI, designed for web scraping, competitive research, documentation extraction, and content monitoring.

Overview

Firecrawl Projects provides five powerful command-line tools that leverage the Firecrawl CLI to automate common web research and monitoring tasks. Each tool is designed to be run independently or chained together for complex workflows.

Deep Research

Search and synthesize information from multiple web sources

Competitor Analysis

Compare and analyze competitor websites automatically

Documentation Scraper

Extract entire documentation sites into local markdown

Lead Extractor

Find contact information from company websites

Content Monitor

Track website changes over time with diff detection

Prerequisites

Python 3.10+

Ensure you have Python 3.10 or higher installed:

python --version

Firecrawl CLI

Install Firecrawl CLI:

pip install firecrawl-cli

Or, if the CLI is already installed in the project's virtual environment, call it by its full path:

C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts\firecrawl.exe

Configure API Key

Set up your Firecrawl API key:

firecrawl config set-api-key YOUR_API_KEY

Verify Setup

firecrawl --status

Tools

1. Deep Research

Comprehensive topic research that searches the web and synthesizes findings.


Searches the web for your topic, scrapes the top results, and generates a detailed markdown research report.

python deep_research.py "AI trends 2025" --limit 5 --output report.md

Options:

topic - The topic to research (required)
--limit, -l - Number of sources (default: 5)
--output, -o - Output file path (default: auto-generated)
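The documented options map directly onto Python's standard `argparse` module. The sketch below shows one way `deep_research.py` could declare them. `build_parser` is a hypothetical name, and the real script's parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented deep_research.py options (sketch; the real script may differ)."""
    parser = argparse.ArgumentParser(
        prog="deep_research.py",
        description="Search the web for a topic and write a markdown research report.",
    )
    # Positional, required
    parser.add_argument("topic", help="The topic to research")
    # Optional flags with the documented short forms and defaults
    parser.add_argument("--limit", "-l", type=int, default=5,
                        help="Number of sources (default: 5)")
    parser.add_argument("--output", "-o", default=None,
                        help="Output file path (default: auto-generated)")
    return parser
```

For example, `build_parser().parse_args(["AI trends 2025", "-l", "10"])` yields `limit=10` and `output=None`, leaving the script to generate a filename.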

Example:

python deep_research.py "best practices for React performance" --limit 10

Output:

research_react_performance_20260305.md
Includes source citations
Organized by subtopics
Key findings and recommendations
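Once sources are scraped, building the report is mostly string assembly. The following minimal sketch renders the layout shown in the Output Formats section. The `Finding` dataclass and `build_report` function are hypothetical and do not reflect the tool's actual internals.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Finding:
    # Hypothetical shape for one synthesized finding
    title: str
    source_title: str
    source_url: str
    summary: str


def build_report(topic: str, findings: list[Finding], generated: datetime) -> str:
    """Render findings into the markdown report layout (sketch)."""
    # Unique (title, url) pairs, in first-seen order
    sources = list(dict.fromkeys((f.source_title, f.source_url) for f in findings))
    lines = [
        f"# Research Report: {topic}",
        "",
        f"Generated: {generated:%Y-%m-%d %H:%M:%S}",
        f"Sources: {len(sources)}",
        "",
        "## Key Findings",
        "",
    ]
    for i, f in enumerate(findings, start=1):
        lines += [
            f"{i}. **{f.title}**",
            f"   Source: [{f.source_title}]({f.source_url})",
            "",
            f"   {f.summary}",
            "",
        ]
    lines += ["## Sources", ""]
    lines += [f"- [{title}]({url})" for title, url in sources]
    return "\n".join(lines) + "\n"
```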

2. Competitor Analysis

Analyze and compare competitor websites automatically.

What it does:

Maps website structure of each competitor
Categorizes pages (pricing, features, about, blog, docs)
Scrapes key pages for content analysis
Generates comparison report

python competitor_analysis.py https://competitor1.com https://competitor2.com --output analysis.md

Options:

urls - One or more competitor URLs (required)
--scrape-limit, -l - Max pages per category (default: 5)
--output, -o - Output file (default: auto-generated)

Example:

python competitor_analysis.py https://notion.so https://coda.io https://airtable.com

Report includes:

Feature comparison matrix
Pricing page analysis
Content strategy overview
Technical stack detection
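The feature comparison matrix is essentially a markdown table with one column per competitor. This hypothetical renderer assumes that feature presence has already been detected for each site. This document does not cover how the tool detects features.

```python
def comparison_matrix(features: dict[str, dict[str, bool]]) -> str:
    """Render {site: {feature: present}} as a markdown table (sketch)."""
    sites = list(features)
    # Union of every feature seen on any site, alphabetically
    all_features = sorted({name for flags in features.values() for name in flags})
    lines = [
        "| Feature | " + " | ".join(sites) + " |",
        "|---|" + "---|" * len(sites),
    ]
    for name in all_features:
        cells = ["yes" if features[site].get(name) else "no" for site in sites]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```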

3. Documentation Scraper

Extract and organize documentation from any website into local markdown files.

Features:

Discovers all documentation pages
Filters to relevant URLs
Converts to clean markdown
Creates organized index

python docs_scraper.py https://docs.example.com --output ./my_docs --limit 50

Options:

url - Documentation site URL (required)
--output, -o - Output directory (default: ./scraped_docs)
--limit, -l - Maximum pages (default: 100)
--filter, -f - Filter URLs containing this string

Example:

python docs_scraper.py https://docs.firecrawl.dev --output ./firecrawl_docs --limit 50

Output structure:

firecrawl_docs/
├── index.md
├── getting-started.md
├── api-reference.md
├── cli-usage.md
└── ...

4. Lead Extractor

Extract business contact information from company websites.

What it extracts:

Email addresses
Phone numbers
Social media links (LinkedIn, Twitter, Facebook)
Company name and description
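Contact extraction from scraped markdown is commonly regex-based. This minimal sketch assumes the page text is already available as a string. The patterns are deliberately simple, and the phone pattern only covers North American formats. `extract_contacts` is a hypothetical name.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# North American numbers only (assumption); international formats need a broader pattern
PHONE_RE = re.compile(r"\+?1?[-.\s]?\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}")
# Captures the link without scheme or "www.", e.g. "linkedin.com/company/stripe"
SOCIAL_RE = re.compile(
    r"(?:https?://)?(?:www\.)?"
    r"(linkedin\.com/company/[\w-]+|twitter\.com/\w+|facebook\.com/[\w.]+)",
    re.IGNORECASE,
)


def extract_contacts(markdown: str) -> dict:
    """Pull emails, phones, and social links out of scraped page text (sketch)."""
    def unique(items):
        # De-duplicate while keeping first-seen order
        return list(dict.fromkeys(items))

    social: dict[str, str] = {}
    for link in SOCIAL_RE.findall(markdown):
        network = link.split(".", 1)[0].lower()  # "linkedin", "twitter", "facebook"
        social.setdefault(network, link)
    return {
        "emails": unique(EMAIL_RE.findall(markdown)),
        "phones": unique(m.strip() for m in PHONE_RE.findall(markdown)),
        "social": social,
    }
```

The returned keys mirror the `emails`, `phones`, and `social` fields of the JSON output format.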

python lead_extractor.py https://company1.com https://company2.com --format json --output leads.json

Options:

urls - One or more website URLs (required)
--output, -o - Output file (default: auto-generated)
--format, -f - Output format: json, csv, markdown (default: json)

Example:

python lead_extractor.py stripe.com twilio.com sendgrid.com --format csv --output tech_companies.csv

CSV Output:

Company,Email,Phone,LinkedIn,Twitter,Website
Stripe,[email protected],+1-888-926-2289,linkedin.com/company/stripe,twitter.com/stripe,stripe.com
...

5. Content Monitor

Track changes on websites over time with automatic diff detection.

Features:

Monitor pages for content changes
Save snapshots over time
Show diffs when changes detected
Maintain complete history
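Change detection comes down to comparing the latest snapshot with fresh content and counting the lines that changed. This sketch uses `difflib.SequenceMatcher`. The layout under `.content_monitor/` (one folder per page, holding numbered snapshot files) is an assumption. `slugify`, `diff_counts`, and `check_page` are hypothetical helpers.

```python
import difflib
import re
from pathlib import Path


def slugify(name: str) -> str:
    """'Competitor Pricing' -> 'competitor-pricing'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def diff_counts(old: str, new: str) -> tuple[int, int]:
    """Return (lines_added, lines_removed) between two snapshots."""
    matcher = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines(), autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def check_page(name: str, new_text: str, store: Path = Path(".content_monitor")) -> str:
    """Compare with the latest snapshot and save a new one only when content changed."""
    page_dir = store / slugify(name)  # hypothetical layout: one folder per page
    page_dir.mkdir(parents=True, exist_ok=True)
    snapshots = sorted(page_dir.glob("*.md"))  # zero-padded names sort chronologically
    next_path = page_dir / f"{len(snapshots) + 1:05d}.md"
    if not snapshots:
        next_path.write_text(new_text, encoding="utf-8")
        return f"{name}: first snapshot saved"
    old_text = snapshots[-1].read_text(encoding="utf-8")
    if old_text == new_text:
        return f"{name}: unchanged"
    added, removed = diff_counts(old_text, new_text)
    next_path.write_text(new_text, encoding="utf-8")  # older files remain as history
    return f"{name}: CHANGED (+{added}/-{removed} lines)"
```

`slugify` follows the name-to-slug pattern visible in the examples, where "Example Page" becomes `example-page`.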

Commands:

# Add a page to monitor
python content_monitor.py add https://example.com/page --name "Example Page"

# Check all pages for changes
python content_monitor.py check

# List monitored pages
python content_monitor.py list

# View change history
python content_monitor.py history "example-page" --limit 20

# Remove a page
python content_monitor.py remove "example-page"

Workflow example:

# Add pages to monitor
python content_monitor.py add https://competitor.com/pricing --name "Competitor Pricing"
python content_monitor.py add https://example.com/changelog --name "Example Changelog"

# Check periodically (can be scheduled)
python content_monitor.py check

# Output when change detected:
# ✅ Competitor Pricing: CHANGED
#    - 5 lines added
#    - 2 lines removed

# View history
python content_monitor.py history "competitor-pricing"

Installation

Clone or Download

Clone or download the project, then change into its tools directory:

cd "C:\Users\ulilj\Firecrawl Projects\tools"

Verify Firecrawl CLI

firecrawl --status

Or use full path:

& 'C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts\firecrawl.exe' --status

Test a Tool

python deep_research.py "test topic" --limit 2

Common Patterns

Chaining Tools

Combine tools for powerful workflows:

# Research competitors, then monitor them
from deep_research import search_topic
from content_monitor import add_page

results = search_topic("AI startups 2025", limit=10)
for result in results:
    add_page(result["url"], result["title"])

Scheduled Monitoring

Windows Task Scheduler:

Open Task Scheduler
Create Basic Task
Set trigger (e.g., daily at 9 AM)
Action: Start a program
- Program: python
- Arguments: content_monitor.py check
- Start in: C:\Users\ulilj\Firecrawl Projects\tools

Linux Cron:

# Run every day at 9 AM
0 9 * * * cd /path/to/tools && python content_monitor.py check

Batch Lead Extraction

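# From a file of URLs (one per line); one JSON file per company so runs don't overwrite each other
while IFS= read -r url || [ -n "$url" ]; do
  [ -z "$url" ] && continue
  name=$(printf '%s' "$url" | sed -E 's#^[a-z]+://##; s#[^A-Za-z0-9.-]+#_#g')
  python lead_extractor.py "$url" --format json --output "leads_${name}_$(date +%Y%m%d).json"
done < companies.txt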
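On Windows without a bash shell, you can drive the same batch from Python. Because `lead_extractor.py` accepts multiple URLs, this sketch passes the whole list in a single call. `read_urls` and `extract_batch` are hypothetical helpers. `companies.txt` is assumed to hold one URL per line, with lines starting with `#` treated as comments.

```python
import subprocess
import sys
from datetime import date
from pathlib import Path


def read_urls(path: Path) -> list[str]:
    """Read one URL per line, skipping blank lines and '#' comments, without duplicates."""
    urls = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return list(dict.fromkeys(urls))  # drop duplicates, keep order


def extract_batch(list_file: Path) -> Path:
    """Run lead_extractor.py once over every URL and return the output path."""
    output = Path(f"leads_{date.today():%Y%m%d}.json")
    cmd = [sys.executable, "lead_extractor.py", *read_urls(list_file),
           "--format", "json", "--output", str(output)]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return output
```

Very long lists may exceed the operating system's command-line length limit. In that case, split the list into chunks.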

Output Formats

Markdown (research reports):

# Research Report: AI Trends 2025

Generated: 2026-03-05 10:30:00
Sources: 5

## Key Findings

1. **Trend Name**
   Source: [Example.com](https://example.com)

   Summary of the trend...

## Sources

- [Title](URL)

JSON (lead records):

{
  "company": "Stripe",
  "website": "stripe.com",
  "emails": ["[email protected]"],
  "phones": ["+1-888-926-2289"],
  "social": {
    "linkedin": "linkedin.com/company/stripe",
    "twitter": "twitter.com/stripe"
  },
  "scraped_at": "2026-03-05T10:30:00"
}

CSV (lead records):

Company,Email,Phone,LinkedIn,Twitter,Website
Stripe,[email protected],+1-888-926-2289,linkedin.com/company/stripe,twitter.com/stripe,stripe.com
Twilio,[email protected],+1-888-946-5469,linkedin.com/company/twilio,twitter.com/twilio,twilio.com
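The CSV columns are a flattened view of the JSON lead record. This sketch performs the conversion with the standard `csv` module. `leads_to_csv` is hypothetical, and joining multiple emails or phone numbers with `;` in a single cell is an assumption.

```python
import csv
import io

CSV_FIELDS = ["Company", "Email", "Phone", "LinkedIn", "Twitter", "Website"]


def leads_to_csv(leads: list[dict]) -> str:
    """Flatten lead records shaped like the JSON example into the CSV columns (sketch)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    for lead in leads:
        social = lead.get("social", {})
        writer.writerow({
            "Company": lead.get("company", ""),
            # Assumption: multiple values share one cell, separated by ";"
            "Email": ";".join(lead.get("emails", [])),
            "Phone": ";".join(lead.get("phones", [])),
            "LinkedIn": social.get("linkedin", ""),
            "Twitter": social.get("twitter", ""),
            "Website": lead.get("website", ""),
        })
    return buffer.getvalue()
```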

Firecrawl CLI Integration

All tools use the Firecrawl CLI with these key features:

--only-main-content

Extracts only the main content, removing navigation and ads

--format markdown

Returns clean markdown instead of HTML

--wait-for

Waits for JavaScript to render before scraping

Rate Limiting

Respects API rate limits automatically

Example CLI Calls

# Scrape single page
firecrawl scrape https://example.com --only-main-content --format markdown

# Map entire site
firecrawl map https://example.com --limit 100

# Crawl multiple pages
firecrawl crawl https://example.com --limit 50 --only-main-content
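When you call the CLI from your own Python code, pass the arguments as a list rather than as a shell string. That way, paths containing spaces (such as `Firecrawl Projects`) need no quoting. This sketch wraps the single-page scrape call above. It assumes the CLI prints the scraped markdown to standard output. The helper names are hypothetical.

```python
import shutil
import subprocess


def build_scrape_command(url: str, firecrawl: str = "firecrawl") -> list[str]:
    """Argument list for the single-page scrape call shown above."""
    return [firecrawl, "scrape", url, "--only-main-content", "--format", "markdown"]


def scrape_markdown(url: str, timeout: float = 120.0) -> str:
    """Run the CLI and return its stdout; raises if it is missing, times out, or fails."""
    exe = shutil.which("firecrawl")
    if exe is None:
        raise FileNotFoundError("firecrawl CLI not found on PATH")
    result = subprocess.run(
        build_scrape_command(url, exe),
        capture_output=True,
        encoding="utf-8",  # decode output as text regardless of console code page
        timeout=timeout,
        check=True,
    )
    return result.stdout  # assumption: markdown is written to stdout
```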

Tips & Best Practices

Start with Small Limits

Test with --limit 5 or --limit 10 to verify output before scaling up

Use Descriptive Filenames

Always specify --output with meaningful names for easier organization

Check _manifest Files

Tools that scrape multiple pages create _manifest.json - review for errors

Respect Rate Limits

Space out large scraping jobs. The CLI handles rate limiting, but be considerate of the sites you scrape.

Organize Output

Create dated directories:

dir="reports/$(date +%Y-%m-%d)"
mkdir -p "$dir"
python deep_research.py "topic" --output "$dir/research.md"

Always respect robots.txt and terms of service. Some sites prohibit automated scraping.

Troubleshooting

Firecrawl CLI not found

Use full path:

$FIRECRAWL = "C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts\firecrawl.exe"
& $FIRECRAWL --status

Or add to PATH in Windows:

System Properties → Environment Variables
Edit PATH
Add: C:\Users\ulilj\Firecrawl Projects\firecrawl-cli\.venv\Scripts

API key errors

Reconfigure:

firecrawl config set-api-key YOUR_API_KEY

Verify:

firecrawl config show

No results returned

Check if the URL is accessible
Try scraping manually first: firecrawl scrape URL
Increase the --wait-for time for JavaScript-heavy sites
Verify API quota hasn’t been exceeded

Content Monitor not detecting changes

Ensure the page content actually changed
Some sites use dynamic timestamps that always show as changed
Check .content_monitor/ directory for stored snapshots
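One workaround for dynamic timestamps is to mask volatile fragments before comparing snapshots. This sketch assumes that ISO-style dates and times, along with relative phrases such as "5 minutes ago", are the noise. Adapt the patterns (hypothetical, as is `normalize`) to the site you monitor.

```python
import re

# Example volatile patterns (assumption); extend per site
VOLATILE_PATTERNS = [
    # 2026-03-05, 2026-03-05 10:30, 2026-03-05T10:30:00
    re.compile(r"\b\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2})?)?\b"),
    # "5 minutes ago", "2 hours ago"
    re.compile(r"\b\d+\s+(?:seconds?|minutes?|hours?|days?)\s+ago\b", re.IGNORECASE),
]


def normalize(text: str) -> str:
    """Replace volatile fragments with a fixed token so they don't register as changes."""
    for pattern in VOLATILE_PATTERNS:
        text = pattern.sub("<volatile>", text)
    return text
```

Apply `normalize` to page text before saving and comparing snapshots.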

Use Cases

Market Research

Track competitor pricing, features, and messaging changes over time

Sales Intelligence

Extract contact information for lead generation and outreach

Documentation Archive

Backup documentation before major version updates

Content Strategy

Analyze competitor content and identify gaps in your own

SEO Research

Research what content ranks well for your target keywords

Due Diligence

Gather comprehensive information about companies for investment research
