Polaris integrates Firecrawl to scrape and extract content from documentation URLs. When you share a link in a Quick Edit instruction or AI conversation, the AI automatically fetches the page, converts it to markdown, and uses it as context for generating better, framework-specific code.
## How It Works

When you include a URL in your request:

1. **URL detection**: The system detects HTTP/HTTPS URLs using regex pattern matching.
2. **Firecrawl scraping**: Each URL is sent to Firecrawl's scrape API with `formats: ["markdown"]` to extract clean, readable content.
3. **Context injection**: The scraped markdown is injected into the AI's prompt with the original URL for reference.
4. **AI generation**: The AI uses the documentation to generate code that follows the patterns and conventions from the docs.

Documentation scraping works in both Quick Edit (Cmd+K) and AI Conversations.
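The four steps above can be sketched end to end. This is a minimal sketch, not the actual Polaris implementation: `scrape` is an injected stand-in for the Firecrawl call, and the prompt template is illustrative.

```typescript
// Minimal sketch of the detect → scrape → inject → generate pipeline.
const URL_REGEX = /https?:\/\/[^\s)>\]]+/g; // 1. URL detection

async function buildPrompt(
  instruction: string,
  scrape: (url: string) => Promise<string> // stand-in for Firecrawl's scrape call
): Promise<string> {
  const urls = instruction.match(URL_REGEX) ?? [];

  // 2. Scrape each URL to markdown
  const docs = await Promise.all(
    urls.map(async (url) => ({ url, markdown: await scrape(url) }))
  );

  // 3. Inject the scraped markdown, tagged with its source URL
  const context = docs
    .map((d) => `<doc url="${d.url}">\n${d.markdown}\n</doc>`)
    .join("\n\n");

  // 4. The combined prompt is what the AI generates code from
  return `${context}\n\nInstruction: ${instruction}`;
}
```

Injecting the source URL alongside the content lets the AI cite where a pattern came from when it explains its changes.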
## Use Cases

### Framework-Specific Code

Teach the AI how to use specific framework features:

```text
User: Convert this to a Next.js Server Action using https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions

AI: *scrapes Next.js docs*
    *learns Server Action pattern*
    *generates compliant code*
```

### API Integration

Provide API documentation for accurate integration:

```text
User: Add Stripe checkout following https://docs.stripe.com/payments/checkout/how-checkout-works

AI: *scrapes Stripe docs*
    *understands the flow*
    *creates checkout with correct parameters*
```

### Library Usage

Reference specific library methods and patterns:

```text
User: Use Zod validation based on https://zod.dev/?id=basic-usage

AI: *scrapes Zod documentation*
    *applies correct schema syntax*
    *generates type-safe validation*
```

### UI Component Libraries

Implement components following design system guidelines:

```text
User: Create a button using https://ui.shadcn.com/docs/components/button

AI: *scrapes shadcn/ui docs*
    *follows component patterns*
    *includes proper imports and variants*
```
## Quick Edit with Documentation

In Quick Edit mode (Cmd+K on selected code), URLs in your instruction are automatically scraped:

```typescript
// Selected code:
const response = await fetch('/api/data');

// Instruction:
// "add error handling using https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch#checking_that_the_fetch_was_successful"

// Result (uses MDN patterns):
const response = await fetch('/api/data');
if (!response.ok) {
  throw new Error(`HTTP error! status: ${response.status}`);
}
const data = await response.json();
```
The implementation is in /src/app/api/quick-edit/route.ts:69:

```typescript
const URL_REGEX = /https?:\/\/[^\s)>\]]+/g;
const urls: string[] = instruction.match(URL_REGEX) || [];

if (urls.length > 0) {
  const scrapedResults = await Promise.all(
    urls.map(async (url) => {
      const result = await firecrawl.scrape(url, {
        formats: ["markdown"],
      });
      return result.markdown;
    })
  );
  // Inject into AI prompt
}
```
## Conversations with Documentation

The AI conversation system has a dedicated `scrapeUrls` tool that can fetch multiple URLs:

```typescript
scrapeUrls({
  urls: [
    "https://docs.stripe.com/api/checkout/sessions",
    "https://docs.stripe.com/webhooks"
  ]
})
```
This allows the AI to gather comprehensive documentation before writing code.
### Example Conversation

```text
User: Build a subscription system using Stripe. Reference:
- https://docs.stripe.com/billing/subscriptions/overview
- https://docs.stripe.com/billing/subscriptions/build-subscriptions
```

The AI:

1. Scrapes both Stripe documentation URLs
2. Reads your current project structure
3. Creates subscription management files following Stripe patterns
4. Implements webhook handlers for subscription events
5. Responds with setup instructions
## Supported Documentation Types

Firecrawl can extract content from official docs, API references, GitHub READMEs, and tutorials and guides.

Framework and library documentation sites:

- Next.js: nextjs.org/docs/*
- React: react.dev/*
- TypeScript: typescriptlang.org/docs/*
- Tailwind: tailwindcss.com/docs/*

API documentation and references:

- Stripe: docs.stripe.com/*
- OpenAI: platform.openai.com/docs/*
- Anthropic: docs.anthropic.com/*
- GitHub: docs.github.com/*

Repository documentation:

- README files: github.com/user/repo#readme
- Wiki pages: github.com/user/repo/wiki/*
- Markdown files: github.com/user/repo/blob/main/docs/*

Blog posts and tutorials with code examples:

- Dev.to articles
- Medium posts
- Personal tech blogs
- Stack Overflow answers (though less reliable)
## Implementation Details

### Quick Edit Scraping

From /src/app/api/quick-edit/route.ts:17:

```typescript
const URL_REGEX = /https?:\/\/[^\s)>\]]+/g;
const urls: string[] = instruction.match(URL_REGEX) || [];

let documentationContext = "";

if (urls.length > 0) {
  const scrapedResults = await Promise.all(
    urls.map(async (url) => {
      try {
        const result = await firecrawl.scrape(url, {
          formats: ["markdown"],
        });
        if (result.markdown) {
          return `<doc url="${url}">\n${result.markdown}\n</doc>`;
        }
        return null;
      } catch {
        return null;
      }
    })
  );

  const validResults = scrapedResults.filter(Boolean);
  if (validResults.length > 0) {
    documentationContext = `<documentation>\n${validResults.join("\n\n")}\n</documentation>`;
  }
}
```
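A quick check of the detector's edge behavior (a standalone snippet; the sample text is made up): the character class `[^\s)>\]]+` stops a match at whitespace, `)`, `>`, and `]`, so URLs wrapped in parentheses or Markdown links come out clean:

```typescript
// Same URL_REGEX as the Quick Edit route, applied to sample text.
const URL_REGEX = /https?:\/\/[^\s)>\]]+/g;

const text =
  "See https://react.dev/reference/react/useState#usage and (https://zod.dev/?id=basic-usage).";
const urls = text.match(URL_REGEX) ?? [];
// urls[0] === "https://react.dev/reference/react/useState#usage"
// urls[1] === "https://zod.dev/?id=basic-usage"  (trailing ")" is excluded)
```

Note the trade-off: a trailing period or comma directly after a URL will be included in the match, since only whitespace, `)`, `>`, and `]` terminate it.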
### Conversation Tool

From /src/features/conversations/inngest/tools/scrape-urls.ts:11:

```typescript
export const createScrapeUrlsTool = () => {
  return createTool({
    name: "scrapeUrls",
    description:
      "Scrape content from URLs to get documentation or reference material. " +
      "Use this when the user provides URLs or references external documentation. " +
      "Returns markdown content from the scraped pages.",
    parameters: z.object({
      urls: z.array(z.string()).describe("Array of URLs to scrape for content"),
    }),
    handler: async (params, { step: toolStep }) => {
      const { urls } = params;
      const results: { url: string; content: string }[] = [];
      for (const url of urls) {
        const result = await firecrawl.scrape(url, {
          formats: ["markdown"],
        });
        if (result.markdown) {
          results.push({ url, content: result.markdown });
        }
      }
      return JSON.stringify(results);
    },
  });
};
```
## Error Handling

If a URL fails to scrape:

- Quick Edit continues with whatever documentation was successfully fetched
- The conversation tool returns "Failed to scrape URL: [url]" for that specific URL
- Other URLs in the same request are still processed
- The AI proceeds with available context

### Common Scraping Failures

- **JavaScript-heavy sites**: Some sites require JS rendering (Firecrawl handles most cases)
- **Rate limiting**: Too many requests to the same domain may be blocked
- **Paywalled content**: Content behind authentication can't be scraped
- **Malformed URLs**: Invalid URLs are skipped silently
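The per-URL failure isolation described above can be sketched as follows. This is a hypothetical helper, not the actual Polaris code; `scrape` again stands in for the Firecrawl call:

```typescript
// Hypothetical sketch: scrape each URL independently so one failure
// doesn't abort the batch; failed URLs get a marker string instead.
type ScrapeResult = { url: string; content: string };

async function scrapeAllTolerant(
  urls: string[],
  scrape: (url: string) => Promise<string>
): Promise<ScrapeResult[]> {
  const results: ScrapeResult[] = [];
  for (const url of urls) {
    try {
      results.push({ url, content: await scrape(url) });
    } catch {
      // Mirrors the conversation tool's behavior described above
      results.push({ url, content: `Failed to scrape URL: ${url}` });
    }
  }
  return results;
}
```

Catching per URL rather than around the whole batch is what lets the AI proceed with partial context instead of failing the entire request.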
## Best Practices

### Specific Pages

Link to specific documentation pages, not homepages:

- Good: https://nextjs.org/docs/app/api-reference/file-conventions/route
- Bad: a bare homepage link

### Multiple Sources

Provide multiple URLs for comprehensive context:

```text
Create a payment flow using:
- https://docs.stripe.com/payments/accept-a-payment
- https://docs.stripe.com/payments/payment-intents
- https://docs.stripe.com/webhooks/quickstart
```

### Relevant Sections

Use anchor links to jump to specific sections:

https://react.dev/reference/react/useState#usage

(Though Firecrawl scrapes the full page, the AI can focus on relevant parts.)

### Up-to-Date Docs

Prefer official documentation over outdated tutorials:

- Good: official framework docs
- Okay: recent (< 6 months) tutorials
- Bad: 3-year-old blog posts
## Configuration

| Setting | Value | Location |
|---|---|---|
| URL regex pattern | `/https?:\/\/[^\s)>\]]+/g` | route.ts:17 |
| Firecrawl format | `["markdown"]` | route.ts:77, scrape-urls.ts:33 |
| Timeout | Inherits from Firecrawl SDK | - |
| Max URLs per request | Unlimited (all URLs in text) | - |
## Firecrawl Setup

Polaris uses the Firecrawl SDK, initialized in /src/lib/firecrawl.ts:

```typescript
import Firecrawl from '@mendable/firecrawl-js';

export const firecrawl = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY
});
```

You need a FIRECRAWL_API_KEY environment variable for documentation scraping to work. Get an API key from firecrawl.dev.
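A minimal startup guard (a sketch, not part of the Polaris code shown above) makes a missing key fail fast at boot instead of surfacing as silent scrape failures later:

```typescript
// Hypothetical guard: throw at startup if a required env var is missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage: const apiKey = requireEnv("FIRECRAWL_API_KEY");
```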
## Example Prompts

### Quick Edit Examples

- "add Zod validation following https://zod.dev"
- "use React Query patterns from https://tanstack.com/query/latest/docs/framework/react/overview"
- "implement auth with https://next-auth.js.org/getting-started/example"
- "style with Tailwind using https://tailwindcss.com/docs/utility-first"

### Conversation Examples

- "Build a file upload component using https://uploadthing.com/docs"
- "Create API routes following https://nextjs.org/docs/app/api-reference/file-conventions/route"
- "Add database queries using https://orm.drizzle.team/docs/sql-schema-declaration"
- "Implement real-time features with https://docs.convex.dev/functions"
## Tips for Best Results

- **Include URLs in natural language**: "using [URL]" or "following [URL]" works well
- **Provide multiple related URLs**: 2-3 related documentation pages give comprehensive context
- **Check scraped content quality**: some sites format better as markdown than others
- **Prefer official docs**: official documentation is more reliable than third-party tutorials
- **Use anchor links**: link to specific sections when possible