extract()

Overview

The extract() method extracts structured data from web pages using natural language instructions and Zod schemas. It uses an AI model to interpret the page content and returns data in the shape your schema defines.

Method Signature

extract<T extends StagehandZodSchema>(
  instruction?: string,
  schema?: T,
  options?: ExtractOptions
): Promise<InferStagehandSchema<T>>

Parameters

instruction

string

Natural language description of what data to extract (e.g., “Extract all product listings with their prices”). Optional: if both instruction and schema are omitted, extract() returns the page text.

schema

StagehandZodSchema

Zod schema defining the structure of data to extract. Supports z.object(), z.array(), and nested schemas.

import { z } from "zod";

const schema = z.object({
  title: z.string().describe("Page title"),
  price: z.string().describe("Product price"),
});

options

ExtractOptions

Optional configuration for extraction.


model

ModelConfiguration

Override the default model for this specific extraction.

timeout

number

Maximum time in milliseconds to wait for extraction. Throws ExtractTimeoutError if exceeded.

selector

string

Focus extraction on a specific part of the page. Accepts CSS selectors or XPath (prefix with xpath=).

page

Page

Specific page to extract from (useful for multi-page scenarios).

Return Value

Returns a Promise that resolves to data matching your Zod schema structure.

With schema: Returns typed data matching the schema
Without schema: Returns { extraction: string } when only an instruction is provided, or { pageText: string } when both instruction and schema are omitted
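
Because a no-schema call can resolve to either of two shapes, code that handles both may want a small narrowing helper. The sketch below is illustrative only: `NoSchemaResult` and `textOf` are hypothetical names, not Stagehand exports, and the union is written out by hand from the shapes listed above.

```typescript
// Hand-written union of the two no-schema result shapes (hypothetical type name).
type NoSchemaResult = { extraction: string } | { pageText: string };

// Hypothetical helper: returns the text payload whichever shape came back.
function textOf(result: NoSchemaResult): string {
  // The `in` operator narrows the union to the member that has the key.
  return "extraction" in result ? result.extraction : result.pageText;
}

console.log(textOf({ extraction: "A page about apartments" }));
console.log(textOf({ pageText: "Full page text" }));
```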

Usage Examples

Basic Extraction

import { Stagehand } from "@stagehand/api";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
});

await stagehand.init();
const page = stagehand.context.pages()[0];

await page.goto("https://news.ycombinator.com");

const articles = await stagehand.extract(
  "Extract the top 5 article titles",
  z.object({
    titles: z.array(z.string()),
  })
);

console.log(articles.titles);

Extracting Lists

await page.goto("https://www.apartments.com/san-francisco-ca/");

const listings = await stagehand.extract(
  "Extract all apartment listings with prices and addresses",
  z.object({
    listings: z.array(
      z.object({
        price: z.string().describe("The price of the listing"),
        address: z.string().describe("The address of the listing"),
      })
    ),
  })
);

console.log(`Found ${listings.listings.length} apartments`);
listings.listings.forEach((listing) => {
  console.log(`${listing.address}: ${listing.price}`);
});
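
Since the schema above extracts prices as strings, any numeric comparison has to happen after extraction. The following is a minimal sketch of that post-processing step, assuming prices look like "$2,500/mo"; `Listing`, `parsePrice`, and `cheapest` are hypothetical names introduced here, not part of Stagehand.

```typescript
// Mirrors the shape of one item in the `listings` array above (hypothetical type name).
interface Listing {
  price: string;
  address: string;
}

// Hypothetical helper: returns the first number found in a price string,
// or null when the string contains no digits (e.g. "Call for pricing").
function parsePrice(price: string): number | null {
  const match = price.replace(/,/g, "").match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Hypothetical helper: picks the listing with the lowest parsable price.
function cheapest(listings: Listing[]): Listing | null {
  let best: Listing | null = null;
  let bestPrice = Infinity;
  for (const listing of listings) {
    const value = parsePrice(listing.price);
    if (value !== null && value < bestPrice) {
      best = listing;
      bestPrice = value;
    }
  }
  return best;
}
```

For a price range such as "$2,500 - $3,000", this sketch uses the first number only; adjust the parsing if the site formats prices differently.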

Nested Data Structures

const productData = await stagehand.extract(
  "Extract product information",
  z.object({
    product: z.object({
      name: z.string(),
      price: z.string(),
      features: z.array(z.string()),
      reviews: z.object({
        rating: z.number(),
        count: z.number(),
        topReview: z.string(),
      }),
    }),
  })
);

console.log(productData.product.name);
console.log(`Rating: ${productData.product.reviews.rating}/5`);

Extracting URLs

// For fields declared with z.string().url(), Stagehand returns the link's URL (its href)
const links = await stagehand.extract(
  "Get all navigation links",
  z.object({
    links: z.array(
      z.object({
        text: z.string(),
        url: z.string().url(), // Automatically extracts href attribute
      })
    ),
  })
);

for (const link of links.links) {
  console.log(`${link.text}: ${link.url}`);
}
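
Navigation menus often repeat the same link, so it can help to clean up the extracted list before using it. The sketch below assumes you know the page's base URL; `NavLink` and `normalizeLinks` are hypothetical names, and the only API it relies on is the standard `URL` constructor.

```typescript
// Mirrors the shape of one item in the `links` array above (hypothetical type name).
interface NavLink {
  text: string;
  url: string;
}

// Hypothetical helper: resolves each URL against a base, trims the link text,
// and drops duplicates and values that cannot be parsed as URLs.
function normalizeLinks(links: NavLink[], base: string): NavLink[] {
  const seen = new Set<string>();
  const result: NavLink[] = [];
  for (const link of links) {
    let resolved: string;
    try {
      resolved = new URL(link.url, base).href;
    } catch {
      continue; // Skip values that are not valid URLs
    }
    if (!seen.has(resolved)) {
      seen.add(resolved);
      result.push({ text: link.text.trim(), url: resolved });
    }
  }
  return result;
}
```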

Focused Extraction

// Extract from a specific section of the page
const sidebarData = await stagehand.extract(
  "Extract trending topics",
  z.object({
    topics: z.array(z.string()),
  }),
  {
    selector: "aside.sidebar", // CSS selector
  }
);

// Or use XPath
const contentData = await stagehand.extract(
  "Extract main content",
  schema,
  {
    selector: "xpath=//main[@id='content']",
  }
);
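
If selectors come from configuration or user input, a small helper can keep the xpath= prefix consistent. This is a sketch under that assumption; `SelectorKind` and `toSelector` are hypothetical names, not part of Stagehand.

```typescript
// Hypothetical helper: builds a value for the `selector` option from either
// a CSS selector or an XPath expression.
type SelectorKind = "css" | "xpath";

function toSelector(kind: SelectorKind, expression: string): string {
  const trimmed = expression.trim();
  if (trimmed.length === 0) {
    throw new Error("Selector expression must not be empty");
  }
  // XPath expressions need the "xpath=" prefix; avoid adding it twice.
  if (kind === "xpath") {
    return trimmed.startsWith("xpath=") ? trimmed : `xpath=${trimmed}`;
  }
  return trimmed;
}

console.log(toSelector("css", "aside.sidebar")); // aside.sidebar
console.log(toSelector("xpath", "//main[@id='content']")); // xpath=//main[@id='content']
```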

No-Schema Extraction

// Without instruction or schema - returns page text
const { pageText } = await stagehand.extract();
console.log(pageText);

// With instruction only - returns free-form extraction
const { extraction } = await stagehand.extract(
  "What is the main topic of this page?"
);
console.log(extraction);

Multi-Page Extraction

const page1 = stagehand.context.pages()[0];
const page2 = await stagehand.context.newPage();

await page1.goto("https://example.com/page1");
await page2.goto("https://example.com/page2");

const data1 = await stagehand.extract(
  "Extract title",
  z.object({ title: z.string() }),
  { page: page1 }
);

const data2 = await stagehand.extract(
  "Extract title",
  z.object({ title: z.string() }),
  { page: page2 }
);

Using Descriptions

// Add .describe() to help the AI understand what to extract
const userData = await stagehand.extract(
  "Extract user profile information",
  z.object({
    username: z.string().describe("The user's display name"),
    email: z.string().describe("The user's email address"),
    joinDate: z.string().describe("Date the user joined, in MM/DD/YYYY format"),
    isVerified: z.boolean().describe("Whether the user's account is verified"),
  })
);

Handling Missing Data

// Use .optional() for fields that might not exist
const result = await stagehand.extract(
  "Extract article metadata",
  z.object({
    title: z.string(),
    author: z.string().optional(),
    publishDate: z.string().optional(),
    readTime: z.string().optional(),
  })
);

if (result.author) {
  console.log(`By ${result.author}`);
}
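
When several fields are optional, checking each one separately gets repetitive. One possible pattern, sketched below, collects whichever fields are present into a single line; `ArticleMeta` and `formatByline` are hypothetical names that mirror the schema above.

```typescript
// Mirrors the result shape of the schema above (hypothetical type name).
interface ArticleMeta {
  title: string;
  author?: string;
  publishDate?: string;
  readTime?: string;
}

// Hypothetical helper: builds a display line from whichever optional fields are present.
function formatByline(meta: ArticleMeta): string {
  const parts = [
    meta.author ? `By ${meta.author}` : undefined,
    meta.publishDate,
    meta.readTime,
  ].filter((part): part is string => typeof part === "string" && part.length > 0);

  return parts.length > 0 ? `${meta.title} (${parts.join(" | ")})` : meta.title;
}

console.log(formatByline({ title: "Release notes", author: "Sam", readTime: "5 min" }));
// Release notes (By Sam | 5 min)
```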

With Timeout

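// Assumption: ExtractTimeoutError is exported from the same package as Stagehand
import { ExtractTimeoutError } from "@stagehand/api";

try {
  const data = await stagehand.extract(
    "Extract complex data",
    schema,
    { timeout: 30000 } // 30 seconds
  );
  console.log(data);
} catch (error) {
  if (error instanceof ExtractTimeoutError) {
    console.error("Extraction timed out");
  } else {
    throw error; // Do not swallow unrelated errors
  }
}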
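
A timeout is often transient, so one option is to retry the call a bounded number of times. The sketch below is a generic wrapper, not a Stagehand feature: `withRetry` and its parameters are hypothetical, and whether retrying is appropriate depends on your page and model.

```typescript
// Hypothetical helper (not part of Stagehand): runs an async task up to
// `attempts` times, retrying only when `shouldRetry` accepts the error.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts: number,
  shouldRetry: (error: unknown) => boolean = () => true
): Promise<T> {
  if (attempts < 1) {
    throw new RangeError("attempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Stop on the final attempt or when the error is not retryable.
      if (attempt === attempts || !shouldRetry(error)) break;
    }
  }
  throw lastError;
}

// Possible usage with the call above (left as a comment because it needs a live session):
// const data = await withRetry(
//   () => stagehand.extract("Extract complex data", schema, { timeout: 30000 }),
//   3,
//   (error) => error instanceof ExtractTimeoutError
// );
```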

Supported Schema Types

Stagehand’s extract() supports most Zod schema types:

Primitives: z.string(), z.number(), z.boolean()
Objects: z.object({ ... })
Arrays: z.array(...)
Optionals: .optional()
Nested structures: Objects within objects, arrays of objects
URLs: z.string().url() - automatically extracts href attributes
Descriptions: .describe("...") - helps guide extraction

How It Works

1. Snapshot: Captures an accessibility tree of the page
2. LLM Processing: Sends the snapshot, instruction, and schema to the AI model
3. Extraction: AI identifies and extracts matching data
4. Validation: Data is validated against your Zod schema
5. Return: Typed data matching your schema structure
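
The flow above can be pictured with a toy pipeline. This is not Stagehand's implementation: the model is a fake function, the validator is hand-written in place of Zod, and every name (`Model`, `Validate`, `extractSketch`, `fakeModel`, `validateTitles`) is a hypothetical stand-in chosen for illustration.

```typescript
// Toy stand-ins for illustration only; none of these names come from Stagehand.
type Model = (snapshot: string, instruction: string) => Promise<unknown>;
type Validate<T> = (raw: unknown) => T;

interface Titles {
  titles: string[];
}

// Validation stand-in: a hand-written check playing the role of the Zod schema.
function validateTitles(raw: unknown): Titles {
  if (typeof raw === "object" && raw !== null && "titles" in raw) {
    const titles = (raw as { titles: unknown }).titles;
    if (Array.isArray(titles) && titles.every((t) => typeof t === "string")) {
      return { titles: titles as string[] };
    }
  }
  throw new Error("Model output does not match the expected shape");
}

async function extractSketch<T>(
  snapshot: string,
  instruction: string,
  model: Model,
  validate: Validate<T>
): Promise<T> {
  // LLM Processing and Extraction: the model reads the snapshot and instruction.
  const raw = await model(snapshot, instruction);
  // Validation and Return: malformed output is rejected, valid output is typed.
  return validate(raw);
}

// A fake model that "extracts" every snapshot line starting with "link: ".
const fakeModel: Model = async (snapshot) => ({
  titles: snapshot
    .split("\n")
    .filter((line) => line.startsWith("link: "))
    .map((line) => line.slice("link: ".length)),
});
```

The point of the sketch is the ordering: model output is treated as untrusted until it passes validation, which is why the typed result can be relied on afterwards.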

Performance Tips

Use focused selectors - Extract from specific page sections

await stagehand.extract(instruction, schema, {
  selector: ".product-details"
});

Be specific with descriptions - Help the AI understand context

```
z.string().describe("The product price in USD format")
```

Use appropriate schemas - Don’t over-complicate structure

// Good - simple and clear
z.object({ price: z.string() })

// Overkill - unnecessary complexity
z.object({ 
  price: z.object({ 
    amount: z.string(), 
    currency: z.string() 
  })
})

Error Handling

// Assumption: both error classes are exported from the same package as Stagehand
import { ExtractTimeoutError, StagehandInvalidArgumentError } from "@stagehand/api";

try {
  const data = await stagehand.extract(instruction, schema);
  console.log(data);
} catch (error) {
  if (error instanceof ExtractTimeoutError) {
    console.error("Extraction timed out");
  } else if (error instanceof StagehandInvalidArgumentError) {
    console.error("Invalid schema or instruction");
  } else {
    console.error("Extraction failed:", error);
  }
}

Best Practices

Clear instructions - Be explicit about what to extract
Use descriptions - Add .describe() to schema fields
Handle optionals - Use .optional() for fields that may not exist
Focus extraction - Use selector option for large pages
Type safety - Let TypeScript infer types from your schema

// TypeScript automatically knows the structure
const result = await stagehand.extract(
  "Extract data",
  z.object({
    title: z.string(),
    count: z.number(),
  })
);

// result.title is string
// result.count is number

Related Methods

act() - Perform actions on the page
observe() - Preview actions before executing
agent() - Autonomous multi-step automation
