Data Extraction

Overview

Stagehand’s extract() method pulls structured data from web pages using natural-language instructions. Passing a Zod schema adds type safety to the result.

Basic Extraction

The simplest call passes only an instruction and reads the result from the extraction field of the returned object:

import { Stagehand } from "@stagehand/core";

async function example(stagehand: Stagehand) {
  const page = stagehand.context.pages()[0];
  await page.goto(
    "https://browserbase.github.io/stagehand-eval-sites/sites/iframe-hn/",
  );

  const { extraction } = await stagehand.extract(
    "grab the first title from inside the iframe",
  );
  console.log(extraction);
}

(async () => {
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    verbose: 2,
  });
  try {
    await stagehand.init();
    await example(stagehand);
  } finally {
    await stagehand.close();
  }
})();

Structured Extraction with Zod

Use Zod schemas to get strongly-typed extraction results:

import { Stagehand } from "@stagehand/core";
import { z } from "zod";

async function example() {
  const stagehand = new Stagehand({
    env: "LOCAL",
    verbose: 1,
  });

  await stagehand.init();
  const page = stagehand.context.pages()[0];
  
  try {
    await page.goto("https://ovolve.github.io/2048-AI/");
    
    // Extract game state with a typed schema
    const gameState = await stagehand.extract(
      `Extract the current game state:
        1. Score from the score counter
        2. All tile values in the 4x4 grid (empty spaces as 0)
        3. Highest tile value present`,
      z.object({
        score: z.number(),
        highestTile: z.number(),
        grid: z.array(z.array(z.number())),
      }),
    );
    
    console.log("Game State:", {
      score: gameState.score,
      highestTile: gameState.highestTile,
      grid: gameState.grid,
    });
  } catch (error) {
    console.error("Error extracting data:", error);
  } finally {
    // Always close the browser session, even if extraction fails
    await stagehand.close();
  }
}

(async () => {
  await example();
})();
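
A schema checks the types of the extracted fields, but not whether they agree with each other. The sketch below is one way to sanity-check the game state returned by the example above. The GameState interface mirrors that schema, and checkGameState is a hypothetical helper, not part of Stagehand.

```typescript
// Shape mirrors the Zod schema used in the example above.
interface GameState {
  score: number;
  highestTile: number;
  grid: number[][];
}

// Hypothetical helper: returns a list of problems, empty when the state looks consistent.
function checkGameState(state: GameState): string[] {
  const problems: string[] = [];

  // The instruction asks for a 4x4 grid.
  if (state.grid.length !== 4 || state.grid.some((row) => row.length !== 4)) {
    problems.push("grid is not 4x4");
  }

  // highestTile should equal the largest value found in the grid.
  let maxTile = 0;
  for (const row of state.grid) {
    for (const value of row) {
      if (value > maxTile) maxTile = value;
    }
  }
  if (state.highestTile !== maxTile) {
    problems.push(`highestTile is ${state.highestTile} but grid max is ${maxTile}`);
  }

  // A 2048 score is never negative.
  if (state.score < 0) {
    problems.push("score is negative");
  }

  return problems;
}
```

Calling checkGameState(gameState) right after the extraction and logging any returned problems makes a misread grid visible early, before later steps rely on it.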

Multi-Page Extraction

You can work with multiple pages and extract data from each:

import { Stagehand } from "@stagehand/core";

async function example(stagehand: Stagehand) {
  const page = stagehand.context.pages()[0];
  await page.goto(
    "https://browserbase.github.io/stagehand-eval-sites/sites/iframe-hn/",
  );

  const { extraction } = await stagehand.extract(
    "grab the first title from inside the iframe",
  );
  console.log(extraction);

  // Create a second page
  const page2 = await stagehand.context.newPage();
  await page2.goto(
    "https://browserbase.github.io/stagehand-eval-sites/sites/iframe-same-proc/",
  );
  
  // Extract from the second page by passing it in the page option
  const { extraction: placeholder } = await stagehand.extract(
    "extract the placeholder text in the 'your name' field",
    { page: page2 },
  );
  console.log(placeholder);
}

(async () => {
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    verbose: 2,
  });
  try {
    await stagehand.init();
    await example(stagehand);
  } finally {
    await stagehand.close();
  }
})();

Complex Data Structures

Extract nested and complex data structures:

import { Stagehand } from "@stagehand/core";
import { z } from "zod";

const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean(),
  rating: z.number().optional(),
  reviews: z.array(z.object({
    author: z.string(),
    text: z.string(),
    stars: z.number(),
  })).optional(),
});

async function extractProductData() {
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    verbose: 1,
  });
  
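  await stagehand.init();

  try {
    const page = stagehand.context.pages()[0];
    await page.goto("https://example-store.com/product/123");

    // Optional schema fields may be missing from the result
    const product = await stagehand.extract(
      "Extract all product information including name, price, availability, rating, and recent reviews",
      productSchema,
    );

    console.log("Product:", product);
  } finally {
    // Close the session even if navigation or extraction fails
    await stagehand.close();
  }
}

(async () => {
  await extractProductData();
})();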
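
Because rating and reviews are optional in productSchema, code that consumes the result has to handle their absence. The following is a minimal sketch of that handling. The Product and Review interfaces mirror the schema above, and summarizeProduct is a hypothetical helper introduced only for illustration.

```typescript
// Types mirror productSchema above; optional fields may be undefined.
interface Review {
  author: string;
  text: string;
  stars: number;
}

interface Product {
  name: string;
  price: number;
  inStock: boolean;
  rating?: number;
  reviews?: Review[];
}

// Hypothetical helper: builds a one-line summary that tolerates missing optional fields.
function summarizeProduct(product: Product): string {
  const reviews = product.reviews ?? [];

  // Average the review stars only when there is at least one review.
  const avgStars =
    reviews.length > 0
      ? reviews.reduce((sum, review) => sum + review.stars, 0) / reviews.length
      : undefined;

  // Prefer the extracted rating; fall back to the review average.
  const rating = product.rating ?? avgStars;
  const ratingText = rating === undefined ? "no rating" : `${rating.toFixed(1)} stars`;
  const stockText = product.inStock ? "in stock" : "out of stock";

  return `${product.name}: ${product.price.toFixed(2)} (${stockText}, ${ratingText}, ${reviews.length} reviews)`;
}
```

The fallback from rating to the review average is a design choice for this sketch; a stricter consumer might treat a missing rating as an error instead.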

Key Concepts

Natural Language Instructions

Describe the data you want in plain English. Stagehand uses a language model to interpret the instruction and locate the matching content on the page.

Zod Schema Validation

Define the structure of your expected data with Zod schemas. This provides:

Type safety
Runtime validation
Auto-completion in TypeScript
Clear data contracts

Page Context

When working with multiple pages, specify which page to extract from using the page option.
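
When the same instruction applies to several pages, the page option can be supplied in a loop. The sketch below assumes only the call shape shown in the multi-page example, extract(instruction, { page }). ExtractorLike is a hypothetical stand-in interface, not Stagehand's real type, and extractFromPages is an illustrative helper.

```typescript
// Hypothetical stand-in that mirrors only the call shape used in the
// multi-page example: extract(instruction, { page }). Not Stagehand's real type.
interface ExtractorLike<P> {
  extract(instruction: string, options: { page: P }): Promise<unknown>;
}

// Runs the same instruction against each page and returns the results in page order.
async function extractFromPages<P>(
  extractor: ExtractorLike<P>,
  pages: P[],
  instruction: string,
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const page of pages) {
    // Sequential on purpose, so each extraction finishes before the next starts.
    results.push(await extractor.extract(instruction, { page }));
  }
  return results;
}
```

If a Stagehand instance accepts this call shape, as the example above suggests, it could be passed as the extractor together with an array such as [page, page2].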

Best Practices

Be specific - Clear instructions yield better results
Use schemas - Always define Zod schemas for structured data
Handle errors - Extraction can fail if elements aren’t found
Wait for content - Ensure dynamic content is loaded before extracting
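
The last two practices can be combined: if content may not have loaded yet, a failed extraction can be retried after a short pause. This is a minimal sketch of a generic retry wrapper. withRetry is a hypothetical helper, and the attempt count and delay are arbitrary defaults, not values recommended by Stagehand.

```typescript
// Hypothetical helper: retries an async operation with a fixed delay between attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T> {
  if (attempts < 1) {
    throw new RangeError("attempts must be at least 1");
  }

  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Pause before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }

  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

An extraction call would be wrapped as withRetry(() => stagehand.extract(instruction, schema)), so that each attempt issues a fresh extraction against the current state of the page.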

Next Steps

Learn about form filling to input data
See multi-step automation for complex workflows
Explore web navigation patterns
