
Overview

Stagehand’s extract() method allows you to pull structured data from web pages using natural language instructions and Zod schemas for type safety.

Basic Extraction

Here’s a simple example that extracts data from a page:
import { Stagehand } from "@stagehand/core";

async function example(stagehand: Stagehand) {
  const page = stagehand.context.pages()[0];
  await page.goto(
    "https://browserbase.github.io/stagehand-eval-sites/sites/iframe-hn/",
  );

  const { extraction } = await stagehand.extract(
    "grab the first title from inside the iframe",
  );
  console.log(extraction);
}

(async () => {
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    verbose: 2,
  });
  try {
    await stagehand.init();
    await example(stagehand);
  } finally {
    await stagehand.close();
  }
})();

Structured Extraction with Zod

Use Zod schemas to get strongly typed extraction results:
import { Stagehand } from "@stagehand/core";
import { z } from "zod";

async function example() {
  const stagehand = new Stagehand({
    env: "LOCAL",
    verbose: 1,
  });

  await stagehand.init();
  const page = stagehand.context.pages()[0];
  
  try {
    await page.goto("https://ovolve.github.io/2048-AI/");
    
    // Extract game state with a typed schema
    const gameState = await stagehand.extract(
      `Extract the current game state:
        1. Score from the score counter
        2. All tile values in the 4x4 grid (empty spaces as 0)
        3. Highest tile value present`,
      z.object({
        score: z.number(),
        highestTile: z.number(),
        grid: z.array(z.array(z.number())),
      }),
    );
    
    console.log("Game State:", {
      score: gameState.score,
      highestTile: gameState.highestTile,
      grid: gameState.grid,
    });
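  } catch (error) {
    console.error("Error extracting data:", error);
  } finally {
    // Always release the browser session, even if extraction fails
    await stagehand.close();
  }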
}

(async () => {
  await example();
})();
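
A schema checks the type of each field, but not whether the fields agree with each other. As a minimal sketch, the hypothetical helper `checkGameState` below (not part of Stagehand) checks that the grid is 4x4 and that `highestTile` matches the largest value in the grid. The `GameState` interface is an assumption that mirrors the schema in the example above.

```typescript
// Hand-written mirror of the schema used in the example above (an assumption).
interface GameState {
  score: number;
  highestTile: number;
  grid: number[][];
}

// Hypothetical helper: returns a list of problems; an empty list means
// the extracted state is internally consistent.
function checkGameState(state: GameState, size = 4): string[] {
  const problems: string[] = [];

  // The grid should have `size` rows of `size` cells each.
  if (state.grid.length !== size || state.grid.some((row) => row.length !== size)) {
    problems.push(`grid is not ${size}x${size}`);
  }

  // The reported highest tile should equal the largest value in the grid.
  const tiles = state.grid.reduce<number[]>((all, row) => all.concat(row), []);
  const maxTile = tiles.length > 0 ? Math.max(...tiles) : 0;
  if (maxTile !== state.highestTile) {
    problems.push(`highestTile is ${state.highestTile} but the grid maximum is ${maxTile}`);
  }

  if (state.score < 0) {
    problems.push("score is negative");
  }

  return problems;
}
```

Calling `checkGameState(gameState)` right after `extract()` and logging any returned problems is one way to catch a misread board before acting on it.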

Multi-Page Extraction

You can work with multiple pages and extract data from each:
import { Stagehand } from "@stagehand/core";

async function example(stagehand: Stagehand) {
  const page = stagehand.context.pages()[0];
  await page.goto(
    "https://browserbase.github.io/stagehand-eval-sites/sites/iframe-hn/",
  );

  const { extraction } = await stagehand.extract(
    "grab the first title from inside the iframe",
  );
  console.log(extraction);

  // Create a second page
  const page2 = await stagehand.context.newPage();
  await page2.goto(
    "https://browserbase.github.io/stagehand-eval-sites/sites/iframe-same-proc/",
  );
  
  // Extract from the second page
  const placeholder = await stagehand.extract(
    "extract the placeholder text on the 'your name' field",
    { page: page2 },
  );
  console.log(placeholder);
}

(async () => {
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    verbose: 2,
  });
  try {
    await stagehand.init();
    await example(stagehand);
  } finally {
    await stagehand.close();
  }
})();

Complex Data Structures

Extract nested and complex data structures:
import { Stagehand } from "@stagehand/core";
import { z } from "zod";

const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean(),
  rating: z.number().optional(),
  reviews: z.array(z.object({
    author: z.string(),
    text: z.string(),
    stars: z.number(),
  })).optional(),
});

async function extractProductData() {
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    verbose: 1,
  });
  
  await stagehand.init();
  const page = stagehand.context.pages()[0];
  
  try {
    await page.goto("https://example-store.com/product/123");

    const product = await stagehand.extract(
      "Extract all product information including name, price, availability, rating, and recent reviews",
      productSchema,
    );

    console.log("Product:", product);
  } finally {
    // Close the session even if navigation or extraction throws
    await stagehand.close();
  }
}
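
Because `rating` and `reviews` are optional in `productSchema`, code that consumes the result has to handle their absence. The sketch below shows one way to do that. `summarizeProduct` is a hypothetical helper, and the `Product` and `Review` interfaces are hand-written mirrors of the schema, so treat them as assumptions rather than types exported by Stagehand.

```typescript
// Hand-written mirrors of productSchema (assumptions for this sketch).
interface Review {
  author: string;
  text: string;
  stars: number;
}

interface Product {
  name: string;
  price: number;
  inStock: boolean;
  rating?: number;
  reviews?: Review[];
}

// Hypothetical helper: builds a one-line summary, falling back to the
// average review score when no explicit rating was extracted.
function summarizeProduct(product: Product): string {
  const reviews = product.reviews ?? [];
  const average =
    reviews.length > 0
      ? reviews.reduce((sum, review) => sum + review.stars, 0) / reviews.length
      : undefined;
  const rating = product.rating ?? average;

  const ratingText = rating === undefined ? "no rating" : `${rating.toFixed(1)} stars`;
  const stockText = product.inStock ? "in stock" : "out of stock";

  return `${product.name}: ${product.price.toFixed(2)}, ${stockText}, ${ratingText} (${reviews.length} reviews)`;
}
```

Falling back to the review average is a design choice made for illustration; an application might prefer to show nothing when the page has no explicit rating.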

Key Concepts

Natural Language Instructions

Describe the data you want in plain English. Stagehand uses an LLM to interpret the instruction and locate the matching content on the page.
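
Multi-part instructions, like the numbered one in the 2048 example, can be assembled from a list of fields so the wording stays consistent across calls. The sketch below assumes a hypothetical helper named `buildInstruction`; it only builds a string and does not call Stagehand.

```typescript
// Hypothetical helper: turns a goal and a list of field descriptions
// into a numbered, multi-line instruction string.
function buildInstruction(goal: string, fields: string[]): string {
  const lines = fields.map((field, index) => `  ${index + 1}. ${field}`);
  return [`${goal}:`, ...lines].join("\n");
}

const instruction = buildInstruction("Extract the current game state", [
  "Score from the score counter",
  "All tile values in the 4x4 grid (empty spaces as 0)",
  "Highest tile value present",
]);
```

The resulting string can then be passed as the first argument to `extract()`.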

Zod Schema Validation

Define the structure of your expected data with Zod schemas. This provides:
  • Type safety
  • Runtime validation
  • Auto-completion in TypeScript
  • Clear data contracts

Page Context

When working with multiple pages, specify which page to extract from by passing it in the page option, for example { page: page2 }.
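
To run the same instruction against several pages, the page option can be supplied in a loop. The sketch below is written against a minimal structural type, `PageScopedExtractor`, which is an assumption covering only the `extract(instruction, { page })` call shape used in the multi-page example; `extractFromPages` is a hypothetical helper, not a Stagehand API.

```typescript
// Minimal structural type for the one call shape this sketch relies on
// (an assumption, not Stagehand's full type).
interface PageScopedExtractor<P> {
  extract(instruction: string, options: { page: P }): Promise<unknown>;
}

// Hypothetical helper: runs one instruction against each page in order
// and collects the results in the same order as the pages.
async function extractFromPages<P>(
  extractor: PageScopedExtractor<P>,
  pages: P[],
  instruction: string,
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const page of pages) {
    // Sequential on purpose: each extraction finishes before the next starts.
    results.push(await extractor.extract(instruction, { page }));
  }
  return results;
}
```

Running the calls one after another keeps the example simple; whether concurrent extraction is safe depends on the browser session and is not covered here.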

Best Practices

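  1. Be specific - Name the exact element or field you want (for example, “the first title inside the iframe”) rather than asking for “the data”
  2. Use schemas - Define a Zod schema whenever you need structured data, so results are typed and validated
  3. Handle errors - Extraction can fail if elements aren’t found, so wrap calls in try/catch and close the session in a finally block
  4. Wait for content - Ensure dynamic content is loaded before extracting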
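
One way to combine error handling with waiting for slow content is to retry a failed extraction after a short delay. The helper below is a minimal sketch: `withRetry` is a hypothetical name, it is not part of Stagehand, and the default attempt count and delay are arbitrary assumptions.

```typescript
// Hypothetical helper: runs an async task, retrying after a fixed delay
// when it throws. Rethrows the last error once all attempts are used.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T> {
  if (attempts < 1) {
    throw new RangeError("attempts must be at least 1");
  }

  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Pause before the next attempt to give the page time to load.
        await new Promise<void>((resolve) => setTimeout(() => resolve(), delayMs));
      }
    }
  }
  throw lastError;
}
```

A call such as `withRetry(() => stagehand.extract(instruction, schema))` would wrap an extraction in this helper. Retrying only helps with transient failures; an instruction that never matches the page will still fail after the final attempt.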
