
Overview

The extract() method uses AI to pull data from the current page. It can return the page text, answer a question about the page, or fill in a custom schema.

Syntax

// Get page text
const { pageText } = await stagehand.extract();

// Answer a question
const { extraction } = await stagehand.extract("What is the article about?");

// Extract with custom schema
const data = await stagehand.extract(instruction, schema, options?);

Overloads

1. Extract Page Text

await stagehand.extract();
await stagehand.extract(options?);
returns
Promise<{ pageText: string }>
Object containing the full page text

2. Extract with Instruction (Default Schema)

await stagehand.extract(instruction, options?);
instruction
string
required
Question or description of what to extract
returns
Promise<{ extraction: string }>
Object containing the extracted string

3. Extract with Custom Schema

await stagehand.extract(instruction, schema, options?);
instruction
string
required
Description of what to extract
schema
StagehandZodSchema
required
Zod schema defining the structure of the extracted data. Example:
import { z } from "zod";

const schema = z.object({
  title: z.string(),
  price: z.string(),
  inStock: z.boolean(),
});
returns
Promise<T>
Extracted data matching the schema type

Options

options
ExtractOptions
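
The examples on this page pass three option fields: selector, model, and page. The sketch below collects them into one shape for quick reference. It is inferred only from those examples; the name ExtractOptionsSketch is hypothetical, and the library's actual ExtractOptions type may define more fields or different types.

```typescript
// Illustrative only: inferred from the examples on this page,
// not copied from the library's type definitions.
interface ExtractOptionsSketch {
  /** Selector string that narrows extraction, as in the Scoped Extraction example. */
  selector?: string;
  /** Model identifier in "provider/model" form, e.g. "openai/gpt-4.1-mini". */
  model?: string;
  /** Page to extract from when several are open; typed loosely in this sketch. */
  page?: unknown;
}

// Example value combining two of the fields.
const scopedOptions: ExtractOptionsSketch = {
  selector: ".product-details",
  model: "openai/gpt-4.1-mini",
};
```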

Examples

Extract Page Text

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();

const page = await stagehand.context.newPage();
await page.goto("https://example.com");

// Get all text content
const { pageText } = await stagehand.extract();
console.log(pageText);

await stagehand.close();

Answer Questions

await page.goto("https://news.ycombinator.com");

// Extract specific information
const { extraction } = await stagehand.extract(
  "What is the title of the top story?"
);

console.log(extraction); // "New AI Framework Released"

Extract Structured Data

import { z } from "zod";

await page.goto("https://example-shop.com/product/123");

// Define schema
const productSchema = z.object({
  name: z.string(),
  price: z.string(),
  description: z.string(),
  inStock: z.boolean(),
  rating: z.number().optional(),
});

// Extract data
const product = await stagehand.extract(
  "Extract the product details",
  productSchema
);

console.log(product);
// {
//   name: "Wireless Mouse",
//   price: "$29.99",
//   description: "Ergonomic wireless mouse with...",
//   inStock: true,
//   rating: 4.5
// }
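
The schema above keeps price as a string such as "$29.99". If later code needs a number, one option is to convert the value after extraction. This is a minimal sketch that assumes US-style formatting (comma as thousands separator, dot as decimal point); parsePrice is a hypothetical helper, not part of Stagehand.

```typescript
// Hypothetical helper: converts a price string like "$1,299.00" to a number.
// Returns null when no numeric value can be recovered.
function parsePrice(raw: string): number | null {
  // Keep only digits, dots, and minus signs; this drops currency
  // symbols, spaces, and thousands separators.
  const cleaned = raw.replace(/[^0-9.-]/g, "");
  if (cleaned === "") return null;

  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice("$29.99")); // 29.99
console.log(parsePrice("Call for price")); // null
```

Prices in other locales (for example "29,99 €") would need a different rule, so treat the result as best effort.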

Extract Lists

import { z } from "zod";

await page.goto("https://example.com/articles");

const articleListSchema = z.object({
  articles: z.array(
    z.object({
      title: z.string(),
      author: z.string(),
      date: z.string(),
      summary: z.string().optional(),
    })
  ),
});

const { articles } = await stagehand.extract(
  "Extract all articles from the page",
  articleListSchema
);

console.log(`Found ${articles.length} articles`);

Scoped Extraction

import { z } from "zod";

// Extract only from a specific section of the page
const headerData = await stagehand.extract(
  "Get the navigation links",
  z.object({
    links: z.array(z.object({ text: z.string(), url: z.string() })),
  }),
  { selector: "header nav" }
);

Complex Schema with Descriptions

import { z } from "zod";

const jobSchema = z.object({
  title: z.string().describe("Job title"),
  company: z.string().describe("Company name"),
  location: z.string().describe("Job location"),
  salary: z
    .string()
    .optional()
    .describe("Salary range if available"),
  remote: z.boolean().describe("Whether the job is remote"),
  requirements: z
    .array(z.string())
    .describe("List of job requirements"),
});

await page.goto("https://jobs.example.com/posting/123");

const job = await stagehand.extract(
  "Extract the job posting details",
  jobSchema
);

Extract with Custom Model

// Use a different model for this extraction.
// contactSchema is the schema defined under "Contact Information" below.
const data = await stagehand.extract(
  "Extract contact information",
  contactSchema,
  {
    model: "anthropic/claude-3-5-sonnet-latest",
  }
);

Extract from Multiple Pages

import { z } from "zod";

// Minimal schema for this example
const schema = z.object({ title: z.string() });

const page1 = await stagehand.context.newPage();
const page2 = await stagehand.context.newPage();

await page1.goto("https://example.com/page1");
await page2.goto("https://example.com/page2");

// Extract from specific pages
const data1 = await stagehand.extract("Get the title", schema, { page: page1 });
const data2 = await stagehand.extract("Get the title", schema, { page: page2 });

Real-World Examples

E-commerce Product

const productSchema = z.object({
  product: z.object({
    name: z.string(),
    brand: z.string(),
    price: z.object({
      current: z.string(),
      original: z.string().optional(),
      currency: z.string(),
    }),
    availability: z.enum(["in_stock", "out_of_stock", "pre_order"]),
    images: z.array(z.string().url()),
    specifications: z.record(z.string(), z.string()),
    reviews: z.object({
      averageRating: z.number(),
      totalReviews: z.number(),
    }).optional(),
  }),
});

const data = await stagehand.extract(
  "Extract complete product information",
  productSchema
);

News Articles

const newsSchema = z.object({
  article: z.object({
    headline: z.string(),
    subheading: z.string().optional(),
    author: z.string(),
    publishDate: z.string(),
    content: z.string(),
    tags: z.array(z.string()),
    relatedArticles: z.array(
      z.object({
        title: z.string(),
        url: z.string(),
      })
    ).optional(),
  }),
});

const article = await stagehand.extract(
  "Extract the article content and metadata",
  newsSchema
);

Contact Information

const contactSchema = z.object({
  contact: z.object({
    email: z.string().email().optional(),
    phone: z.string().optional(),
    address: z.object({
      street: z.string(),
      city: z.string(),
      state: z.string(),
      zip: z.string(),
      country: z.string(),
    }).optional(),
    socialMedia: z.object({
      twitter: z.string().optional(),
      linkedin: z.string().optional(),
      facebook: z.string().optional(),
    }).optional(),
  }),
});

const contact = await stagehand.extract(
  "Extract all contact information",
  contactSchema
);

Best Practices

  1. Use descriptive schema fields:
    z.string().describe("The product's full name including brand")
    
  2. Make optional fields optional:
    z.object({
      required: z.string(),
      optional: z.string().optional(),
    })
    
  3. Use enums for known values:
    status: z.enum(["available", "unavailable", "coming_soon"])
    
  4. Validate extracted data:
    const data = await stagehand.extract(instruction, schema);
    const validated = schema.parse(data); // Throws if invalid
    
  5. Scope to relevant sections:
    // Can be more accurate and faster
    await stagehand.extract(instruction, schema, { selector: ".product-details" })
    
  6. Use appropriate models:
    // Use faster models for simple extraction
    await stagehand.extract("Get title", schema, { model: "openai/gpt-4.1-mini" })
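
Practice 4 notes that schema.parse() throws on invalid data. One way to handle that throw is to retry the extraction a few times before giving up. The sketch below is an assumption about how you might structure this, not a Stagehand feature: extractWithRetry is a hypothetical helper, and the extraction and validation steps are passed in as functions so the example runs without a browser.

```typescript
// Hypothetical helper: re-runs an extraction until the validator accepts
// the result, or until maxAttempts is reached.
async function extractWithRetry<T>(
  runExtract: () => Promise<unknown>,
  validate: (data: unknown) => T, // must throw on invalid data, like schema.parse
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Both extraction failures and validation failures trigger a retry.
      return validate(await runExtract());
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(
    `Extraction failed after ${maxAttempts} attempts: ${String(lastError)}`
  );
}

// Possible usage with the schemas on this page (not executed here):
// const product = await extractWithRetry(
//   () => stagehand.extract("Extract the product details", productSchema),
//   (data) => productSchema.parse(data)
// );
```

Each retry makes another model call, so a small maxAttempts keeps cost and latency bounded.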
    
