Overview
The extract() method extracts structured data from the current page using AI. It can return page text, answer questions, or extract data into custom schemas.
Syntax
// Get page text
const { pageText } = await stagehand.extract();
// Answer a question
const { extraction } = await stagehand.extract("What is the article about?");
// Extract with custom schema
const data = await stagehand.extract(instruction, schema, options?);
Overloads
await stagehand.extract();
await stagehand.extract(options?);
returns
Promise<{ pageText: string }>
Object containing the full page text
await stagehand.extract(instruction, options?);
instruction
string
required
Question or description of what to extract
returns
Promise<{ extraction: string }>
Object containing the extracted string
await stagehand.extract(instruction, schema, options?);
instruction
string
required
Description of what to extract
schema
StagehandZodSchema
required
Zod schema defining the structure of extracted data
Example:
import { z } from "zod";
const schema = z.object({
  title: z.string(),
  price: z.string(),
  inStock: z.boolean(),
});
returns
Promise<z.infer<typeof schema>>
Extracted data matching the schema type
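The three call shapes above can be modeled as TypeScript overloads. The following is a sketch only: `extractStub` and `SchemaLike` are hypothetical stand-ins introduced for illustration, not Stagehand's implementation, and the stub is synchronous and returns placeholder values, whereas the real method returns a Promise. It shows how the result shape depends on which arguments are supplied.

```typescript
// Hypothetical stand-in for a schema: anything with a parse() method.
interface SchemaLike<T> {
  parse(input: unknown): T;
}

// Overload signatures mirroring the three documented call shapes.
function extractStub(): { pageText: string };
function extractStub(instruction: string): { extraction: string };
function extractStub<T>(instruction: string, schema: SchemaLike<T>): T;
function extractStub<T>(
  instruction?: string,
  schema?: SchemaLike<T>,
): { pageText: string } | { extraction: string } | T {
  if (instruction === undefined) {
    // No arguments: return the page text.
    return { pageText: "<full page text>" };
  }
  if (schema === undefined) {
    // Instruction only: return a plain string answer.
    return { extraction: `answer to: ${instruction}` };
  }
  // Instruction plus schema: return data shaped by the schema.
  return schema.parse({ title: "placeholder" });
}

// A hand-written schema-like object used only for this illustration.
const titleSchema: SchemaLike<{ title: string }> = {
  parse(input) {
    const value = input as { title?: unknown };
    if (typeof value.title !== "string") {
      throw new Error("title must be a string");
    }
    return { title: value.title };
  },
};

const noArgs = extractStub();
const question = extractStub("What is the article about?");
const structured = extractStub("Get the title", titleSchema);
```

With these overloads the compiler infers `noArgs.pageText`, `question.extraction`, and `structured.title` as strings without any casts.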
Options
model
Override the default model
Format: "provider/model" or { modelName, ...clientOptions }
timeout
Maximum time to wait (milliseconds)
selector
CSS selector to scope extraction to a specific element
page
Page | PlaywrightPage | PuppeteerPage | PatchrightPage
Page to extract from (defaults to active page)
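Because the model override accepts a "provider/model" string, it can help to check that a configured value has that shape before passing it along. The helper below is a minimal sketch; `parseModelId` and `ModelId` are hypothetical names introduced here, and Stagehand's own parsing of this string may differ.

```typescript
// Hypothetical result type for a "provider/model" string.
interface ModelId {
  provider: string;
  model: string;
}

// Splits on the first "/" so model names that contain slashes stay intact.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const parsed = parseModelId("openai/gpt-4.1-mini");
console.log(parsed.provider, parsed.model);
```

A value without a provider prefix, such as "gpt-4.1-mini", makes the helper throw, which surfaces a configuration mistake before any extraction runs.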
Examples
Basic Extraction
import { Stagehand } from "@browserbasehq/stagehand";
const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();
const page = await stagehand.context.newPage();
await page.goto("https://example.com");
// Get all text content
const { pageText } = await stagehand.extract();
console.log(pageText);
await stagehand.close();
Answer Questions
await page.goto("https://news.ycombinator.com");
// Extract specific information
const { extraction } = await stagehand.extract(
  "What is the title of the top story?"
);
console.log(extraction); // e.g. "New AI Framework Released"
Extract with Schema
import { z } from "zod";
await page.goto("https://example-shop.com/product/123");
// Define schema
const productSchema = z.object({
  name: z.string(),
  price: z.string(),
  description: z.string(),
  inStock: z.boolean(),
  rating: z.number().optional(),
});
// Extract data
const product = await stagehand.extract(
  "Extract the product details",
  productSchema
);
console.log(product);
// Example output:
// {
//   name: "Wireless Mouse",
//   price: "$29.99",
//   description: "Ergonomic wireless mouse with...",
//   inStock: true,
//   rating: 4.5
// }
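The schema above keeps `price` as a string such as "$29.99", so any arithmetic needs a conversion step after extraction. The helper below is a minimal sketch; `parsePrice` is a hypothetical name, and it assumes a dot decimal separator with optional comma thousands separators, which will not hold for every locale.

```typescript
// Converts a display price such as "$1,299.00" to a number.
// Returns null when no usable numeric value is present.
function parsePrice(raw: string): number | null {
  // Drop currency symbols, spaces, and thousands separators.
  const cleaned = raw.replace(/[^0-9.]/g, "");
  if (cleaned === "") {
    return null;
  }
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice("$29.99"));
console.log(parsePrice("Free"));
```

Returning null rather than throwing lets callers decide how to treat listings without a numeric price.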
Extract Arrays
import { z } from "zod";
await page.goto("https://example.com/articles");
const articleListSchema = z.object({
  articles: z.array(
    z.object({
      title: z.string(),
      author: z.string(),
      date: z.string(),
      summary: z.string().optional(),
    })
  ),
});
const { articles } = await stagehand.extract(
  "Extract all articles from the page",
  articleListSchema
);
console.log(`Found ${articles.length} articles`);
Scoped Extraction
// Extract only from a specific section
const headerData = await stagehand.extract(
  "Get the navigation links",
  z.object({
    links: z.array(z.object({ text: z.string(), url: z.string() })),
  }),
  { selector: "header nav" }
);
Complex Schema with Descriptions
import { z } from "zod";
const jobSchema = z.object({
  title: z.string().describe("Job title"),
  company: z.string().describe("Company name"),
  location: z.string().describe("Job location"),
  salary: z
    .string()
    .optional()
    .describe("Salary range if available"),
  remote: z.boolean().describe("Whether the job is remote"),
  requirements: z
    .array(z.string())
    .describe("List of job requirements"),
});
await page.goto("https://jobs.example.com/posting/123");
const job = await stagehand.extract(
  "Extract the job posting details",
  jobSchema
);
Custom Model
// Use a different model for this extraction
const data = await stagehand.extract(
  "Extract contact information",
  contactSchema,
  {
    model: "anthropic/claude-3-5-sonnet-latest",
  }
);
Multi-Page
const page1 = await stagehand.context.newPage();
const page2 = await stagehand.context.newPage();
await page1.goto("https://example.com/page1");
await page2.goto("https://example.com/page2");
// Extract from specific pages
const data1 = await stagehand.extract("Get the title", schema, { page: page1 });
const data2 = await stagehand.extract("Get the title", schema, { page: page2 });
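When several pages are open, the per-page calls can be collected with a small helper so that one failing page does not discard the results from the others. This is a sketch under assumptions: `extractFromPages` and `PageResult` are hypothetical names, the extractor is passed in as a callback rather than tied to Stagehand, and it assumes that issuing the calls concurrently is acceptable for your setup.

```typescript
// Per-page outcome: either the extracted data or the error that occurred.
type PageResult<T> = { ok: true; data: T } | { ok: false; error: unknown };

// Runs extractFn once per page and reports success or failure for each page.
async function extractFromPages<P, T>(
  pages: P[],
  extractFn: (page: P) => Promise<T>,
): Promise<PageResult<T>[]> {
  return Promise.all(
    pages.map(async (page): Promise<PageResult<T>> => {
      try {
        return { ok: true, data: await extractFn(page) };
      } catch (error) {
        return { ok: false, error };
      }
    }),
  );
}

// Possible usage with the pages above (not executed here):
// const results = await extractFromPages([page1, page2], (page) =>
//   stagehand.extract("Get the title", schema, { page }),
// );
```

Results come back in the same order as the input pages, so `results[0]` corresponds to `page1`.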
Real-World Examples
E-commerce Product
const productSchema = z.object({
  product: z.object({
    name: z.string(),
    brand: z.string(),
    price: z.object({
      current: z.string(),
      original: z.string().optional(),
      currency: z.string(),
    }),
    availability: z.enum(["in_stock", "out_of_stock", "pre_order"]),
    images: z.array(z.string().url()),
    specifications: z.record(z.string(), z.string()),
    reviews: z.object({
      averageRating: z.number(),
      totalReviews: z.number(),
    }).optional(),
  }),
});
const data = await stagehand.extract(
  "Extract complete product information",
  productSchema
);
News Articles
const newsSchema = z.object({
  article: z.object({
    headline: z.string(),
    subheading: z.string().optional(),
    author: z.string(),
    publishDate: z.string(),
    content: z.string(),
    tags: z.array(z.string()),
    relatedArticles: z.array(
      z.object({
        title: z.string(),
        url: z.string(),
      })
    ).optional(),
  }),
});
const article = await stagehand.extract(
  "Extract the article content and metadata",
  newsSchema
);
Contact Information
const contactSchema = z.object({
  contact: z.object({
    email: z.string().email().optional(),
    phone: z.string().optional(),
    address: z.object({
      street: z.string(),
      city: z.string(),
      state: z.string(),
      zip: z.string(),
      country: z.string(),
    }).optional(),
    socialMedia: z.object({
      twitter: z.string().optional(),
      linkedin: z.string().optional(),
      facebook: z.string().optional(),
    }).optional(),
  }),
});
const contact = await stagehand.extract(
  "Extract all contact information",
  contactSchema
);
Best Practices
Use descriptive schema fields:
z.string().describe("The product's full name including brand")
Mark fields that may be missing as optional:
z.object({
  required: z.string(),
  optional: z.string().optional(),
})
Use enums for known values:
status: z.enum(["available", "unavailable", "coming_soon"])
Validate extracted data:
const data = await stagehand.extract(instruction, schema);
const validated = schema.parse(data); // Throws if invalid
Scope to relevant sections:
// Scoping to the relevant element is typically more accurate and faster
await stagehand.extract(instruction, schema, { selector: ".product-details" })
Use appropriate models:
// Use faster models for simple extraction
await stagehand.extract("Get title", schema, { model: "openai/gpt-4.1-mini" })
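The validation practice above can be combined with a bounded retry, so that a result failing validation triggers another attempt instead of ending the run. The helper below is a sketch, not part of Stagehand: `extractWithRetry` is a hypothetical name, and both the extraction call and the validator are supplied by the caller.

```typescript
// Runs an extraction, validates the result, and retries on any failure.
// Rethrows the last error once maxAttempts is exhausted.
async function extractWithRetry<T>(
  run: () => Promise<unknown>,
  validate: (data: unknown) => T,
  maxAttempts = 3,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // validate() throws when the data does not match expectations.
      return validate(await run());
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Possible usage with a Zod schema (not executed here):
// const product = await extractWithRetry(
//   () => stagehand.extract(instruction, schema),
//   (data) => schema.parse(data),
// );
```

Each retry issues another model call, so a small `maxAttempts` keeps cost and latency bounded.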