Overview
The extract() method extracts structured data from web pages using natural language instructions and Zod schemas. It uses an AI model to interpret page content and returns data validated against the schema you provide.
Method Signature
extract<T extends StagehandZodSchema>(
  instruction?: string,
  schema?: T,
  options?: ExtractOptions
): Promise<InferStagehandSchema<T>>
Parameters
instruction (string, optional): Natural language description of what data to extract (e.g., “Extract all product listings with their prices”). Can be omitted only when no schema is provided, in which case extract() returns the page text.
schema (Zod schema, optional): Zod schema defining the structure of data to extract. Supports z.object(), z.array(), and nested schemas.
import { z } from "zod";

const schema = z.object({
  title: z.string().describe("Page title"),
  price: z.string().describe("Product price"),
});
options (ExtractOptions, optional): Optional configuration for extraction. It can override the default model for this specific extraction, and accepts the following fields:
timeout: Maximum time in milliseconds to wait for extraction. Throws ExtractTimeoutError if exceeded.
selector: Focus extraction on a specific part of the page. Accepts CSS selectors or XPath (prefix with xpath=).
page: Specific page to extract from (useful for multi-page scenarios).
Return Value
Returns a Promise that resolves to data matching your Zod schema structure.
With schema: Returns typed data matching the schema
With instruction only: Returns { extraction: string }
With neither instruction nor schema: Returns { pageText: string }
Usage Examples
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
});
await stagehand.init();

const page = stagehand.context.pages()[0];
await page.goto("https://news.ycombinator.com");

// Extract a simple list of strings
const articles = await stagehand.extract(
  "Extract the top 5 article titles",
  z.object({
    titles: z.array(z.string()),
  })
);
console.log(articles.titles);

// Extract an array of objects
await page.goto("https://www.apartments.com/san-francisco-ca/");
const listings = await stagehand.extract(
  "Extract all apartment listings with prices and addresses",
  z.object({
    listings: z.array(
      z.object({
        price: z.string().describe("The price of the listing"),
        address: z.string().describe("The address of the listing"),
      })
    ),
  })
);
console.log(`Found ${listings.listings.length} apartments`);
listings.listings.forEach((listing) => {
  console.log(`${listing.address}: ${listing.price}`);
});
Nested Data Structures
const productData = await stagehand.extract(
  "Extract product information",
  z.object({
    product: z.object({
      name: z.string(),
      price: z.string(),
      features: z.array(z.string()),
      reviews: z.object({
        rating: z.number(),
        count: z.number(),
        topReview: z.string(),
      }),
    }),
  })
);
console.log(productData.product.name);
console.log(`Rating: ${productData.product.reviews.rating}/5`);

// Fields declared with z.string().url() are filled from the link's href attribute
const links = await stagehand.extract(
  "Get all navigation links",
  z.object({
    links: z.array(
      z.object({
        text: z.string(),
        url: z.string().url(), // Automatically extracts href attribute
      })
    ),
  })
);
for (const link of links.links) {
  console.log(`${link.text}: ${link.url}`);
}
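
Extracted links often need light post-processing before they are used. The sketch below is plain TypeScript and does not call Stagehand: it takes objects shaped like the `links` array above, resolves each value against a base URL, and keeps only links on the same host. The helper name `sameSiteLinks` and the sample data are hypothetical, chosen for illustration.

```typescript
// Shape matching the link schema in the example above.
interface ExtractedLink {
  text: string;
  url: string;
}

// Hypothetical helper: resolve each link against `base` and keep only
// links that stay on the same host. Unparseable values are skipped.
function sameSiteLinks(links: ExtractedLink[], base: string): ExtractedLink[] {
  const baseUrl = new URL(base);
  const result: ExtractedLink[] = [];
  for (const link of links) {
    let resolved: URL;
    try {
      resolved = new URL(link.url, baseUrl);
    } catch {
      continue; // not a valid absolute or relative URL
    }
    if (resolved.host === baseUrl.host) {
      result.push({ text: link.text, url: resolved.href });
    }
  }
  return result;
}

// Sample data standing in for a real extract() result.
const sampleLinks: ExtractedLink[] = [
  { text: "Docs", url: "/docs" },
  { text: "Blog", url: "https://example.com/blog" },
  { text: "Partner", url: "https://other.example.org/" },
];

const internalLinks = sameSiteLinks(sampleLinks, "https://example.com");
console.log(internalLinks.map((link) => link.url));
```

Values that are already absolute pass through with only URL normalization applied, so the helper is safe to run on output that already satisfies z.string().url().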

// Extract from a specific section of the page
const sidebarData = await stagehand.extract(
  "Extract trending topics",
  z.object({
    topics: z.array(z.string()),
  }),
  {
    selector: "aside.sidebar", // CSS selector
  }
);

// Or use XPath (`schema` is any Zod schema, such as the one defined under Parameters)
const contentData = await stagehand.extract(
  "Extract main content",
  schema,
  {
    selector: "xpath=//main[@id='content']",
  }
);

// Without instruction or schema - returns page text
const { pageText } = await stagehand.extract();
console.log(pageText);

// With instruction only - returns free-form extraction
const { extraction } = await stagehand.extract(
  "What is the main topic of this page?"
);
console.log(extraction);

// Extract from specific pages in a multi-page session
const page1 = stagehand.context.pages()[0];
const page2 = await stagehand.context.newPage();
await page1.goto("https://example.com/page1");
await page2.goto("https://example.com/page2");

const data1 = await stagehand.extract(
  "Extract title",
  z.object({ title: z.string() }),
  { page: page1 }
);
const data2 = await stagehand.extract(
  "Extract title",
  z.object({ title: z.string() }),
  { page: page2 }
);
Using Descriptions
// Add .describe() to help the AI understand what to extract
const userData = await stagehand.extract(
  "Extract user profile information",
  z.object({
    username: z.string().describe("The user's display name"),
    email: z.string().describe("The user's email address"),
    joinDate: z.string().describe("Date the user joined, in MM/DD/YYYY format"),
    isVerified: z.boolean().describe("Whether the user's account is verified"),
  })
);
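
Because the description asks for `joinDate` in MM/DD/YYYY format, the returned string can be parsed locally. The following is a minimal sketch, assuming the model followed the requested format; `parseUsDate` is a hypothetical helper, and it returns null instead of guessing when the string does not match.

```typescript
// Hypothetical helper: parse "MM/DD/YYYY" into a UTC Date, or return null.
function parseUsDate(value: string): Date | null {
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(value.trim());
  if (!match) return null;
  const [, monthText = "", dayText = "", yearText = ""] = match;
  const month = Number(monthText);
  const day = Number(dayText);
  const year = Number(yearText);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject overflow such as 02/31/2024, which Date would roll into March.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day
  ) {
    return null;
  }
  return date;
}

const joinedOn = parseUsDate("03/15/2021");
console.log(joinedOn ? joinedOn.toISOString() : "unparseable");
console.log(parseUsDate("02/31/2024")); // null: not a real calendar date
console.log(parseUsDate("March 15, 2021")); // null: wrong format
```

Returning null keeps the decision about fallbacks with the caller, which matters when the model occasionally ignores the format hint.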
Handling Missing Data
// Use .optional() for fields that might not exist
const result = await stagehand.extract(
  "Extract article metadata",
  z.object({
    title: z.string(),
    author: z.string().optional(),
    publishDate: z.string().optional(),
    readTime: z.string().optional(),
  })
);
if (result.author) {
  console.log(`By ${result.author}`);
}
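
When several fields are optional, it can be tidier to handle all of them in one place than to scatter `if` checks. The sketch below is plain TypeScript with no Stagehand calls; the `ArticleMeta` interface mirrors the schema above, and `formatMeta` is a hypothetical helper that builds a one-line summary from whichever fields are present.

```typescript
// Shape of the result when the schema above marks fields as .optional().
interface ArticleMeta {
  title: string;
  author?: string;
  publishDate?: string;
  readTime?: string;
}

// Hypothetical helper: build a one-line summary, skipping absent fields.
function formatMeta(meta: ArticleMeta): string {
  const parts = [
    meta.author ? `By ${meta.author}` : undefined,
    meta.publishDate,
    meta.readTime,
  ].filter((part): part is string => Boolean(part));
  return parts.length > 0 ? `${meta.title} (${parts.join(", ")})` : meta.title;
}

console.log(formatMeta({ title: "Hello", author: "Ada", readTime: "4 min" }));
// prints "Hello (By Ada, 4 min)"
console.log(formatMeta({ title: "Untitled draft" }));
// prints "Untitled draft"
```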
With Timeout
try {
  const data = await stagehand.extract(
    "Extract complex data",
    schema,
    { timeout: 30000 } // 30 seconds
  );
} catch (error) {
  if (error instanceof ExtractTimeoutError) {
    console.error("Extraction timed out");
  }
}
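
One possible reaction to a timeout is to retry with a larger budget. The sketch below is generic TypeScript and runs without Stagehand: `retryWithTimeouts` is a hypothetical helper, `TimeoutStandInError` is a local stand-in for ExtractTimeoutError, and `fakeExtract` simulates an extraction. In real code the callback would pass the timeout through to extract() via the `timeout` option.

```typescript
// Local stand-in for ExtractTimeoutError so this sketch is self-contained.
class TimeoutStandInError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Try `run` once per entry in `timeoutsMs`, moving to the next budget only
// when the failure is a timeout. Any other error is rethrown immediately.
async function retryWithTimeouts<T>(
  run: (timeoutMs: number) => Promise<T>,
  timeoutsMs: number[],
): Promise<T> {
  let lastError: unknown = new Error("no timeouts provided");
  for (const timeoutMs of timeoutsMs) {
    try {
      return await run(timeoutMs);
    } catch (error) {
      if (!(error instanceof TimeoutStandInError)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}

// Fake extraction that only succeeds when given at least 20 seconds.
const attemptedTimeouts: number[] = [];
async function fakeExtract(timeoutMs: number): Promise<{ title: string }> {
  attemptedTimeouts.push(timeoutMs);
  if (timeoutMs < 20000) throw new TimeoutStandInError("timed out");
  return { title: "ok" };
}

const retryDemo = retryWithTimeouts(fakeExtract, [10000, 30000]).then((data) => {
  console.log(data.title, attemptedTimeouts);
  return data;
});
```

Passing an explicit list of timeouts keeps the total wait bounded and visible at the call site.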
Supported Schema Types
Stagehand’s extract() supports most Zod schema types:
Primitives: z.string(), z.number(), z.boolean()
Objects: z.object({ ... })
Arrays: z.array(...)
Optionals: .optional()
Nested structures: Objects within objects, arrays of objects
URLs: z.string().url() - automatically extracts href attributes
Descriptions: .describe("...") - helps guide extraction
How It Works
Snapshot: Captures an accessibility tree of the page
LLM Processing: Sends the snapshot, instruction, and schema to the AI model
Extraction: AI identifies and extracts matching data
Validation: Data is validated against your Zod schema
Return: Typed data matching your schema structure
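
The steps above can be sketched as a small pipeline. This illustrates the flow only and is not Stagehand's implementation: `runExtraction`, `callModel`, and `validate` are hypothetical names, the model is a stub that returns canned JSON, and a hand-written type guard stands in for Zod validation.

```typescript
type Validator<T> = (value: unknown) => value is T;

interface PipelineInput<T> {
  snapshot: string; // step 1: text stand-in for the accessibility tree
  instruction: string;
  callModel: (prompt: string) => Promise<string>; // steps 2-3: stubbed model call
  validate: Validator<T>; // step 4: stand-in for schema validation
}

async function runExtraction<T>(input: PipelineInput<T>): Promise<T> {
  const prompt = `${input.instruction}\n\n${input.snapshot}`;
  const raw = await input.callModel(prompt);
  const parsed: unknown = JSON.parse(raw);
  if (!input.validate(parsed)) {
    throw new Error("Model output did not match the expected structure");
  }
  return parsed; // step 5: typed result
}

// Hand-written guard for { title: string }.
function isTitleObject(value: unknown): value is { title: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { title?: unknown }).title === "string"
  );
}

const pipelineDemo = runExtraction({
  snapshot: "heading: Example Domain",
  instruction: "Extract the page title",
  callModel: async () => JSON.stringify({ title: "Example Domain" }),
  validate: isTitleObject,
});
pipelineDemo.then((result) => console.log(result.title));
```

The point of the validation step is that malformed model output surfaces as an error instead of flowing into typed code unchecked.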
Performance Tips
Use focused selectors - Extract from specific page sections
await stagehand.extract(instruction, schema, {
  selector: ".product-details",
});
Be specific with descriptions - Help the AI understand context
z.string().describe("The product price in USD format")
Use appropriate schemas - Don’t over-complicate structure
// Good - simple and clear
z.object({ price: z.string() })
// Overkill - unnecessary complexity
z.object({
  price: z.object({
    amount: z.string(),
    currency: z.string(),
  }),
})
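
Keeping `price` as a plain string does not rule out structured use later, because the amount can be split out locally after extraction. The following is a minimal sketch, assuming prices look like "$1,299.99" with a leading currency symbol and comma thousands separators; `parsePrice` is a hypothetical helper and would need adjusting for other locales.

```typescript
interface ParsedPrice {
  currencySymbol: string;
  amount: number;
}

// Hypothetical helper: split a price string into symbol and numeric amount.
function parsePrice(text: string): ParsedPrice | null {
  // Leading symbol, then digits (with or without comma groups), then optional decimals.
  const match = /^\s*([^\d\s.,-]+)\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?/.exec(text);
  if (!match) return null;
  const [, currencySymbol = "", whole = "", fraction = ""] = match;
  const amount = Number(whole.replace(/,/g, "") + fraction);
  return { currencySymbol, amount };
}

console.log(parsePrice("$1,299.99")); // symbol "$", amount 1299.99
console.log(parsePrice("$2,450/mo")); // trailing text is ignored
console.log(parsePrice("Call for pricing")); // null: no numeric price found
```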
Error Handling
try {
  const data = await stagehand.extract(instruction, schema);
  console.log(data);
} catch (error) {
  if (error instanceof ExtractTimeoutError) {
    console.error("Extraction timed out");
  } else if (error instanceof StagehandInvalidArgumentError) {
    console.error("Invalid schema or instruction");
  } else {
    console.error("Extraction failed:", error);
  }
}
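
If many call sites repeat this try/catch, the branching can be folded into a wrapper that returns a tagged result instead of throwing. This is one possible pattern, not something the library provides: `toResult` and `ExtractResult` are hypothetical names, and the two error classes are local stand-ins for ExtractTimeoutError and StagehandInvalidArgumentError so the block runs on its own.

```typescript
// Local stand-ins so the sketch runs without Stagehand installed.
class TimeoutStandIn extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class InvalidArgumentStandIn extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type FailureReason = "timeout" | "invalid-argument" | "unknown";
type ExtractResult<T> =
  | { ok: true; data: T }
  | { ok: false; reason: FailureReason; error: unknown };

// Run an extraction callback and convert thrown errors into a tagged result.
async function toResult<T>(run: () => Promise<T>): Promise<ExtractResult<T>> {
  try {
    return { ok: true, data: await run() };
  } catch (error) {
    if (error instanceof TimeoutStandIn) {
      return { ok: false, reason: "timeout", error };
    }
    if (error instanceof InvalidArgumentStandIn) {
      return { ok: false, reason: "invalid-argument", error };
    }
    return { ok: false, reason: "unknown", error };
  }
}

// Demo with fake callbacks in place of stagehand.extract(...).
const resultDemo = Promise.all([
  toResult(async () => ({ title: "ok" })),
  toResult<{ title: string }>(async () => {
    throw new TimeoutStandIn("too slow");
  }),
]);
resultDemo.then(([first, second]) => {
  console.log(first.ok, second.ok ? "ok" : second.reason);
});
```

Callers can then switch on `reason` without importing error classes at every call site.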
Best Practices
Clear instructions - Be explicit about what to extract
Use descriptions - Add .describe() to schema fields
Handle optionals - Use .optional() for fields that may not exist
Focus extraction - Use selector option for large pages
Type safety - Let TypeScript infer types from your schema
// TypeScript automatically knows the structure
const result = await stagehand.extract(
  "Extract data",
  z.object({
    title: z.string(),
    count: z.number(),
  })
);
// result.title is string
// result.count is number
Related Methods
act() - Perform actions on the page
observe() - Preview actions before executing
agent() - Autonomous multi-step automation