Learn how to write clear instructions that guide Stagehand’s AI to perform precise actions
Stagehand’s AI interprets your natural language instructions to identify elements and plan actions. The clearer your instructions, the more reliable your automation.
await stagehand.act("click the blue 'Submit' button");
await stagehand.act("type 'user@example.com' into the email field");
await stagehand.act("select 'Canada' from the country dropdown");
await stagehand.act("scroll down to the footer");
const actSystemPrompt = `You are helping the user automate the browser by finding elements based on what action the user wants to take on the page
You will be given:
1. a user defined instruction about what action to take
2. a hierarchical accessibility tree showing the semantic structure of the page.
Return the element that matches the instruction if it exists. Otherwise, return an empty object.`;
This gives the AI context about its role and the data it will receive.
Stagehand adds specific rules based on the action type:
prompt.ts:177-200
let instruction = `Find the most relevant element to perform an action on given the following action: ${action}.
IF AND ONLY IF the action EXPLICITLY includes the word 'dropdown' and implies choosing/selecting an option from a dropdown, ignore the 'General Instructions' section, and follow the 'Dropdown Specific Instructions' section carefully.
General Instructions:
Provide an action for this element such as ${supportedActions.join(", ")}.
When choosing non-left click actions, provide right or middle as the argument
If the action is completely unrelated to a potential action to be taken on the page, return an empty object. ONLY return one action. If multiple actions are relevant, return the most relevant one.`;
The AI is instructed to return only one action per call. If you need multiple actions, use observe() to find all matches, then iterate.
const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `
    When interacting with forms, always tab through fields in order.
    Prefer keyboard navigation over mouse clicks when possible.
    If a modal is blocking the page, close it before proceeding.
  `,
});
Your custom prompt is appended to the built-in system prompt:
prompt.ts:5-17
export function buildUserInstructionsString(
  userProvidedInstructions?: string,
): string {
  if (!userProvidedInstructions) {
    return "";
  }
  return `\n\n# Custom Instructions Provided by the User
Please keep the user's instructions in mind when performing actions. If the user's instructions are not relevant to the current task, ignore them.
User Instructions:
${userProvidedInstructions}`;
}
Remember: “to users, buttons and links look the same”, so the AI treats them interchangeably.
Dropdowns require special handling based on their type:
prompt.ts:189-200
Dropdown Specific Instructions:
For interacting with dropdowns, there are two specific cases that you need to handle.
CASE 1: the element is a 'select' element.
  - choose the selectOptionFromDropdown method,
  - set the argument to the exact text of the option that should be selected,
  - set twoStep to false.
CASE 2: the element is NOT a 'select' element:
  - do not attempt to directly choose the element from the dropdown. You will need to click to expand the dropdown first.
  - choose the 'click' method
  - set twoStep to true.
Example usage:
// Native <select> dropdown
await stagehand.act("select 'Blue' from the color dropdown");

// Custom dropdown (two-step)
await stagehand.act("choose 'Large' from the size dropdown");
When twoStep is set, Stagehand:
1. Clicks to expand the dropdown
2. Captures a new DOM snapshot
3. Finds and clicks the specific option
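The two-step flow can be sketched roughly as follows. This is an illustration under stated assumptions, not Stagehand's internal code: `ElementChoice`, `PageDriver`, `Finder`, and `actOnDropdown` are hypothetical names invented here so the sketch runs on its own.

```typescript
// Hypothetical shapes for illustration; these are not Stagehand's real types.
interface ElementChoice {
  selector: string;
  method: "click" | "selectOptionFromDropdown";
  argument?: string;
  twoStep: boolean;
}

interface PageDriver {
  perform(choice: ElementChoice): Promise<void>;
  snapshot(): Promise<string>; // e.g. a serialized accessibility tree
}

// Stand-in for the LLM call that picks an element from a snapshot.
type Finder = (
  instruction: string,
  snapshot: string,
) => Promise<ElementChoice | null>;

async function actOnDropdown(
  instruction: string,
  page: PageDriver,
  find: Finder,
): Promise<boolean> {
  const first = await find(instruction, await page.snapshot());
  if (!first) return false;

  // Step 1: a native <select> is handled in this single call;
  // for a custom dropdown this click only expands it.
  await page.perform(first);
  if (!first.twoStep) return true;

  // Step 2: the options exist only after expansion, so take a fresh snapshot.
  const expanded = await page.snapshot();

  // Step 3: find and click the specific option in the new snapshot.
  const option = await find(instruction, expanded);
  if (!option) return false;
  await page.perform({ ...option, twoStep: false });
  return true;
}
```

The second snapshot matters because the options of a custom dropdown are often not present in the page structure until the dropdown has been expanded.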
Text input and keyboard actions:
// Type text
await stagehand.act("type 'user@example.com' into the email field");

// Press keys
await stagehand.act("press Enter");
await stagehand.act("press Tab");
await stagehand.act("press Escape");
Capitalize the first letter of special keys such as Enter, Tab, and Escape. Regular letters stay lowercase.
The prompt guides the AI as follows:
prompt.ts:187
If the action implies a key press, e.g., 'press enter', 'press a', 'press space', etc., always choose the press method with the appropriate key as argument — e.g. 'a', 'Enter', 'Space'. Do not choose a click action on an on-screen keyboard. Capitalize the first character like 'Enter', 'Tab', 'Escape' only for special keys.
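The capitalization rule can be pictured as a small normalizer. The sketch below is only an illustration of the rule: `normalizeKey` and the `SPECIAL_KEYS` list are hypothetical, and Stagehand leaves this formatting to the model, guided by the prompt text above.

```typescript
// Hypothetical helper: illustrates the capitalization rule only.
// The list of special keys is an assumption, not taken from Stagehand.
const SPECIAL_KEYS = ["Enter", "Tab", "Escape", "Space", "Backspace", "Delete"];

function normalizeKey(raw: string): string {
  const key = raw.trim();
  const special = SPECIAL_KEYS.find(
    (candidate) => candidate.toLowerCase() === key.toLowerCase(),
  );
  if (special !== undefined) {
    return special; // 'enter' becomes 'Enter'
  }
  // Anything else, such as a single letter, is returned as typed.
  return key;
}

console.log(normalizeKey("enter")); // Enter
console.log(normalizeKey("a")); // a
```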
Scroll actions support multiple patterns:
// Scroll to position
await stagehand.act("scroll halfway down the page");
await stagehand.act("scroll to 75% of the page");

// Scroll by chunks
await stagehand.act("scroll to the next chunk");
await stagehand.act("scroll to the previous chunk");

// Scroll to element
await stagehand.act("scroll to the footer");
The AI is instructed:
prompt.ts:185-186
If the user is asking to scroll to a position on the page, e.g., 'halfway' or 0.75, etc, you must return the argument formatted as the correct percentage, e.g., '50%' or '75%', etc.
If the user is asking to scroll to the next chunk/previous chunk, choose the nextChunk/prevChunk method. No arguments are required here.
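To make the expected argument format concrete, here is a minimal sketch of the conversion the prompt asks the model to perform. `toScrollPercentage` is a hypothetical helper, and the `top` and `bottom` entries are assumptions added for illustration; the prompt itself only names 'halfway' and 0.75.

```typescript
// Hypothetical helper: turns a position word or number into the
// percentage-string format the prompt asks for. Not part of Stagehand.
function toScrollPercentage(position: string | number): string {
  if (typeof position === "number") {
    // Values up to 1 are treated as fractions (0.75 means 75%);
    // larger values are assumed to be percentages already.
    const percent = position <= 1 ? position * 100 : position;
    const clamped = Math.min(Math.max(percent, 0), 100);
    return `${Math.round(clamped)}%`;
  }
  // Only 'halfway' appears in the prompt; 'top' and 'bottom' are assumptions.
  const words: Record<string, number> = { top: 0, halfway: 50, bottom: 100 };
  const known = words[position.trim().toLowerCase()];
  if (known === undefined) {
    throw new Error(`Unrecognized scroll position: ${position}`);
  }
  return `${known}%`;
}

console.log(toScrollPercentage("halfway")); // 50%
console.log(toScrollPercentage(0.75)); // 75%
```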
Variables let you parameterize instructions without exposing sensitive data to the LLM:
await stagehand.act(
  "type %email% into the email field",
  { variables: { email: "user@example.com" } }
);

await stagehand.act(
  "fill the username with %username% and password with %password%",
  {
    variables: {
      username: "john_doe",
      password: "secret123"
    }
  }
);
Variables are replaced after the LLM call, so sensitive values never appear in your prompts or logs.
The AI receives variable names in the prompt:
prompt.ts:203-210
if (variables && Object.keys(variables).length > 0) {
  const variableNames = Object.keys(variables)
    .map((key) => `%${key}%`)
    .join(", ");
  const variablesPrompt = `The following variables are available to use in the action: ${variableNames}. Fill the argument variables with the variable name.`;
  instruction += ` ${variablesPrompt}`;
}
During execution, Stagehand replaces each %variableName% placeholder with its actual value.
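A minimal sketch of that substitution step, assuming the model returns an argument that still contains the %name% placeholder. `substituteVariables` is a hypothetical helper written for this example and is not Stagehand's actual implementation.

```typescript
// Hypothetical helper: not Stagehand's actual implementation.
function substituteVariables(
  text: string,
  variables: Record<string, string>,
): string {
  // Replace each %name% token whose name is a known variable;
  // unknown tokens are left untouched.
  return text.replace(/%(\w+)%/g, (token: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(variables, name)
      ? variables[name]
      : undefined;
    return value ?? token;
  });
}

// The model only ever sees and returns the placeholder...
const argumentFromModel = "%password%";
// ...and the real value is filled in locally just before the action runs.
const typedValue = substituteVariables(argumentFromModel, {
  password: "secret123",
});
console.log(typedValue); // secret123
```

Because the lookup happens locally after the model has responded, the secret value is never part of the text sent to the LLM.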
Extraction has different guidance since it’s about reading rather than acting:
prompt.ts:24-48
const baseContent = `You are extracting content on behalf of a user.
If a user asks you to extract a 'list' of information, or 'all' information, YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS.
Print the exact text from the DOM elements with all symbols, characters, and endlines as is.
Print null or an empty string if no new information is found.
If a user is attempting to extract links or URLs, you MUST respond with ONLY the IDs of the link elements.
Do not attempt to extract links directly from the text unless absolutely necessary.`;
When extracting lists, be explicit: “extract all product names” ensures completeness.
Observe finds multiple elements rather than acting on one:
prompt.ts:120-128
const observeSystemPrompt = `You are helping the user automate the browser by finding elements based on what the user wants to observe in the page.
Return an array of elements that match the instruction if they exist, otherwise return an empty array.
When returning elements, include the appropriate method from the supported actions list.`;
const actions = await stagehand.observe("find all 'Add to Cart' buttons");
console.log(`Found ${actions.length} buttons`);

for (const action of actions) {
  await stagehand.act(action); // Act on each one
}
Mention color, position, or unique text when multiple similar elements exist:
// Good: Specific
await stagehand.act("click the red 'Delete' button in the top right");

// Bad: Ambiguous
await stagehand.act("click the delete button");
Use Visible Text
Refer to elements by what users see, not internal attributes:
// Good: User-visible
await stagehand.act("click 'Contact Us' in the navigation");

// Bad: Implementation details
await stagehand.act("click the element with id='nav-contact'");
Handle Modals & Overlays
Add context about blocking UI:
// Good: Acknowledges overlay
await stagehand.act("close the cookie consent banner");
await stagehand.act("click 'Accept All Cookies'");

// Then proceed with main action
await stagehand.act("click 'Sign In'");
Test Instructions Iteratively
Start broad, then refine:
// First attempt
await stagehand.act("click the submit button");

// If it fails, add detail
await stagehand.act("click the blue 'Submit Order' button at the bottom");
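While you are still tuning wording, the broad-then-refined pattern can be wrapped in a small loop. The sketch below is hypothetical: `actWithRefinements` is not a Stagehand function, and `ActFn` assumes that act() either resolves to an object with a boolean `success` field or throws.

```typescript
// Hypothetical helper; ActFn models only the part of act() this sketch needs.
// Assumption: act() resolves to an object with a boolean `success` field, or throws.
type ActFn = (instruction: string) => Promise<{ success: boolean }>;

async function actWithRefinements(
  act: ActFn,
  instructions: string[],
): Promise<string | null> {
  // Try each wording in order, from broadest to most specific.
  for (const instruction of instructions) {
    try {
      const result = await act(instruction);
      if (result.success) {
        return instruction; // report which wording worked
      }
    } catch {
      // Treat a thrown error like a failed attempt and try the next wording.
    }
  }
  return null; // no wording succeeded
}

// Possible usage with a real instance (not executed here):
// await actWithRefinements((text) => stagehand.act(text), [
//   "click the submit button",
//   "click the blue 'Submit Order' button at the bottom",
// ]);
```

The returned string tells you which wording to keep in your script. Use this only for exploration, since retrying an action that partly succeeded can repeat its side effects.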
Use Custom System Prompts for Patterns
If you have recurring patterns, encode them once:
const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `
    This is an e-commerce site. When adding items to cart:
    - Always wait for the 'Added!' confirmation before proceeding
    - If a size selection is required, choose 'Medium' by default
  `,
});
Overly Complex Instructions
Break multi-step flows into separate calls:
// Bad: Too much in one call
await stagehand.act("go to settings, click profile, change the email, and save");

// Good: Step by step
await stagehand.act("click 'Settings' in the menu");
await stagehand.act("click 'Profile' tab");
await stagehand.act("type 'user@example.com' into the email field");
await stagehand.act("click 'Save Changes'");
Assuming Element State
Don’t assume elements are visible or enabled:
// Bad: Assumes the form is ready
await stagehand.act("submit the form");

// Good: Wait for readiness
await stagehand.act("fill the name field with 'John'");
await stagehand.act("fill the email field with 'user@example.com'");
// Form becomes submittable after fields are filled
await stagehand.act("click 'Submit'");
Using Technical Selectors
Avoid CSS selectors or XPath in instructions:
// Bad: Technical selector
await stagehand.act("click the element with class 'btn-primary'");

// Good: Human description
await stagehand.act("click the blue 'Submit' button");
Let the AI translate semantic meaning to technical selectors.