Stagehand’s AI interprets your natural language instructions to identify elements and plan actions. The clearer your instructions, the more reliable your automation.

The Anatomy of a Good Instruction

Effective instructions follow a simple pattern:
1. Specify the Action

Start with what you want to do: click, type, select, scroll, etc.

2. Identify the Target

Describe the element using visible text, labels, or semantic meaning.

3. Add Context (Optional)

Provide additional details if multiple matches are possible.

Examples of Good Instructions

await stagehand.act("click the blue 'Submit' button");
await stagehand.act("type 'user@example.com' into the email field");
await stagehand.act("select 'Canada' from the country dropdown");
await stagehand.act("scroll down to the footer");
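
The action, target, context pattern can also be assembled programmatically when instructions are generated from data. The sketch below is illustrative only: `buildInstruction` and `InstructionParts` are hypothetical names, not part of Stagehand.

```typescript
// Hypothetical helper (not a Stagehand API): joins the three parts of an
// instruction in the order action, target, optional context.
interface InstructionParts {
  action: string; // what to do, e.g. "click" or "scroll down to"
  target: string; // the element, described by visible text or label
  context?: string; // extra detail to disambiguate similar elements
}

function buildInstruction({ action, target, context }: InstructionParts): string {
  const parts = [action.trim(), target.trim()];
  if (context !== undefined && context.trim().length > 0) {
    parts.push(context.trim());
  }
  return parts.join(" ");
}

const instruction = buildInstruction({
  action: "click",
  target: "the blue 'Submit' button",
  context: "at the bottom of the checkout form",
});
// instruction is "click the blue 'Submit' button at the bottom of the checkout form"
```

The resulting string can be passed to stagehand.act() like any hand-written instruction.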

How Instructions Are Processed

Under the hood, your instruction is transformed into a detailed prompt sent to the LLM:

System Prompt (Built-in)

prompt.ts:150-169
const actSystemPrompt = `
You are helping the user automate the browser by finding elements based on what action the user wants to take on the page

You will be given:
1. a user defined instruction about what action to take
2. a hierarchical accessibility tree showing the semantic structure of the page.

Return the element that matches the instruction if it exists. Otherwise, return an empty object.`;
This gives the AI context about its role and the data it will receive.

Action-Specific Guidance

Stagehand adds specific rules based on the action type:
prompt.ts:177-200
let instruction = `Find the most relevant element to perform an action on given the following action: ${action}.  
IF AND ONLY IF the action EXPLICITLY includes the word 'dropdown' and implies choosing/selecting an option from a dropdown, ignore the 'General Instructions' section, and follow the 'Dropdown Specific Instructions' section carefully.

General Instructions: 
  Provide an action for this element such as ${supportedActions.join(", ")}.
  When choosing non-left click actions, provide right or middle as the argument
  If the action is completely unrelated to a potential action to be taken on the page, return an empty object.
  ONLY return one action. If multiple actions are relevant, return the most relevant one.
`;
The AI is instructed to return only one action per call. If you need multiple actions, use observe() to find all matches, then iterate.

Custom System Prompts

You can add your own guidance:
const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `
    When interacting with forms, always tab through fields in order.
    Prefer keyboard navigation over mouse clicks when possible.
    If a modal is blocking the page, close it before proceeding.
  `,
});
Your custom prompt is appended to the built-in system prompt:
prompt.ts:5-17
export function buildUserInstructionsString(
  userProvidedInstructions?: string,
): string {
  if (!userProvidedInstructions) {
    return "";
  }

  return `\n\n# Custom Instructions Provided by the User
    
Please keep the user's instructions in mind when performing actions. If the user's instructions are not relevant to the current task, ignore them.

User Instructions:
${userProvidedInstructions}`;
}
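
To see the effect of this function in isolation, the following sketch adapts it and joins its output to a placeholder built-in prompt. The `builtInPrompt` text and the use of plain string concatenation are assumptions made for illustration; the real prompt is longer and may be assembled differently.

```typescript
// Adapted from the excerpt above so the sketch is self-contained.
function buildUserInstructionsString(userProvidedInstructions?: string): string {
  if (!userProvidedInstructions) {
    return "";
  }
  return `\n\n# Custom Instructions Provided by the User

Please keep the user's instructions in mind when performing actions. If the user's instructions are not relevant to the current task, ignore them.

User Instructions:
${userProvidedInstructions}`;
}

// Hypothetical stand-in for the built-in system prompt.
const builtInPrompt = "You are helping the user automate the browser.";

// Assumption: the custom block is appended by simple concatenation.
const withCustom = builtInPrompt + buildUserInstructionsString("Prefer keyboard navigation.");
const withoutCustom = builtInPrompt + buildUserInstructionsString();
```

When no custom prompt is set, the function returns an empty string, so the built-in prompt is left unchanged.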

Action Types & Their Rules

Different actions have different handling logic:
Single-step clicking is the simplest:
await stagehand.act("click the 'Login' button");
The AI identifies elements that:
  • Are buttons or links
  • Have matching text or labels
  • Are visible and interactive
Remember: “to users, buttons and links look the same”, so the AI treats them interchangeably.

Using Variables

Variables let you parameterize instructions without exposing sensitive data to the LLM:
await stagehand.act(
  "type %email% into the email field",
  { variables: { email: "user@example.com" } }
);

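// One action per call: fill each field separately
await stagehand.act(
  "type %username% into the username field",
  { variables: { username: "john_doe" } }
);

await stagehand.act(
  "type %password% into the password field",
  { variables: { password: "secret123" } }
);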
Variable values are substituted only after the LLM has responded, so the model sees placeholders such as %password% and sensitive values never appear in your prompts or logs.
The AI receives variable names in the prompt:
prompt.ts:203-210
if (variables && Object.keys(variables).length > 0) {
  const variableNames = Object.keys(variables)
    .map((key) => `%${key}%`)
    .join(", ");
  const variablesPrompt = `The following variables are available to use in the action: ${variableNames}. Fill the argument variables with the variable name.`;
  instruction += ` ${variablesPrompt}`;
}
During execution, Stagehand replaces %variableName% with actual values:
actHandler.ts:28
const resolvedValue = resolveVariableValue(argument, variables);
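
The substitution step can be pictured with a minimal sketch. `substituteVariables` below is a hypothetical stand-in written for illustration; it is not the implementation of `resolveVariableValue`, whose exact behavior is not shown here.

```typescript
// Hypothetical stand-in (not Stagehand's code): replaces %name% tokens with
// values from the variables map and leaves unknown tokens untouched.
function substituteVariables(text: string, variables: Record<string, string>): string {
  return text.replace(/%([A-Za-z_][A-Za-z0-9_]*)%/g, (token: string, name: string): string => {
    const known = Object.prototype.hasOwnProperty.call(variables, name);
    const value = variables[name];
    return known && value !== undefined ? value : token;
  });
}

const variables = { email: "jane@example.com" };

// The LLM only ever sees the placeholder in the instruction...
const promptText = "type %email% into the email field";
// ...and is asked to return the placeholder as the argument (illustrative value).
const llmArgument = "%email%";

// The real value appears only at execution time, after the LLM call.
const typedValue = substituteVariables(llmArgument, variables);
```

Leaving unknown tokens as-is is a design choice of this sketch; it makes a misspelled variable name visible in the typed text instead of silently producing an empty value.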

Extract-Specific Instructions

Extraction has different guidance since it’s about reading rather than acting:
prompt.ts:24-48
const baseContent = `You are extracting content on behalf of a user.
If a user asks you to extract a 'list' of information, or 'all' information, 
YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS.

Print the exact text from the DOM elements with all symbols, characters, and endlines as is.
Print null or an empty string if no new information is found.

If a user is attempting to extract links or URLs, you MUST respond with ONLY the IDs of the link elements.
Do not attempt to extract links directly from the text unless absolutely necessary.`;
When extracting lists, say so explicitly: an instruction such as “extract all product names” invokes the rule above that everything the user requests must be extracted.
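
The prompt asks the model to answer link requests with element IDs instead of URL text. One way to picture the follow-up step is a lookup from ID to URL, sketched below. The function `resolveLinkIds`, the `urlById` table, and the sample IDs are all hypothetical; how Stagehand actually maps IDs back to URLs is not shown in the excerpt.

```typescript
// Hypothetical sketch: turn link element IDs returned by the model into URLs
// using a lookup table built from the page. Unknown IDs are skipped.
function resolveLinkIds(ids: number[], urlById: Map<number, string>): string[] {
  const urls: string[] = [];
  for (const id of ids) {
    const url = urlById.get(id);
    if (url !== undefined) {
      urls.push(url);
    }
  }
  return urls;
}

// Illustrative data only.
const urlById = new Map<number, string>([
  [12, "https://example.com/pricing"],
  [27, "https://example.com/docs"],
]);

// ID 99 is not in the table, so it is dropped from the result.
const resolved = resolveLinkIds([27, 99, 12], urlById);
```

Returning IDs keeps the model from retyping long URLs, which is where copy errors would otherwise creep in.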

Observe-Specific Instructions

Observe finds multiple elements rather than acting on one:
prompt.ts:120-128
const observeSystemPrompt = `
You are helping the user automate the browser by finding elements based on what the user wants to observe in the page.

Return an array of elements that match the instruction if they exist, otherwise return an empty array.
When returning elements, include the appropriate method from the supported actions list.
`;
const actions = await stagehand.observe("find all 'Add to Cart' buttons");
console.log(`Found ${actions.length} buttons`);

for (const action of actions) {
  await stagehand.act(action); // Act on each one
}

Best Practices

Mention color, position, or unique text when multiple similar elements exist:
// Good: Specific
await stagehand.act("click the red 'Delete' button in the top right");

// Bad: Ambiguous
await stagehand.act("click the delete button");
Refer to elements by what users see, not internal attributes:
// Good: User-visible
await stagehand.act("click 'Contact Us' in the navigation");

// Bad: Implementation details
await stagehand.act("click the element with id='nav-contact'");
Deal with blocking UI, such as overlays and banners, before the main action:
// Good: Acknowledges overlay
await stagehand.act("close the cookie consent banner");
await stagehand.act("click 'Accept All Cookies'");

// Then proceed with main action
await stagehand.act("click 'Sign In'");
Start broad, then refine:
// First attempt
await stagehand.act("click the submit button");
// If it fails, add detail
await stagehand.act("click the blue 'Submit Order' button at the bottom");
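
The broad-then-refine approach can be wrapped in a small retry helper. This is a sketch under stated assumptions: `actWithFallbacks` and `ActFn` are hypothetical names, and the helper assumes the act call either throws or resolves to an object with a boolean `success` field. The act function is passed in, so the helper does not depend on a particular Stagehand version.

```typescript
// Assumed shape of an act call: resolves to an object with a success flag.
type ActFn = (instruction: string) => Promise<{ success: boolean }>;

// Hypothetical helper: tries each instruction in order, from broad to
// specific, and returns the first one that succeeds (or null if none do).
async function actWithFallbacks(act: ActFn, instructions: string[]): Promise<string | null> {
  for (const instruction of instructions) {
    try {
      const result = await act(instruction);
      if (result.success) {
        return instruction;
      }
    } catch {
      // Treat a thrown error like a failed attempt and try the next wording.
    }
  }
  return null;
}

// Usage sketch (assumes an initialized `stagehand` instance):
// const used = await actWithFallbacks(
//   (instruction) => stagehand.act(instruction),
//   ["click the submit button", "click the blue 'Submit Order' button at the bottom"],
// );
```

The return value tells you which wording worked, which is useful for tightening the instruction in the source afterwards.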
If you have recurring patterns, encode them once:
const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `
    This is an e-commerce site. When adding items to cart:
    - Always wait for the 'Added!' confirmation before proceeding
    - If a size selection is required, choose 'Medium' by default
  `,
});

Common Pitfalls

Overly Complex Instructions

Break multi-step flows into separate calls:
// Bad: Too much in one call
await stagehand.act("go to settings, click profile, change the email, and save");

// Good: Step by step
await stagehand.act("click 'Settings' in the menu");
await stagehand.act("click 'Profile' tab");
await stagehand.act("type 'user@example.com' into the email field");
await stagehand.act("click 'Save Changes'");
Assuming Element State

Don’t assume elements are visible or enabled:
// Bad: Assumes the form is ready
await stagehand.act("submit the form");

// Good: Wait for readiness
await stagehand.act("fill the name field with 'John'");
await stagehand.act("fill the email field with 'user@example.com'");
// Form becomes submittable after fields are filled
await stagehand.act("click 'Submit'");
Using Technical Selectors

Avoid CSS selectors or XPath in instructions:
// Bad: Technical selector
await stagehand.act("click the element with class 'btn-primary'");

// Good: Human description
await stagehand.act("click the blue 'Submit' button");
Let the AI translate semantic meaning to technical selectors.
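
If instructions are written by several people, a simple check can flag ones that lean on technical selectors. The sketch below is a rough heuristic, not a Stagehand feature: `findSelectorSmells` and its patterns are hypothetical, and it can produce false positives (for example, an instruction that mentions a "Business class" option).

```typescript
// Hypothetical heuristic patterns for selector-like wording in instructions.
const SELECTOR_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "id attribute", pattern: /\bid\s*=\s*['"]/i },
  { label: "class reference", pattern: /\bclass(name)?\b/i },
  { label: "XPath", pattern: /(^|\s)\/\/[a-z*]/i },
  { label: "CSS selector", pattern: /(^|\s)[#.][a-z][\w-]*/i },
];

// Returns the labels of all patterns found in the instruction.
// An empty array means no selector-like wording was detected.
function findSelectorSmells(instruction: string): string[] {
  return SELECTOR_PATTERNS
    .filter(({ pattern }) => pattern.test(instruction))
    .map(({ label }) => label);
}

const flagged = findSelectorSmells("click the element with class 'btn-primary'");
const clean = findSelectorSmells("click the blue 'Submit' button");
```

A check like this fits well in a unit test or lint step that scans instruction strings before they reach the browser.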

Debugging Instructions

When an instruction doesn’t work:
1. Check the Logs

Set verbose: 2 to see what the AI identified:
const stagehand = new Stagehand({ env: "LOCAL", verbose: 2 });

2. Use Observe First

See what elements the AI finds:
const actions = await stagehand.observe("find the submit button");
console.log(actions);

3. Refine Your Description

Add more detail or try different wording:
// Original
await stagehand.act("submit");

// More specific
await stagehand.act("click the green 'Submit' button");

4. Check the DOM

Ensure the element exists and is visible. Use browser DevTools to inspect.

Next Steps

Understand Browser Contexts

Learn how Stagehand manages pages and frames

Leverage Caching

Speed up execution with cached actions

See Real Examples

Explore working instruction patterns

API Reference

Full method documentation
