Writing Effective AI Instructions

Stagehand’s AI interprets your natural language instructions to identify elements and plan actions. The clearer your instructions, the more reliable your automation.

The Anatomy of a Good Instruction

Effective instructions follow a simple pattern:

Specify the Action

Start with what you want to do: click, type, select, scroll, etc.

Identify the Target

Describe the element using visible text, labels, or semantic meaning.

Add Context (Optional)

Provide additional details if multiple matches are possible.

Examples of Good Instructions

await stagehand.act("click the blue 'Submit' button");
await stagehand.act("type 'user@example.com' into the email field");
await stagehand.act("select 'Canada' from the country dropdown");
await stagehand.act("scroll down to the footer");
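The action, target, and optional context pattern can be sketched as a small string builder. This is only an illustration: `buildInstruction` and `InstructionParts` are hypothetical names, not part of Stagehand, and the result is an ordinary string you would pass to `act()`.

```typescript
// Hypothetical helper, not a Stagehand API. It only assembles a string.
interface InstructionParts {
  action: string; // what to do, e.g. "click"
  target: string; // element description, e.g. "the 'Submit' button"
  context?: string; // optional disambiguation, e.g. "in the checkout form"
}

function buildInstruction(parts: InstructionParts): string {
  const pieces = [parts.action, parts.target, parts.context ?? ""];
  // Trim each piece, drop empty ones, and join with single spaces.
  return pieces
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0)
    .join(" ");
}

const instruction = buildInstruction({
  action: "click",
  target: "the 'Submit' button",
  context: "in the checkout form",
});
console.log(instruction); // click the 'Submit' button in the checkout form
```

Keeping the three parts separate makes it easy to add context only when the first attempt turns out to be ambiguous.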

How Instructions Are Processed

Under the hood, your instruction is transformed into a detailed prompt sent to the LLM:

System Prompt (Built-in)

prompt.ts:150-169

const actSystemPrompt = `
You are helping the user automate the browser by finding elements based on what action the user wants to take on the page

You will be given:
1. a user defined instruction about what action to take
2. a hierarchical accessibility tree showing the semantic structure of the page.

Return the element that matches the instruction if it exists. Otherwise, return an empty object.`;

This gives the AI context about its role and the data it will receive.

Action-Specific Guidance

Stagehand adds specific rules based on the action type:

prompt.ts:177-200

let instruction = `Find the most relevant element to perform an action on given the following action: ${action}.  
IF AND ONLY IF the action EXPLICITLY includes the word 'dropdown' and implies choosing/selecting an option from a dropdown, ignore the 'General Instructions' section, and follow the 'Dropdown Specific Instructions' section carefully.

General Instructions: 
  Provide an action for this element such as ${supportedActions.join(", ")}.
  When choosing non-left click actions, provide right or middle as the argument
  If the action is completely unrelated to a potential action to be taken on the page, return an empty object.
  ONLY return one action. If multiple actions are relevant, return the most relevant one.
`;

The AI is instructed to return only one action per call. If you need multiple actions, use observe() to find all matches, then iterate.

Custom System Prompts

You can add your own guidance:

const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `
    When interacting with forms, always tab through fields in order.
    Prefer keyboard navigation over mouse clicks when possible.
    If a modal is blocking the page, close it before proceeding.
  `,
});

Your custom prompt is appended to the built-in system prompt:

prompt.ts:5-17

export function buildUserInstructionsString(
  userProvidedInstructions?: string,
): string {
  if (!userProvidedInstructions) {
    return "";
  }

  return `\n\n# Custom Instructions Provided by the User
    
Please keep the user's instructions in mind when performing actions. If the user's instructions are not relevant to the current task, ignore them.

User Instructions:
${userProvidedInstructions}`;
}
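To see what the model ends up receiving, the sketch below re-declares the helper from the excerpt and appends its output to a stand-in base prompt. `basePrompt` and `composeSystemPrompt` are assumptions made for illustration; the real built-in prompt text and the exact composition step live in Stagehand's source.

```typescript
// Re-declared from the excerpt above so this sketch runs on its own.
function buildUserInstructionsString(userProvidedInstructions?: string): string {
  if (!userProvidedInstructions) {
    return "";
  }
  return `\n\n# Custom Instructions Provided by the User

Please keep the user's instructions in mind when performing actions. If the user's instructions are not relevant to the current task, ignore them.

User Instructions:
${userProvidedInstructions}`;
}

// Stand-in for the built-in prompt (assumption; the real text is longer).
const basePrompt = "You are helping the user automate the browser.";

// Hypothetical composition step: base prompt first, custom section last.
function composeSystemPrompt(custom?: string): string {
  return basePrompt + buildUserInstructionsString(custom);
}

console.log(composeSystemPrompt("Prefer keyboard navigation over mouse clicks."));
console.log(composeSystemPrompt() === basePrompt); // true
```

Because the helper returns an empty string when no custom prompt is set, the base prompt is left unchanged in that case.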

Action Types & Their Rules

Different actions have different handling logic. This section covers four action types in turn: click actions, dropdowns, typing and keys, and scrolling.

Click Actions

Single-step clicking is the simplest:

await stagehand.act("click the 'Login' button");

The AI identifies elements that:

- Are buttons or links
- Have matching text or labels
- Are visible and interactive

Remember: “to users, buttons and links look the same”, so the AI treats them interchangeably.

Dropdowns

Dropdowns require special handling based on their type:

prompt.ts:189-200

Dropdown Specific Instructions:
  For interacting with dropdowns, there are two specific cases that you need to handle. 
  
  CASE 1: the element is a 'select' element. 
    - choose the selectOptionFromDropdown method,
    - set the argument to the exact text of the option that should be selected,
    - set twoStep to false.
  CASE 2: the element is NOT a 'select' element:
    - do not attempt to directly choose the element from the dropdown. You will need to click to expand the dropdown first.
    - choose the 'click' method
    - set twoStep to true.

Example usage:

// Native <select> dropdown
await stagehand.act("select 'Blue' from the color dropdown");

// Custom dropdown (two-step)
await stagehand.act("choose 'Large' from the size dropdown");

When twoStep is set, Stagehand:

1. Clicks to expand the dropdown
2. Captures a new DOM snapshot
3. Finds and clicks the specific option
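The two dropdown cases can be summarized as a small decision function. This is purely illustrative: in Stagehand the choice is made by the LLM following the prompt rules quoted above, and `planDropdownAction` and `DropdownPlan` are hypothetical names introduced here.

```typescript
// Illustrative only: Stagehand relies on the LLM to make this decision.
interface DropdownPlan {
  method: "selectOptionFromDropdown" | "click";
  arguments: string[];
  twoStep: boolean;
}

function planDropdownAction(tagName: string, optionText: string): DropdownPlan {
  if (tagName.toLowerCase() === "select") {
    // CASE 1: native <select>. Choose the option directly by its exact text.
    return {
      method: "selectOptionFromDropdown",
      arguments: [optionText],
      twoStep: false,
    };
  }
  // CASE 2: custom widget. Click to expand first; the option itself is
  // located and clicked in a second step against a fresh snapshot.
  return { method: "click", arguments: [], twoStep: true };
}

console.log(planDropdownAction("select", "Blue"));
console.log(planDropdownAction("div", "Large"));
```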

Typing & Keys

Text input and keyboard actions:

// Type text
await stagehand.act("type 'user@example.com' into the email field");

// Press keys
await stagehand.act("press Enter");
await stagehand.act("press Tab");
await stagehand.act("press Escape");

Special keys such as Enter, Tab, and Escape should have their first letter capitalized; regular letters stay lowercase.

The AI is guided:

prompt.ts:187

If the action implies a key press, e.g., 'press enter', 'press a', 'press space', etc., always choose the press method with the appropriate key as argument — e.g. 'a', 'Enter', 'Space'. Do not choose a click action on an on-screen keyboard. Capitalize the first character like 'Enter', 'Tab', 'Escape' only for special keys.
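The capitalization rule can be sketched as a tiny normalizer. `normalizeKey` and the `SPECIAL_KEYS` table are hypothetical, and the key list is a small illustrative subset rather than the set Stagehand or the underlying browser driver accepts.

```typescript
// Hypothetical normalizer; the table below is an illustrative subset.
const SPECIAL_KEYS: Record<string, string> = {
  enter: "Enter",
  tab: "Tab",
  escape: "Escape",
  space: "Space",
};

function normalizeKey(raw: string): string {
  const trimmed = raw.trim();
  const special = SPECIAL_KEYS[trimmed.toLowerCase()];
  if (typeof special === "string") {
    return special; // special keys get a capitalized first letter
  }
  // Single characters are returned lowercase; anything else is left as typed.
  return trimmed.length === 1 ? trimmed.toLowerCase() : trimmed;
}

console.log(normalizeKey("enter")); // Enter
console.log(normalizeKey("A")); // a
```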

Scrolling

Scroll actions support multiple patterns:

// Scroll to position
await stagehand.act("scroll halfway down the page");
await stagehand.act("scroll to 75% of the page");

// Scroll by chunks
await stagehand.act("scroll to the next chunk");
await stagehand.act("scroll to the previous chunk");

// Scroll to element
await stagehand.act("scroll to the footer");

The AI is instructed:

prompt.ts:185-186

If the user is asking to scroll to a position on the page, e.g., 'halfway' or 0.75, etc, you must return the argument formatted as the correct percentage, e.g., '50%' or '75%', etc.
If the user is asking to scroll to the next chunk/previous chunk, choose the nextChunk/prevChunk method. No arguments are required here.
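The percentage formatting described in the first rule might look like the following. `toScrollPercent` and the `NAMED_POSITIONS` word list are assumptions for illustration; in Stagehand the LLM produces the formatted argument itself.

```typescript
// Hypothetical converter mirroring the rule quoted above.
// The set of named positions is an assumption, not Stagehand's vocabulary.
const NAMED_POSITIONS: Record<string, number> = {
  top: 0,
  halfway: 0.5,
  bottom: 1,
};

function toScrollPercent(position: string | number): string {
  let fraction: number;
  if (typeof position === "number") {
    fraction = position;
  } else {
    const named = NAMED_POSITIONS[position.trim().toLowerCase()];
    if (typeof named !== "number") {
      throw new Error(`Unknown scroll position: ${position}`);
    }
    fraction = named;
  }
  // Reject NaN and anything outside the 0..1 range.
  if (!(fraction >= 0 && fraction <= 1)) {
    throw new Error(`Scroll fraction out of range: ${fraction}`);
  }
  return `${Math.round(fraction * 100)}%`;
}

console.log(toScrollPercent("halfway")); // 50%
console.log(toScrollPercent(0.75)); // 75%
```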

Using Variables

Variables let you parameterize instructions without exposing sensitive data to the LLM:

await stagehand.act(
  "type %email% into the email field",
  { variables: { email: "user@example.com" } }
);

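// One action per call, each with its own variable
await stagehand.act(
  "type %username% into the username field",
  { variables: { username: "john_doe" } }
);
await stagehand.act(
  "type %password% into the password field",
  { variables: { password: "secret123" } }
);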

Stagehand substitutes the actual values only after the LLM call returns, so the model sees placeholders such as %email% and the sensitive values never appear in your prompts or logs.

The AI receives variable names in the prompt:

prompt.ts:203-210

if (variables && Object.keys(variables).length > 0) {
  const variableNames = Object.keys(variables)
    .map((key) => `%${key}%`)
    .join(", ");
  const variablesPrompt = `The following variables are available to use in the action: ${variableNames}. Fill the argument variables with the variable name.`;
  instruction += ` ${variablesPrompt}`;
}

During execution, Stagehand replaces %variableName% with actual values:

actHandler.ts:28

const resolvedValue = resolveVariableValue(argument, variables);
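The body of `resolveVariableValue` is not shown here, so the following is only a guess at what such a substitution step could look like. `substituteVariables` is a hypothetical stand-in that replaces `%name%` tokens for known variables and leaves unknown tokens untouched.

```typescript
// Hypothetical stand-in; Stagehand's actual resolveVariableValue may differ.
function substituteVariables(
  argument: string,
  variables: Record<string, string>,
): string {
  // Replace every %name% token whose name is a known variable.
  return argument.replace(/%(\w+)%/g, (token: string, name: string) => {
    const value = variables[name];
    return typeof value === "string" ? value : token;
  });
}

const variables = { username: "john_doe", password: "secret123" };
console.log(substituteVariables("%username%", variables)); // john_doe
console.log(substituteVariables("%unknown%", variables)); // %unknown%
```

Since the substitution runs on the argument the model returned, the model only ever needs to echo the placeholder.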

Extract-Specific Instructions

Extraction has different guidance since it’s about reading rather than acting:

prompt.ts:24-48

const baseContent = `You are extracting content on behalf of a user.
If a user asks you to extract a 'list' of information, or 'all' information, 
YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS.

Print the exact text from the DOM elements with all symbols, characters, and endlines as is.
Print null or an empty string if no new information is found.

If a user is attempting to extract links or URLs, you MUST respond with ONLY the IDs of the link elements.
Do not attempt to extract links directly from the text unless absolutely necessary.`;

When extracting lists, be explicit: “extract all product names” ensures completeness.

Observe-Specific Instructions

Observe finds multiple elements rather than acting on one:

prompt.ts:120-128

const observeSystemPrompt = `
You are helping the user automate the browser by finding elements based on what the user wants to observe in the page.

Return an array of elements that match the instruction if they exist, otherwise return an empty array.
When returning elements, include the appropriate method from the supported actions list.
`;

const actions = await stagehand.observe("find all 'Add to Cart' buttons");
console.log(`Found ${actions.length} buttons`);

for (const action of actions) {
  await stagehand.act(action); // Act on each one
}
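Before acting on every result, it can help to filter the observed list. The sketch below assumes each observed action carries a `description`, a `selector`, and an optional `method`; that shape is an assumption and should be checked against the type exported by your Stagehand version. `filterClickable` is a hypothetical helper.

```typescript
// Assumed shape of an observed action (verify against your Stagehand version).
interface ObservedAction {
  description: string;
  selector: string;
  method?: string;
}

// Keep only click actions whose description mentions the given text.
function filterClickable(actions: ObservedAction[], text: string): ObservedAction[] {
  const needle = text.toLowerCase();
  return actions.filter(
    (action) =>
      action.method === "click" &&
      action.description.toLowerCase().indexOf(needle) !== -1,
  );
}

// Sample data standing in for the result of stagehand.observe(...).
const sample: ObservedAction[] = [
  { description: "Add to Cart button for item A", selector: "xpath=/a", method: "click" },
  { description: "Search field", selector: "xpath=/b", method: "fill" },
  { description: "Add to Cart button for item B", selector: "xpath=/c", method: "click" },
];

console.log(filterClickable(sample, "add to cart").length); // 2
```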

Best Practices

Be Specific About Attributes

Mention color, position, or unique text when multiple similar elements exist:

// Good: Specific
await stagehand.act("click the red 'Delete' button in the top right");

// Bad: Ambiguous
await stagehand.act("click the delete button");

Use Visible Text

Refer to elements by what users see, not internal attributes:

// Good: User-visible
await stagehand.act("click 'Contact Us' in the navigation");

// Bad: Implementation details
await stagehand.act("click the element with id='nav-contact'");

Handle Modals & Overlays

Dismiss blocking UI explicitly before the main action:

// Good: Acknowledges overlay
await stagehand.act("close the cookie consent banner");
await stagehand.act("click 'Accept All Cookies'");

// Then proceed with main action
await stagehand.act("click 'Sign In'");

Test Instructions Iteratively

Start broad, then refine:

// First attempt
await stagehand.act("click the submit button");
// If it fails, add detail
await stagehand.act("click the blue 'Submit Order' button at the bottom");
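The broad-then-specific approach can be automated with a small fallback loop. `actWithFallbacks` is a hypothetical helper: it takes the act function as a parameter and treats a call as failed if it throws or resolves to an object with `success === false`. Both failure modes are assumptions, so adjust them to how your Stagehand version reports errors.

```typescript
// The act function is injected, so this helper has no Stagehand dependency.
type ActFn = (instruction: string) => Promise<unknown>;

async function actWithFallbacks(act: ActFn, instructions: string[]): Promise<string> {
  const errors: string[] = [];
  for (const instruction of instructions) {
    try {
      const result = await act(instruction);
      const failed =
        typeof result === "object" &&
        result !== null &&
        (result as { success?: unknown }).success === false;
      if (!failed) {
        return instruction; // first phrasing that worked
      }
      errors.push(`${instruction}: reported failure`);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${instruction}: ${message}`);
    }
  }
  throw new Error(`All instructions failed:\n${errors.join("\n")}`);
}

// Stub standing in for (i) => stagehand.act(i); it only accepts the detailed phrasing.
const stubAct: ActFn = async (instruction) => {
  if (instruction.indexOf("Submit Order") === -1) {
    throw new Error("no matching element");
  }
  return { success: true };
};

actWithFallbacks(stubAct, [
  "click the submit button",
  "click the blue 'Submit Order' button at the bottom",
]).then((used) => console.log(used));
// prints: click the blue 'Submit Order' button at the bottom
```

Logging which phrasing succeeded tells you which instruction to keep in the script going forward.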

Use Custom System Prompts for Patterns

If you have recurring patterns, encode them once:

const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `
    This is an e-commerce site. When adding items to cart:
    - Always wait for the 'Added!' confirmation before proceeding
    - If a size selection is required, choose 'Medium' by default
  `,
});

Common Pitfalls

Overly Complex Instructions

Break multi-step flows into separate calls:

// Bad: Too much in one call
await stagehand.act("go to settings, click profile, change the email, and save");

// Good: Step by step
await stagehand.act("click 'Settings' in the menu");
await stagehand.act("click 'Profile' tab");
await stagehand.act("type 'user@example.com' into the email field");
await stagehand.act("click 'Save Changes'");

Assuming Element State

Don’t assume elements are visible or enabled:

// Bad: Assumes the form is ready
await stagehand.act("submit the form");

// Good: Put the form into a submittable state first
await stagehand.act("fill the name field with 'John'");
await stagehand.act("fill the email field with 'user@example.com'");
// Form becomes submittable after fields are filled
await stagehand.act("click 'Submit'");

Using Technical Selectors

Avoid CSS selectors or XPath in instructions:

// Bad: Technical selector
await stagehand.act("click the element with class 'btn-primary'");

// Good: Human description
await stagehand.act("click the blue 'Submit' button");

Let the AI translate semantic meaning to technical selectors.
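Two of the pitfalls in this section, multi-step instructions and technical selectors, can be caught with a rough lint before an instruction is sent. `lintInstruction` is a hypothetical helper, and its regular expressions are simple heuristics that will produce false positives and misses (for example, apostrophes in words like "user's" can confuse the quote stripping).

```typescript
// Hypothetical lint; the patterns are heuristics, not a Stagehand feature.
function lintInstruction(instruction: string): string[] {
  const warnings: string[] = [];
  // Blank out quoted text so commas or "then" inside typed values are ignored.
  const unquoted = instruction.replace(/'[^']*'|"[^"]*"/g, "''");

  const selectorPattern =
    /\bwith (the )?(id|class)\b|\b(id|class)\s*=|\bxpath\b|\/\/\w|(^|\s)[#.][A-Za-z][\w-]*/i;
  if (selectorPattern.test(unquoted)) {
    warnings.push("technical selector");
  }
  // A comma or the word "then" usually signals more than one step.
  if (/,|\bthen\b/i.test(unquoted)) {
    warnings.push("multiple steps");
  }
  return warnings;
}

console.log(lintInstruction("click the element with class 'btn-primary'"));
console.log(lintInstruction("go to settings, click profile, change the email, and save"));
console.log(lintInstruction("click the blue 'Submit' button"));
```

The first call reports a technical selector, the second reports multiple steps, and the third returns an empty list.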

Debugging Instructions

When an instruction doesn’t work:

Check the Logs

Set verbose: 2 to see what the AI identified:

const stagehand = new Stagehand({ env: "LOCAL", verbose: 2 });

Use Observe First

See what elements the AI finds:

const actions = await stagehand.observe("find the submit button");
console.log(actions);

Refine Your Description

Add more detail or try different wording:

// Original
await stagehand.act("submit");

// More specific
await stagehand.act("click the green 'Submit' button");

Check the DOM

Ensure the element exists and is visible. Use browser DevTools to inspect.

Next Steps

Understand Browser Contexts

Learn how Stagehand manages pages and frames

Leverage Caching

Speed up execution with cached actions

See Real Examples

Explore working instruction patterns

API Reference

Full method documentation
