Overview
Deterministic agents produce consistent, repeatable results. While LLMs are inherently probabilistic, Stagehand provides tools to make agents more predictable and reliable.
Use Caching for Determinism
Caching is the most powerful tool for deterministic behavior.
How Caching Works
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./stagehand-cache", // Enable caching
});
await stagehand.init();
// First execution: LLM determines actions
const result1 = await stagehand.act("click the login button");
// Second execution: Replays cached actions (deterministic)
const result2 = await stagehand.act("click the login button");
// result1 and result2 will use the same selector
Cache Keys
Caching is based on:
- Instruction text
- Page URL
- Variable keys (if using variables)
When cache hits:
- Actions replay with the same selectors
- No LLM tokens are used
- Consistent behavior (deterministic)
When cache misses:
- New LLM inference
- May produce different selectors
- Non-deterministic until cached
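To make the cache-key rules above concrete, here is a minimal sketch of how a key could be derived from the instruction text, the page URL, and the variable names. `CacheKeyInput` and `buildCacheKey` are hypothetical names used for illustration only; Stagehand's actual key derivation may differ.

```typescript
// Hypothetical sketch: Stagehand's real key derivation may differ.
interface CacheKeyInput {
  instruction: string;
  url: string;
  variables?: Record<string, string>;
}

function buildCacheKey(input: CacheKeyInput): string {
  // Only variable names are included, sorted so that key order does not matter.
  const variableKeys = Object.keys(input.variables ?? {}).sort();
  return JSON.stringify({
    instruction: input.instruction,
    url: input.url,
    variableKeys,
  });
}

// Same instruction, URL, and variable names: the keys match (a cache hit).
const laptopKey = buildCacheKey({
  instruction: "search for {{product}}",
  url: "https://example.com",
  variables: { product: "laptop" },
});
const mouseKey = buildCacheKey({
  instruction: "search for {{product}}",
  url: "https://example.com",
  variables: { product: "mouse" },
});
console.log(laptopKey === mouseKey); // true
```

Under this assumption, changing the instruction wording or navigating to a different URL yields a different key, which is why a reworded instruction triggers fresh inference.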
For production workflows, pre-populate the cache in development and deploy with the cache directory.
Self-Healing Determinism
Stagehand’s cache includes self-healing:
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./cache",
  selfHeal: true, // Default: enabled
});
How it works:
- The cache contains the action click button[data-id='submit'].
- The DOM changes: the button now has data-id='submit-form'.
- Stagehand detects the failure, re-runs inference, and finds the new selector.
- The cache updates automatically with the new selector.
- Future runs use the updated cache.
Result: Determinism that adapts to changes.
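The self-healing loop described above can be sketched as follows. This is a simplified model under stated assumptions, not Stagehand's implementation: the page is represented as a plain list of selectors that currently resolve, and `replayWithSelfHeal`, `SelectorCache`, and the `reinfer` callback are hypothetical stand-ins for the real replay and LLM inference steps.

```typescript
// Hypothetical model: maps an instruction to the selector that last worked.
type SelectorCache = Record<string, string>;

interface ReplayOutcome {
  selector: string;
  reinferred: boolean; // true when the cached selector could not be used
}

function replayWithSelfHeal(
  cache: SelectorCache,
  instruction: string,
  pageSelectors: string[], // selectors that currently resolve on the page
  reinfer: (instruction: string) => string, // stand-in for LLM inference
): ReplayOutcome {
  const cached = cache[instruction];
  // Cache hit and the selector still resolves: replay without inference.
  if (cached !== undefined && pageSelectors.indexOf(cached) !== -1) {
    return { selector: cached, reinferred: false };
  }
  // Cache miss or stale selector: infer again and verify the result.
  const fresh = reinfer(instruction);
  if (pageSelectors.indexOf(fresh) === -1) {
    throw new Error(`No element matches "${fresh}"`);
  }
  cache[instruction] = fresh; // later runs replay the updated selector
  return { selector: fresh, reinferred: true };
}

const selectorCache: SelectorCache = { "click submit": "button[data-id='submit']" };
const currentPage = ["button[data-id='submit-form']"]; // the DOM has changed
const outcome = replayWithSelfHeal(
  selectorCache,
  "click submit",
  currentPage,
  () => "button[data-id='submit-form']",
);
console.log(outcome.reinferred, selectorCache["click submit"]);
```

In this model the first run after a DOM change pays for one inference, and every later run is a plain replay again.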
Use Zod schemas for deterministic data extraction:
import { z } from "zod";
const productSchema = z.object({
  title: z.string(),
  price: z.number(),
  inStock: z.boolean(),
  rating: z.number().optional(),
});
const result = await stagehand.extract(
  "extract product details",
  { schema: productSchema }
);
// result.extraction is guaranteed to match the schema
// or an error is thrown
Benefits:
- Type-safe outputs
- Validation ensures consistency
- Fails fast if structure doesn’t match
Agent Caching
Agents can cache entire multi-step workflows:
const agent = stagehand.agent();
// First execution: Agent explores and learns
const result1 = await agent.execute({
  instruction: "search for 'laptop' and add first result to cart",
  maxSteps: 10,
});
// Second execution: Replays exact sequence (deterministic)
const result2 = await agent.execute({
  instruction: "search for 'laptop' and add first result to cart",
  maxSteps: 10,
});
// Both executions take the same actions in the same order
Agent Cache Format
The agent cache stores:
- Each step’s type (act, extract, goto, scroll, etc.)
- Selectors and actions taken
- Variables used
- Final result
From AgentCache.ts:352-362:
const entry: CachedAgentEntry = {
  version: 1,
  instruction: context.instruction,
  startUrl: context.startUrl,
  options: context.options,
  configSignature: context.configSignature,
  steps: cloneForCache(steps),
  result: this.pruneAgentResult(result),
  timestamp: new Date().toISOString(),
};
Deterministic Actions
Stagehand’s ActHandler supports deterministic action replay:
Action Structure
const actions: Action[] = [
  {
    type: "click",
    selector: "button[data-testid='submit']",
    description: "Click the submit button",
  },
  {
    type: "fill",
    selector: "input[name='email']",
    method: "fill",
    arguments: ["user@example.com"],
    description: "Fill email field",
  },
];
// Replay these actions deterministically
for (const action of actions) {
  await stagehand.act(action);
}
takeDeterministicAction
From ActCache.ts:196-226, Stagehand uses takeDeterministicAction to replay cached actions:
const result = await handler.takeDeterministicAction(
  action,
  page,
  this.domSettleTimeoutMs,
  effectiveClient,
  undefined,
  context.variables
);
This ensures actions replay exactly as cached.
System Prompts for Consistency
Use system prompts to enforce consistent behavior:
const stagehand = new Stagehand({
  env: "LOCAL",
  systemPrompt: `Rules:
- Always verify actions succeeded before proceeding
- Use data-testid attributes when available
- If multiple elements match, choose the first visible one
- Never click disabled buttons`,
});
const agent = stagehand.agent({
  systemPrompt: `You are a shopping assistant.
- Always select the lowest-priced option
- Verify items are in stock before adding to cart
- Extract prices as numbers without currency symbols`,
});
Variables for Parameterization
Use variables to make workflows deterministic with dynamic inputs:
const searchProduct = async (productName: string) => {
  const agent = stagehand.agent();
  return await agent.execute({
    instruction: "search for '{{product}}' and return the first result's price",
    maxSteps: 10,
    variables: { product: productName },
  });
};
// Cache key includes variable names, not values
// So all product searches use the same cached workflow
await searchProduct("laptop"); // Caches workflow
await searchProduct("mouse"); // Reuses cache with different value
await searchProduct("keyboard"); // Reuses cache with different value
Key insight: Cache is keyed by variable names, not values, enabling deterministic workflows with dynamic data.
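As an illustration of why keying on names works, the sketch below separates the two concerns: a name-only signature that would feed the cache key, and value substitution that happens at run time. `renderInstruction` and `variableSignature` are hypothetical helpers, and the `{{name}}` placeholder syntax is taken from the example above; Stagehand performs its own substitution internally.

```typescript
// Hypothetical helper: substitute {{name}} placeholders with runtime values.
function renderInstruction(
  template: string,
  variables: Record<string, string>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const value = variables[name];
    if (value === undefined) {
      throw new Error(`Missing variable: ${name}`);
    }
    return value;
  });
}

// Hypothetical helper: the part of a cache key contributed by variables.
// Only the sorted names are used, so different values share one entry.
function variableSignature(variables: Record<string, string>): string {
  return Object.keys(variables).sort().join(",");
}

const template = "search for '{{product}}' and return the first result's price";
console.log(renderInstruction(template, { product: "laptop" }));
console.log(
  variableSignature({ product: "laptop" }) === variableSignature({ product: "mouse" }),
); // true
```

A consequence of this design is that renaming a variable changes the signature and misses the cache, while changing only its value does not.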
Limiting Non-Determinism
Set maxSteps
Prevent unbounded exploration:
const agent = stagehand.agent();
await agent.execute({
  instruction: "find and click the submit button",
  maxSteps: 3, // Limits exploration
});
Use Specific Instructions
// ❌ Non-deterministic: agent may explore different paths
await agent.execute({
  instruction: "buy something",
  maxSteps: 50,
});
// ✅ More deterministic: clear path
await agent.execute({
  instruction: "click the 'Buy Now' button for the first product",
  maxSteps: 5,
});
Timeouts and Error Handling
Set an explicit timeout and handle timeout errors the same way on every run:
try {
  await stagehand.act("click submit button", {
    timeout: 10_000, // Consistent timeout
  });
} catch (error) {
  if (error instanceof ActTimeoutError) {
    console.log("Timed out after 10 seconds");
    // Handle timeout consistently
  }
  throw error;
}
Timeout Error Types
From sdkErrors.ts:334-359:
export class TimeoutError extends StagehandError {
  constructor(operation: string, timeoutMs: number) {
    super(`${operation} timed out after ${timeoutMs}ms`);
  }
}
export class ActTimeoutError extends TimeoutError {
  constructor(timeoutMs: number) {
    super("act()", timeoutMs);
    this.name = "ActTimeoutError";
  }
}
export class ExtractTimeoutError extends TimeoutError {
  constructor(timeoutMs: number) {
    super("extract()", timeoutMs);
    this.name = "ExtractTimeoutError";
  }
}
export class ObserveTimeoutError extends TimeoutError {
  constructor(timeoutMs: number) {
    super("observe()", timeoutMs);
    this.name = "ObserveTimeoutError";
  }
}
Testing Determinism
Replay Test
import { test, expect } from "@playwright/test";
import { Stagehand } from "@browserbasehq/stagehand";
test("workflow is deterministic", async () => {
  const stagehand = new Stagehand({
    env: "LOCAL",
    cacheDir: "./test-cache",
  });
  await stagehand.init();
  const page = stagehand.context.pages()[0];
  await page.goto("https://example.com");
  // Run workflow twice
  const result1 = await stagehand.extract("get product title");
  const result2 = await stagehand.extract("get product title");
  // Results should be identical
  expect(result1.extraction).toEqual(result2.extraction);
  expect(result2.metadata?.cacheHit).toBe(true);
  await stagehand.close();
});
Verify Cache Usage
const result = await stagehand.act("click button");
if (result.metadata?.cacheHit) {
  console.log("✓ Using cached action (deterministic)");
  console.log("Cache timestamp:", result.metadata.cacheTimestamp);
} else {
  console.log("⚠ New LLM inference (non-deterministic)");
}
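When a workflow has many steps, it can help to summarize cache usage rather than log each step. The sketch below assumes each result exposes an optional `metadata.cacheHit` flag, as in the snippet above; `ResultLike` and `cacheHitRate` are hypothetical names, not part of Stagehand.

```typescript
// Hypothetical shape: only the field this helper reads.
interface ResultLike {
  metadata?: { cacheHit?: boolean };
}

// Fraction of results served from cache; results without metadata count as misses.
function cacheHitRate(results: ResultLike[]): number {
  if (results.length === 0) {
    return 0;
  }
  const hits = results.filter((r) => r.metadata?.cacheHit === true).length;
  return hits / results.length;
}

const sampleResults: ResultLike[] = [
  { metadata: { cacheHit: true } },
  { metadata: { cacheHit: false } },
  {}, // no metadata: treated as a miss
  { metadata: { cacheHit: true } },
];
console.log(cacheHitRate(sampleResults)); // 0.5
```

A rate below 1 on a pre-warmed workflow suggests that an instruction, URL, or variable name changed since the cache was built.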
Pre-Warming Cache
For production determinism, pre-warm the cache:
// development.ts
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./production-cache",
});
await stagehand.init();
// Run all production workflows once
await runLoginWorkflow(stagehand);
await runSearchWorkflow(stagehand);
await runCheckoutWorkflow(stagehand);
await stagehand.close();
// Deploy ./production-cache to production
// All workflows will now be deterministic
Configuration Signature
The agent cache includes a configuration signature to ensure consistency.
From AgentCache.ts:107-140:
buildConfigSignature(agentOptions?: AgentConfig): string {
  const toolKeys = agentOptions?.tools
    ? Object.keys(agentOptions.tools).sort()
    : undefined;
  const integrationSignatures = agentOptions?.integrations
    ? agentOptions.integrations.map((integration) =>
        typeof integration === "string" ? integration : "client",
      )
    : undefined;
  const serializedModel = this.serializeAgentModelForCache(
    agentOptions?.model,
  );
  const serializedExecutionModel = this.serializeAgentModelForCache(
    agentOptions?.executionModel,
  );
  const isCuaMode =
    agentOptions?.mode !== undefined
      ? agentOptions.mode === "cua"
      : agentOptions?.cua === true;
  return JSON.stringify({
    v3Model: this.getBaseModelName(),
    systemPrompt: this.getSystemPrompt() ?? "",
    agent: {
      cua: isCuaMode,
      model: serializedModel ?? null,
      executionModel: isCuaMode ? null : serializedExecutionModel,
      systemPrompt: agentOptions?.systemPrompt ?? null,
      toolKeys,
      integrations: integrationSignatures,
    },
  });
}
Changing the model, tools, or system prompt changes the signature and invalidates the cache, so cached steps are never replayed under a different configuration.
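The invalidation rule can be illustrated with a reduced signature. This sketch covers only the model, system prompt, and tool names; `AgentConfigLike`, `signatureOf`, and `isEntryValid` are hypothetical names, and the model identifiers are placeholders. The real signature shown above includes more fields.

```typescript
// Hypothetical, reduced configuration shape for illustration.
interface AgentConfigLike {
  model?: string;
  systemPrompt?: string;
  tools?: Record<string, unknown>;
}

function signatureOf(config: AgentConfigLike): string {
  return JSON.stringify({
    model: config.model ?? null,
    systemPrompt: config.systemPrompt ?? null,
    // Sorting tool names keeps the signature stable across declaration order.
    toolKeys: config.tools ? Object.keys(config.tools).sort() : null,
  });
}

// A cached entry is reusable only if its stored signature matches the current one.
function isEntryValid(cachedSignature: string, current: AgentConfigLike): boolean {
  return cachedSignature === signatureOf(current);
}

const storedSignature = signatureOf({
  model: "model-a",
  tools: { search: {}, checkout: {} },
});
// Same tools in a different order: still valid.
console.log(isEntryValid(storedSignature, { model: "model-a", tools: { checkout: {}, search: {} } })); // true
// Different model: invalidated.
console.log(isEntryValid(storedSignature, { model: "model-b", tools: { search: {}, checkout: {} } })); // false
```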
Best Practices
- Always enable caching: set cacheDir for all workflows.
- Use structured schemas: define Zod schemas for extractions.
- Write specific instructions: reduce ambiguity and exploration.
- Limit maxSteps: prevent unbounded agent exploration.
- Use variables for dynamic data: cache workflows, parameterize values.
- Set consistent timeouts: keep error handling predictable.
- Test with replays: verify cache hits and consistency.
- Pre-warm the production cache: deploy with cached workflows.