The V3Evaluator class provides AI-powered evaluation of whether goals have been achieved. It can analyze screenshots, page text, or agent reasoning to determine if a task was completed successfully.
```typescript
const evaluator = new V3Evaluator(stagehand);

const result = await evaluator.ask({
  question: "What is the page title?",
  answer: "Welcome to Example Site",
  screenshot: true,
});

if (result.evaluation === "YES") {
  console.log("Title matches expected value");
} else {
  console.log("Title does not match:", result.reasoning);
}
```
```typescript
// Use a different model for evaluation
const evaluator = new V3Evaluator(
  stagehand,
  "anthropic/claude-3-5-sonnet-latest",
  {
    apiKey: process.env.ANTHROPIC_API_KEY,
    temperature: 0.3,
  },
);

const result = await evaluator.ask({
  question: "Is the checkout process complete?",
});
```
```typescript
const evaluator = new V3Evaluator(stagehand);

const result = await evaluator.ask({
  question: "Is the dashboard showing the correct data?",
  systemPrompt: `You are an expert QA engineer evaluating UI tests.
    Be strict about validation and look for any inconsistencies.
    Check that all expected elements are present and correct.
    Today's date is ${new Date().toLocaleDateString()}`,
  screenshot: true,
});
```
Write specific, verifiable questions:

```typescript
// Good: specific and verifiable
ask({ question: "Is the user logged in with username '[email protected]'?" });

// Less specific
ask({ question: "Is login successful?" });
```
Provide expected answers for validation:
```typescript
ask({
  question: "What is the total price?",
  answer: "$149.99",
});
```
Include screenshots for visual verification:
```typescript
ask({ question: "Is the modal visible?", screenshot: true });
```
Use batch evaluation for efficiency:
```typescript
// More efficient than multiple individual calls
batchAsk({
  questions: [
    { question: "Is element A visible?" },
    { question: "Is element B visible?" },
    { question: "Is element C visible?" },
  ],
});
```
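When a batch evaluation returns, you typically want a single pass/fail summary. A minimal sketch of such an aggregation, assuming each result has the `evaluation`/`reasoning` shape shown earlier (the `EvaluationResult` interface and `summarize` helper here are illustrative, not part of the Stagehand API):

```typescript
// Mirrors the result shape shown in the examples above; illustrative only.
interface EvaluationResult {
  evaluation: "YES" | "NO";
  reasoning: string;
}

// Hypothetical helper: collapse a batch of results into a summary,
// keeping the reasoning strings of any failures for debugging.
function summarize(results: EvaluationResult[]): {
  passed: number;
  failed: number;
  failures: string[];
} {
  const failures = results
    .filter((r) => r.evaluation === "NO")
    .map((r) => r.reasoning);
  return {
    passed: results.length - failures.length,
    failed: failures.length,
    failures,
  };
}
```

Keeping the reasoning of failed checks makes batch runs much easier to triage than a bare boolean.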
Add delays for dynamic content:
```typescript
ask({
  question: "Did the animation complete?",
  screenshotDelayMs: 1000, // Wait for animation
});
```
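For content whose load time varies, a fixed delay can be replaced by polling: re-run the evaluation until it passes or a deadline expires. A minimal sketch, where `check` stands in for an `ask()` call (the `waitForYes` helper and its options are hypothetical, not part of the Stagehand API):

```typescript
// Hypothetical polling helper: repeatedly run an evaluation callback
// until it returns "YES" or the timeout elapses. Returns true on success.
async function waitForYes(
  check: () => Promise<{ evaluation: "YES" | "NO" }>,
  { timeoutMs = 5000, intervalMs = 500 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await check()).evaluation === "YES") return true;
    // Wait before re-evaluating, so the page has time to update.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Compared with a fixed `screenshotDelayMs`, polling succeeds as soon as the page is ready and only pays the full timeout when the check genuinely fails.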