Overview
The observation workflow centers on `snapshot`: it captures the accessibility tree and assigns refs (@e1, @e2, etc.) to interactive elements. Other commands help you search, query properties, and verify element states.
| Command | Description |
|---|---|
| `snapshot` | Capture accessibility tree with @ref IDs |
| `find` | Search elements by role, name, value, or text |
| `screenshot` | Take PNG screenshot of application window |
| `get` | Read element property (text, value, bounds, role) |
| `is` | Check boolean state (visible, enabled, checked, focused) |
| `list-surfaces` | List available surfaces (menu, sheet, popover, alert) |
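Since refs are what later commands operate on, it can help to pull them out of captured output. The following is a minimal sketch that assumes only the ref shape given above (`@e` followed by digits). The layout of the sample text is invented for illustration and is not the tool's actual snapshot format; `extract_refs` is a hypothetical helper.

```python
import re
from typing import List

# Refs look like @e1, @e2, ... per the overview.
REF_PATTERN = re.compile(r"@e\d+")


def extract_refs(snapshot_text: str) -> List[str]:
    """Return each distinct ref, in order of first appearance."""
    seen = set()
    refs = []
    for match in REF_PATTERN.finditer(snapshot_text):
        ref = match.group(0)
        if ref not in seen:
            seen.add(ref)
            refs.append(ref)
    return refs


# Illustrative text only; the real snapshot layout may differ.
sample = 'button "Save" @e1\ntextfield "Name" @e2\nbutton "Save" @e1'
print(extract_refs(sample))  # ['@e1', '@e2']
```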
Core Workflow
AI agents use this pattern: snapshot → decide → act → snapshot. The loop works well for LLMs because refs provide deterministic element selection without re-querying the tree.
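The loop can be sketched in Python with the three stages injected as callables. This is an illustration of the control flow only: `run_loop`, `take_snapshot`, `decide`, and `act` are hypothetical names, and in practice the snapshot and act callables would wrap calls to the CLI. The fakes below exist so the sketch runs on its own.

```python
from typing import Callable, Optional


def run_loop(
    take_snapshot: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> int:
    """Repeat snapshot -> decide -> act until decide returns None.

    Returns the number of actions performed.
    """
    steps = 0
    tree = take_snapshot()
    while steps < max_steps:
        action = decide(tree)
        if action is None:
            break
        act(action)
        steps += 1
        # Re-capture so the next decision sees the updated tree.
        tree = take_snapshot()
    return steps


# Fake callables so the sketch runs without the real tool.
state = {"clicked": False}


def fake_snapshot() -> str:
    return "done" if state["clicked"] else "button @e1"


def fake_decide(tree: str) -> Optional[str]:
    return "@e1" if "@e1" in tree else None


def fake_act(ref: str) -> None:
    state["clicked"] = True


print(run_loop(fake_snapshot, fake_decide, fake_act))  # 1
```

The `max_steps` cap is a design choice of this sketch, added so that a decision function that never returns `None` cannot loop forever.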
Common Patterns
Interactive Elements Only
Use `-i` or `--interactive-only` to filter the tree to actionable elements.
Capture Open Menus
Menus are ephemeral surfaces. Capture them with `--surface menu`.
Supported surface types: window, focused, menu, menubar, sheet, popover, alert.
Search Without Snapshot
If you know what you’re looking for, use `find` instead of parsing a full tree.
Check Element State
Verify boolean properties before acting. The response is `{"ok": true, "data": {"result": true}}`, with `result` set to `false` when the state does not hold.
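A small sketch of reading that response in Python follows. It relies only on the success shape shown above. The handling of a response where `ok` is not `true` is an assumption, since the failure shape is not documented here; `parse_is_result` is a hypothetical helper.

```python
import json


def parse_is_result(raw: str) -> bool:
    """Extract the boolean from a response shaped like
    {"ok": true, "data": {"result": true}}."""
    payload = json.loads(raw)
    # Assumption: any response without "ok": true is treated as a failure.
    if payload.get("ok") is not True:
        raise RuntimeError(f"command did not succeed: {raw}")
    result = payload["data"]["result"]
    if not isinstance(result, bool):
        raise ValueError(f"expected a boolean result, got {result!r}")
    return result


print(parse_is_result('{"ok": true, "data": {"result": true}}'))   # True
print(parse_is_result('{"ok": true, "data": {"result": false}}'))  # False
```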
Read Property Values
Extract text, values, or bounds.
Screenshot for Vision Models
Capture a visual representation alongside the accessibility tree. Pair `screenshot` with `snapshot` for multimodal agent workflows.
Snapshot Options
| Flag | Default | Description |
|---|---|---|
| `--app <NAME>` | focused app | Filter to a specific application |
| `--window-id <ID>` | - | Filter to a specific window |
| `-i` / `--interactive-only` | off | Only include interactive elements |
| `--compact` | off | Omit empty structural nodes |
| `--include-bounds` | off | Include pixel bounds (x, y, width, height) |
| `--max-depth <N>` | 10 | Maximum tree depth |
| `--surface <TYPE>` | window | window, focused, menu, menubar, sheet, popover, alert |
Examples
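As one way to combine the options above programmatically, the sketch below builds an argument list for `snapshot` from the flags in the table. `snapshot_args` is a hypothetical helper, and the executable name is deliberately left out because it is not given here: the caller would prepend it. Placing the flags after the `snapshot` subcommand is also an assumption. The app name `Finder` is only an example value.

```python
from typing import List, Optional

# Surface types listed in the options table.
SURFACES = ("window", "focused", "menu", "menubar", "sheet", "popover", "alert")


def snapshot_args(
    app: Optional[str] = None,
    window_id: Optional[str] = None,
    interactive_only: bool = False,
    compact: bool = False,
    include_bounds: bool = False,
    max_depth: Optional[int] = None,
    surface: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a snapshot call (executable name not included)."""
    args = ["snapshot"]
    if app is not None:
        args += ["--app", app]
    if window_id is not None:
        args += ["--window-id", str(window_id)]
    if interactive_only:
        args.append("--interactive-only")
    if compact:
        args.append("--compact")
    if include_bounds:
        args.append("--include-bounds")
    if max_depth is not None:
        args += ["--max-depth", str(max_depth)]
    if surface is not None:
        if surface not in SURFACES:
            raise ValueError(f"unknown surface type: {surface}")
        args += ["--surface", surface]
    return args


print(snapshot_args(app="Finder", interactive_only=True, surface="menu"))
# ['snapshot', '--app', 'Finder', '--interactive-only', '--surface', 'menu']
```

Omitted options are left off the argument list so that the defaults in the table apply.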
Use Cases
UI Testing
Verify element presence, states, and values during automated test runs.
Data Extraction
Scrape structured data from desktop applications.
Multimodal AI Workflows
Combine the accessibility tree with screenshots for vision-language models.
Element Discovery
Explore unfamiliar apps to understand their structure.
Error Handling
Common error codes:
- `APP_NOT_FOUND`: Application not running or has no windows
- `ELEMENT_NOT_FOUND`: No element matched the ref or query
- `PERM_DENIED`: Accessibility permission not granted
- `TIMEOUT`: Wait condition expired (for `wait` commands)
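A caller might keep these codes in a lookup table so that failures can be reported in plain language. The sketch below uses only the four codes and descriptions listed above. `describe_error` is a hypothetical helper, and how a code is delivered in an error response is not covered here, so the function takes the code string directly.

```python
# Codes and meanings copied from the list above.
ERROR_MEANINGS = {
    "APP_NOT_FOUND": "Application not running or has no windows",
    "ELEMENT_NOT_FOUND": "No element matched the ref or query",
    "PERM_DENIED": "Accessibility permission not granted",
    "TIMEOUT": "Wait condition expired (for wait commands)",
}


def describe_error(code: str) -> str:
    """Return a readable description, with a fallback for codes not listed."""
    return ERROR_MEANINGS.get(code, f"Unrecognized error code: {code}")


print(describe_error("PERM_DENIED"))  # Accessibility permission not granted
```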