Observation commands let you read the accessibility tree, search for elements, check properties, and capture screenshots. These are read-only operations that don’t modify UI state.

Overview

The observation workflow centers on snapshot, which captures the accessibility tree and assigns refs (@e1, @e2, etc.) to interactive elements. The other commands let you search for elements, query their properties, and verify their states.

  • snapshot: Capture accessibility tree with @ref IDs
  • find: Search elements by role, name, value, or text
  • screenshot: Take PNG screenshot of application window
  • get: Read element property (text, value, bounds, role)
  • is: Check boolean state (visible, enabled, checked, focused)
  • list-surfaces: List available surfaces (menu, sheet, popover, alert)

Core Workflow

AI agents use this pattern:
# 1. Capture tree with refs
agent-desktop snapshot --app Finder -i

# 2. Decide on action based on tree structure
# 3. Act using refs from snapshot
agent-desktop click @e3

# 4. Re-observe after UI changes
agent-desktop snapshot -i
The snapshot → decide → act → snapshot loop works well for LLMs: refs provide deterministic element selection, so the agent can act on an element without re-querying the tree.
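The loop above can be sketched as a small Python driver. This is a minimal illustration, not part of agent-desktop: `run_cli`, `observe_act_loop`, and the `decide` callback are hypothetical names, and the sketch assumes every command prints its JSON response on stdout. The runner is passed in as a parameter so the loop can be exercised without the CLI installed.

```python
import json
import subprocess
from typing import Callable, Dict, List, Optional

Runner = Callable[[List[str]], Dict]


def run_cli(argv: List[str]) -> Dict:
    """Run agent-desktop with the given arguments and parse its JSON output.

    Assumption: the JSON response is written to stdout.
    """
    proc = subprocess.run(
        ["agent-desktop", *argv], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def observe_act_loop(
    run: Runner,
    decide: Callable[[Dict], Optional[str]],
    app: str,
    max_steps: int = 5,
) -> int:
    """Repeat snapshot -> decide -> act until decide() returns None.

    decide() receives the parsed snapshot and returns a ref such as "@e3",
    or None when there is nothing left to do. Returns the number of clicks.
    """
    clicks = 0
    for _ in range(max_steps):
        tree = run(["snapshot", "--app", app, "-i"])  # 1. capture tree with refs
        ref = decide(tree)                            # 2. decide on an action
        if ref is None:
            break
        run(["click", ref])                           # 3. act using the ref
        clicks += 1                                   # 4. loop re-observes
    return clicks
```

With the real CLI available, a call might look like `observe_act_loop(run_cli, my_policy, app="Finder")`, where `my_policy` is whatever logic (often an LLM call) picks the next ref. The `max_steps` cap is a design choice to keep a confused policy from looping forever.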

Common Patterns

Interactive Elements Only

Use -i or --interactive-only to filter the tree to actionable elements:
agent-desktop snapshot --app Safari -i
This omits static text, labels, and containers, which reduces token usage for AI agents.

Capture Open Menus

Menus are ephemeral surfaces. Capture them with --surface menu:
agent-desktop snapshot --surface menu
Supported surface types: window, focused, menu, menubar, sheet, popover, alert.

Search Without Snapshot

If you know what you’re looking for, use find instead of parsing a full tree:
agent-desktop find --role button --app TextEdit
agent-desktop find --name "OK" --app "System Settings"
Returns matching elements, with refs assigned on the fly.

Check Element State

Verify boolean properties before acting:
agent-desktop is @e7 checked
agent-desktop is @e3 enabled
agent-desktop is @e5 visible
Returns {"ok": true, "data": {"result": true}} when the state holds; otherwise result is false.
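A script consuming this output needs to separate "the command succeeded" from "the state is true". The sketch below parses the response shape shown above; `parse_is_result` is a hypothetical helper name, and treating any response without `"ok": true` as a failure is an assumption about how errors are reported.

```python
import json


def parse_is_result(stdout: str) -> bool:
    """Extract the boolean from an `is` response.

    Expects the shape {"ok": true, "data": {"result": <bool>}}.
    Raises RuntimeError when "ok" is not true (assumed failure signal).
    """
    payload = json.loads(stdout)
    if payload.get("ok") is not True:
        raise RuntimeError(f"agent-desktop reported a failure: {payload}")
    return bool(payload["data"]["result"])
```

For example, an agent could call `parse_is_result` on the output of `agent-desktop is @e3 enabled` and skip the click when it returns False.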

Read Property Values

Extract text, values, or bounds:
agent-desktop get @e3 value
agent-desktop get @e5 text
agent-desktop get @e2 bounds
Useful for validation or scraping workflows.

Screenshot for Vision Models

Capture visual representation alongside the accessibility tree:
agent-desktop screenshot --app Finder
Returns base64-encoded PNG. Combine with snapshot for multimodal agent workflows.
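Before handing the image to a vision model, it usually has to be decoded back into PNG bytes. The following sketch takes the base64 string as its input, because the name of the JSON field that carries it is not documented here; `save_png` is a hypothetical helper. It checks the standard 8-byte PNG signature so a wrong field or truncated payload fails early.

```python
import base64
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(b64_data: str, path: str) -> int:
    """Decode base64 image data and write it to `path`.

    Returns the number of bytes written. Raises ValueError if the
    decoded bytes do not begin with the PNG signature.
    """
    raw = base64.b64decode(b64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return Path(path).write_bytes(raw)
```

The caller is responsible for pulling the base64 string out of the command's JSON response before passing it in.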

Snapshot Options

| Flag | Default | Description |
| --- | --- | --- |
| --app <NAME> | focused app | Filter to a specific application |
| --window-id <ID> | - | Filter to a specific window |
| -i / --interactive-only | off | Only include interactive elements |
| --compact | off | Omit empty structural nodes |
| --include-bounds | off | Include pixel bounds (x, y, width, height) |
| --max-depth <N> | 10 | Maximum tree depth |
| --surface <TYPE> | window | Surface to capture: window, focused, menu, menubar, sheet, popover, alert |
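When snapshot is invoked from code, building the argument list in one place helps avoid typos in flag names. The sketch below maps keyword arguments onto the flags listed above; `snapshot_args` and `SURFACES` are hypothetical names, and rejecting unknown surface types on the client side is a convenience of this example, not documented CLI behavior.

```python
from typing import List, Optional

# Surface types listed in the options above.
SURFACES = ("window", "focused", "menu", "menubar", "sheet", "popover", "alert")


def snapshot_args(
    app: Optional[str] = None,
    window_id: Optional[str] = None,
    interactive_only: bool = False,
    compact: bool = False,
    include_bounds: bool = False,
    max_depth: Optional[int] = None,
    surface: Optional[str] = None,
) -> List[str]:
    """Build the argv for `agent-desktop snapshot`.

    Options left at their defaults are omitted so the CLI applies its own.
    """
    if surface is not None and surface not in SURFACES:
        raise ValueError(f"unsupported surface type: {surface!r}")
    args = ["agent-desktop", "snapshot"]
    if app is not None:
        args += ["--app", app]
    if window_id is not None:
        args += ["--window-id", str(window_id)]
    if interactive_only:
        args.append("-i")
    if compact:
        args.append("--compact")
    if include_bounds:
        args.append("--include-bounds")
    if max_depth is not None:
        args += ["--max-depth", str(max_depth)]
    if surface is not None:
        args += ["--surface", surface]
    return args
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues for app names that contain spaces, such as "System Settings".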

Examples

# Capture Safari tree with refs, interactive only
agent-desktop snapshot --app Safari -i

# Capture open context menu
agent-desktop snapshot --surface menu

# Find all buttons in TextEdit
agent-desktop find --role button --app TextEdit

# Check if checkbox is checked
agent-desktop is @e7 checked

# Get text field value
agent-desktop get @e3 value

# Screenshot Finder window
agent-desktop screenshot --app Finder

# List available surfaces in Notes
agent-desktop list-surfaces --app Notes

Use Cases

Verify element presence, states, and values during automated test runs:
agent-desktop find --role button --name "Submit"
agent-desktop is @e5 enabled
agent-desktop get @e3 value
Scrape structured data from desktop applications:
agent-desktop snapshot --app Notes -i
agent-desktop get @e2 text
agent-desktop get @e5 value
Combine accessibility tree with screenshots for vision-language models:
agent-desktop snapshot --app Safari -i > tree.json
agent-desktop screenshot --app Safari > screenshot.json
# Feed both to GPT-4 Vision or Claude
Explore unfamiliar apps to understand their structure:
agent-desktop snapshot --app Xcode --max-depth 5
agent-desktop find --role menuitem --app Xcode

Error Handling

Common error codes:
  • APP_NOT_FOUND: Application not running or has no windows
  • ELEMENT_NOT_FOUND: No element matched the ref or query
  • PERM_DENIED: Accessibility permission not granted
  • TIMEOUT: Wait condition expired (for wait commands)
All commands return structured JSON with error codes and recovery hints.
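An agent can branch on these codes to choose its next step. The sketch below is illustrative only: `recovery_hint` is a hypothetical helper, the suggested actions are this example's own reading of the code descriptions above rather than the hints the CLI returns, and the function takes the code as a plain string because the exact JSON field that carries it is not shown here.

```python
# Suggested follow-up per error code. These strings are assumptions made
# for this example, not output produced by agent-desktop.
RECOVERY = {
    "APP_NOT_FOUND": "launch the application or open a window, then retry",
    "ELEMENT_NOT_FOUND": "take a fresh snapshot and pick the ref again",
    "PERM_DENIED": "grant accessibility permission, then retry",
    "TIMEOUT": "re-check the condition or retry the wait",
}


def recovery_hint(code: str) -> str:
    """Return a suggested next step for a documented error code."""
    return RECOVERY.get(
        code, "unrecognized error code; inspect the full JSON response"
    )
```

Falling back to a generic message for unknown codes keeps the agent from crashing if a code outside this list appears.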
