Prerequisites
Before starting, ensure you have:
- macOS 13.0 or later
- agent-desktop installed (see installation)
- Accessibility permissions granted (see permissions)
Your First Automation
Let’s automate a simple workflow in Finder: opening a folder and creating a new file.
Capture the UI with snapshot
Get an accessibility tree of Finder with interactive elements.
The -i flag filters to interactive-only elements (buttons, text fields, etc.) and assigns refs like @e1, @e2, @e3.
Type into a text field
After the UI updates, take a new snapshot to get fresh refs.
Find the new folder’s name field (e.g., @e5) and type a name.
Understanding the Workflow
The key pattern in agent-desktop is:
- Snapshot: Capture the current UI state with refs
- Decide: Your AI agent analyzes the snapshot and picks an action
- Act: Execute a command using refs (click, type, etc.)
- Re-snapshot: After UI changes, capture fresh state and repeat
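The four steps above can be sketched as a small driver loop. This is only an illustration of the control flow, not agent-desktop's API: `run_loop` and its three callables (`snapshot`, `decide`, `act`) are hypothetical names, and in practice `snapshot` and `act` would wrap whatever way you invoke agent-desktop, while `decide` would call your AI agent.

```python
from typing import Callable, Optional


def run_loop(
    snapshot: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> int:
    """Repeat snapshot -> decide -> act until decide() returns None.

    All three callables are hypothetical placeholders supplied by the caller.
    Returns the number of actions that were executed.
    """
    steps = 0
    while steps < max_steps:
        state = snapshot()        # Snapshot: capture the current UI state with refs
        action = decide(state)    # Decide: the agent picks an action (None means done)
        if action is None:
            break
        act(action)               # Act: execute the chosen command
        steps += 1                # Re-snapshot: happens at the top of the next pass
    return steps
```

The `max_steps` cap is a design choice for this sketch so that an agent that never decides it is done cannot loop forever.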
Why Refs?
Refs (@e1, @e2, etc.) provide deterministic element selection:
- Fast: No need to re-query the accessibility tree for each action
- Reliable: Refs are stable within a snapshot
- AI-friendly: Simple syntax that LLMs can easily generate and use
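Because refs have such a simple shape, it is easy to check an agent-generated command against the latest snapshot before running it. The sketch below assumes only that refs look like `@e` followed by digits, as in the examples above; the helper names and the sample snapshot and command strings are made up for illustration and do not reflect agent-desktop's real output format.

```python
import re

# Assumption: a ref is "@e" followed by one or more digits (e.g. @e1, @e12).
REF_PATTERN = re.compile(r"@e\d+")


def refs_in(text: str) -> set[str]:
    """Return every ref that appears anywhere in the given text."""
    return set(REF_PATTERN.findall(text))


def unknown_refs(command: str, snapshot_text: str) -> set[str]:
    """Return refs used in the command that the snapshot does not contain."""
    return refs_in(command) - refs_in(snapshot_text)


# Hypothetical snapshot text, invented for this example.
sample_snapshot = 'button "New Folder" @e1\ntextfield "Name" @e5'
print(unknown_refs("type @e7 hello", sample_snapshot))  # refs missing from the snapshot
```

An empty result means every ref in the command came from that snapshot, which is the condition under which refs are described as stable.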
Handling Stale Refs
If a ref becomes stale because the UI changed, agent-desktop returns a STALE_REF error. Run snapshot again and use the new refs.
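One way to handle this in a wrapper script is to re-snapshot and retry automatically. The sketch below is an assumption-laden illustration: `StaleRefError` is a hypothetical exception standing in for however your integration surfaces the STALE_REF error, and `snapshot`, `choose_ref`, and `act` are placeholder callables you would supply.

```python
from typing import Callable


class StaleRefError(Exception):
    """Hypothetical stand-in for a STALE_REF error reported by agent-desktop."""


def act_with_refresh(
    snapshot: Callable[[], str],
    choose_ref: Callable[[str], str],
    act: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    """Act on a ref, taking a fresh snapshot whenever the ref is stale.

    Returns the ref that was successfully acted on.
    """
    for _ in range(max_attempts):
        state = snapshot()         # always start from a fresh snapshot
        ref = choose_ref(state)    # pick the target ref from that snapshot
        try:
            act(ref)
            return ref
        except StaleRefError:
            continue               # UI changed underneath us: snapshot again
    raise StaleRefError(f"ref still stale after {max_attempts} attempts")
```

Bounding the retries keeps a constantly changing UI from trapping the script in an endless refresh cycle.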
Common Patterns
Search and click
Fill a form
Navigate menus
Wait for UI changes
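As a rough illustration of the "wait for UI changes" pattern, the sketch below polls snapshots until a condition holds or a timeout expires. agent-desktop may provide its own command for waiting; this example does not use or describe it. `wait_for`, `snapshot`, and `condition` are hypothetical names, and the snapshot is treated as opaque text.

```python
import time
from typing import Callable


def wait_for(
    snapshot: Callable[[], str],
    condition: Callable[[str], bool],
    timeout: float = 5.0,
    interval: float = 0.2,
) -> str:
    """Poll snapshot() until condition(state) is true, then return that state.

    Raises TimeoutError if the condition is not met within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = snapshot()                 # capture fresh state on every poll
        if condition(state):
            return state                   # refs in this state are the ones to use next
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)               # back off briefly before polling again
```

Returning the matching snapshot, rather than just `True`, lets the caller act on refs from the same state that satisfied the condition.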
Next Steps
- Core Concepts: Learn the snapshot-ref workflow in depth
- Command Categories: Explore all 50+ commands by category
- API Reference: Detailed documentation for every command
- Error Handling: Handle errors and recover gracefully