This guide will walk you through your first desktop automation workflow using agent-desktop.

Prerequisites

Before starting, ensure you have:

Your First Automation

Let’s automate a simple workflow in Finder: launching the app, creating a new folder, and giving it a name.

1. Launch Finder

First, launch Finder if it’s not already running:
agent-desktop launch Finder
Response:
{
  "version": "1.0",
  "ok": true,
  "command": "launch",
  "data": {
    "app": "Finder",
    "pid": 1234,
    "window": {
      "id": "w-5678",
      "title": "Documents"
    }
  }
}
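A script or agent consuming this response would normally check ok before reading data. The following is a minimal Python sketch, assuming every response uses the envelope shown above (ok, command, and either data or error). The names parse_response and CommandFailed are illustrative helpers, not part of agent-desktop.

```python
import json


class CommandFailed(Exception):
    """Raised when an envelope reports ok == false (hypothetical helper)."""


def parse_response(raw: str) -> dict:
    """Decode one JSON response and return its `data` payload.

    Assumes the envelope shape shown in this guide: `ok`, `command`,
    and either `data` (success) or `error` (failure).
    """
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        error = envelope.get("error", {})
        raise CommandFailed(f"{envelope.get('command')}: {error.get('code')}")
    return envelope.get("data", {})


# Sample copied from the launch response above.
sample = """
{"version": "1.0", "ok": true, "command": "launch",
 "data": {"app": "Finder", "pid": 1234,
          "window": {"id": "w-5678", "title": "Documents"}}}
"""
data = parse_response(sample)
print(data["pid"], data["window"]["id"])  # prints: 1234 w-5678
```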

2. Capture the UI with snapshot

Get an accessibility tree of Finder with interactive elements:
agent-desktop snapshot --app Finder -i
The -i flag filters to interactive-only elements (buttons, text fields, etc.) and assigns refs like @e1, @e2, @e3.
Example output:
{
  "version": "1.0",
  "ok": true,
  "command": "snapshot",
  "data": {
    "app": "Finder",
    "window": {
      "id": "w-5678",
      "title": "Documents"
    },
    "ref_count": 12,
    "tree": {
      "role": "window",
      "name": "Documents",
      "children": [
        {
          "role": "button",
          "name": "New Folder",
          "ref": "@e1"
        },
        {
          "role": "textfield",
          "name": "Search",
          "ref": "@e2"
        }
      ]
    }
  }
}
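Rather than hard-coding a ref, a caller can look it up by role and name in the returned tree. The sketch below assumes the tree has exactly the shape shown in the example output (role, name, optional ref, optional children); iter_nodes and find_ref are hypothetical helper names.

```python
from typing import Iterator, Optional


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_ref(tree: dict, role: str, name: str) -> Optional[str]:
    """Return the ref of the first node matching role and name, or None."""
    for node in iter_nodes(tree):
        if node.get("role") == role and node.get("name") == name:
            return node.get("ref")
    return None


# The `tree` value from the example snapshot output.
tree = {
    "role": "window",
    "name": "Documents",
    "children": [
        {"role": "button", "name": "New Folder", "ref": "@e1"},
        {"role": "textfield", "name": "Search", "ref": "@e2"},
    ],
}
print(find_ref(tree, "button", "New Folder"))  # prints: @e1
```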

3. Click a button

Click the “New Folder” button using its ref:
agent-desktop click @e1
Response:
{
  "version": "1.0",
  "ok": true,
  "command": "click",
  "data": {
    "action": "click",
    "method": "ax_press"
  }
}

4. Type into a text field

After the UI updates, take a new snapshot to get fresh refs:
agent-desktop snapshot --app Finder -i
Find the new folder’s name field (e.g., @e5) and type a name:
agent-desktop type @e5 "Project Files"
Response:
{
  "version": "1.0",
  "ok": true,
  "command": "type",
  "data": {
    "action": "type",
    "text": "Project Files",
    "char_count": 13
  }
}

5. Press Return

Confirm the folder name by pressing Return:
agent-desktop press return
Response:
{
  "version": "1.0",
  "ok": true,
  "command": "press",
  "data": {
    "action": "press",
    "combo": "return"
  }
}
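To script the five steps instead of typing them, the CLI can be called from a small wrapper. This is a sketch under two assumptions that this guide does not confirm: that agent-desktop is on PATH, and that both success and error responses are printed as a single JSON document on stdout. The function name run_cli and its executable parameter are illustrative.

```python
import json
import subprocess
from typing import Sequence


def run_cli(args: Sequence[str],
            executable: Sequence[str] = ("agent-desktop",)) -> dict:
    """Run one command and return the parsed JSON envelope.

    Assumption: the CLI writes exactly one JSON document to stdout.
    """
    completed = subprocess.run(
        [*executable, *args],
        capture_output=True,
        text=True,
        check=False,  # inspect the `ok` field rather than the exit code
    )
    if not completed.stdout.strip():
        raise RuntimeError(
            f"no output (exit {completed.returncode}): {completed.stderr.strip()}"
        )
    return json.loads(completed.stdout)


# Usage, mirroring steps 1 to 5 (requires a desktop session):
#   run_cli(["launch", "Finder"])
#   run_cli(["snapshot", "--app", "Finder", "-i"])
#   run_cli(["click", "@e1"])
#   run_cli(["snapshot", "--app", "Finder", "-i"])
#   run_cli(["type", "@e5", "Project Files"])
#   run_cli(["press", "return"])
```

Passing arguments as a list avoids shell quoting problems with values such as "Project Files".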

Understanding the Workflow

The key pattern in agent-desktop is:
snapshot → decide → act → snapshot → decide → act → ...

Snapshot

Capture the current UI state with refs

Decide

Your AI agent analyzes the snapshot and picks an action

Act

Execute a command using refs (click, type, etc.)

Re-snapshot

After UI changes, capture fresh state and repeat
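The loop above can be sketched in Python with the three stages passed in as functions. Everything here is illustrative: run_loop, its parameters, and the max_steps guard are assumptions for the example, and the stand-in callbacks replace real snapshot and action calls so the code runs without a desktop session.

```python
from typing import Callable, List, Optional


def run_loop(
    take_snapshot: Callable[[], dict],
    decide: Callable[[dict], Optional[List[str]]],
    act: Callable[[List[str]], None],
    max_steps: int = 10,
) -> int:
    """Repeat snapshot -> decide -> act until decide returns None.

    Returns the number of actions executed. `max_steps` stops a loop
    that never reaches a finished state.
    """
    steps = 0
    while steps < max_steps:
        snapshot = take_snapshot()   # fresh refs on every iteration
        action = decide(snapshot)
        if action is None:           # nothing left to do
            break
        act(action)
        steps += 1
    return steps


# Stand-in callbacks: a fixed plan replaces the AI agent's decisions.
executed: List[List[str]] = []
plan = [["click", "@e1"], ["type", "@e5", "Project Files"], ["press", "return"]]

steps = run_loop(
    take_snapshot=lambda: {"pending": len(plan) - len(executed)},
    decide=lambda snap: plan[len(executed)] if snap["pending"] else None,
    act=executed.append,
)
print(steps)  # prints: 3
```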

Why Refs?

Refs (@e1, @e2, etc.) provide deterministic element selection:
  • Fast: No need to re-query the accessibility tree for each action
  • Reliable: Refs are stable within a snapshot
  • AI-friendly: Simple syntax that LLMs can easily generate and use
Refs are valid only until the next snapshot. If the UI changes, you must run snapshot again to get fresh refs.
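One way to respect this rule in client code is to tag each ref with the snapshot it came from and reject it locally once a newer snapshot exists. The sketch below is purely client-side bookkeeping and an assumption about how a caller might organize things; RefHandle and SnapshotTracker are hypothetical names, not agent-desktop features.

```python
from typing import NamedTuple


class RefHandle(NamedTuple):
    generation: int  # which snapshot the ref came from
    ref: str         # e.g. "@e1"


class SnapshotTracker:
    """Tags refs with a snapshot counter so old ones can be rejected."""

    def __init__(self) -> None:
        self.generation = 0

    def new_snapshot(self) -> int:
        """Call after every `snapshot` command; invalidates earlier handles."""
        self.generation += 1
        return self.generation

    def handle(self, ref: str) -> RefHandle:
        """Wrap a ref taken from the most recent snapshot."""
        return RefHandle(self.generation, ref)

    def checked(self, handle: RefHandle) -> str:
        """Return the raw ref, or raise if it predates the latest snapshot."""
        if handle.generation != self.generation:
            raise ValueError(
                f"{handle.ref} is from snapshot {handle.generation}, "
                f"current is {self.generation}"
            )
        return handle.ref


tracker = SnapshotTracker()
tracker.new_snapshot()
new_folder = tracker.handle("@e1")
print(tracker.checked(new_folder))  # prints: @e1
tracker.new_snapshot()              # UI changed, refs were reissued
# tracker.checked(new_folder) would now raise ValueError
```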

Handling Stale Refs

If a ref becomes stale (UI changed), agent-desktop returns a STALE_REF error:
{
  "version": "1.0",
  "ok": false,
  "command": "click",
  "error": {
    "code": "STALE_REF",
    "message": "Element at @e7 no longer matches the last snapshot",
    "suggestion": "Run 'snapshot' to refresh refs, then retry"
  }
}
Recovery: Simply run snapshot again and use the new refs.
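The recovery step can be automated. The sketch below assumes the error envelope shown above and takes a run callable (hypothetical) that accepts CLI arguments and returns the parsed JSON. Because a new snapshot issues new refs, it looks the element up again by role and name instead of reusing the old ref, and it retries only once.

```python
from typing import Callable, List, Optional

Runner = Callable[[List[str]], dict]  # CLI arguments in, parsed envelope out


def find_ref(node: dict, role: str, name: str) -> Optional[str]:
    """Depth-first search for the first node matching role and name."""
    if node.get("role") == role and node.get("name") == name:
        return node.get("ref")
    for child in node.get("children", []):
        ref = find_ref(child, role, name)
        if ref:
            return ref
    return None


def click_with_refresh(run: Runner, app: str, role: str, name: str,
                       ref: str) -> dict:
    """Click `ref`; on STALE_REF, re-snapshot, re-resolve, retry once."""
    result = run(["click", ref])
    if result.get("ok") or result.get("error", {}).get("code") != "STALE_REF":
        return result  # success, or an error this helper does not handle
    snapshot = run(["snapshot", "--app", app, "-i"])
    if not snapshot.get("ok"):
        return snapshot
    fresh = find_ref(snapshot["data"]["tree"], role, name)
    if fresh is None:
        return result  # element is gone; surface the original error
    return run(["click", fresh])
```

Limiting the retry to a single attempt keeps a constantly changing UI from turning the helper into an endless loop.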

Common Patterns

# Find a button by name
agent-desktop find --role button --name "Save" --app TextEdit

# Click it
agent-desktop click @e3

# Get form with refs
agent-desktop snapshot --app Safari -i

# Fill fields
agent-desktop type @e2 "[email protected]"
agent-desktop type @e3 "John Doe"
agent-desktop check @e5  # checkbox
agent-desktop click @e6  # submit button

# Wait for an element to appear
agent-desktop wait --element @e5 --timeout 5000

# Wait for text
agent-desktop wait --text "Loading complete" --app Safari
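For the form-filling pattern, refs such as @e2 and @e3 come from the most recent snapshot, so a script can derive the type commands from the tree instead of hard-coding them. A minimal sketch, assuming the snapshot tree shape shown earlier; the field names "Full Name" and "Company", the sample values, and the helpers refs_by_name and form_commands are all hypothetical.

```python
from typing import Dict, List


def refs_by_name(tree: dict, role: str) -> Dict[str, str]:
    """Map element name -> ref for every node with the given role."""
    found: Dict[str, str] = {}
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.get("role") == role and "ref" in node:
            found[node.get("name", "")] = node["ref"]
        stack.extend(node.get("children", []))
    return found


def form_commands(tree: dict, values: Dict[str, str]) -> List[List[str]]:
    """Turn {field name: text} into `type` argument lists.

    Raises KeyError if a named text field is missing from the snapshot.
    """
    fields = refs_by_name(tree, "textfield")
    commands = []
    for name, text in values.items():
        if name not in fields:
            raise KeyError(f"no text field named {name!r} in snapshot")
        commands.append(["type", fields[name], text])
    return commands


# Hypothetical snapshot of a two-field form.
form = {
    "role": "window",
    "name": "Sign Up",
    "children": [
        {"role": "textfield", "name": "Full Name", "ref": "@e2"},
        {"role": "textfield", "name": "Company", "ref": "@e3"},
        {"role": "button", "name": "Submit", "ref": "@e6"},
    ],
}
for command in form_commands(form, {"Full Name": "John Doe", "Company": "Acme"}):
    print(command)
```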

Next Steps

Core Concepts

Learn the snapshot-ref workflow in depth

Command Categories

Explore all 50+ commands by category

API Reference

Detailed documentation for every command

Error Handling

Handle errors and recover gracefully
