Quickstart

This guide walks you through your first desktop automation workflow with agent-desktop.

Prerequisites

Before starting, ensure you have:

macOS 13.0 or later
agent-desktop installed (see installation)
Accessibility permissions granted (see permissions)

Your First Automation

Let’s automate a simple workflow in Finder: creating a new folder and giving it a name.

Launch Finder

First, launch Finder if it’s not already running:

agent-desktop launch Finder

Response:

{
  "version": "1.0",
  "ok": true,
  "command": "launch",
  "data": {
    "app": "Finder",
    "pid": 1234,
    "window": {
      "id": "w-5678",
      "title": "Documents"
    }
  }
}
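Every response on this page uses the same envelope: version, ok, command, and either data or error. The following is a minimal Python sketch of reading that envelope from a script. It assumes the JSON above is what the CLI prints to standard output; parse_response and CommandFailed are hypothetical helper names for illustration, not part of agent-desktop.

```python
import json


class CommandFailed(Exception):
    """Raised when a response has "ok": false (hypothetical helper, not part of agent-desktop)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def parse_response(raw):
    """Parse one JSON response and return its "data" object, or raise CommandFailed."""
    response = json.loads(raw)
    if not response.get("ok"):
        error = response.get("error", {})
        raise CommandFailed(error.get("code", "UNKNOWN"), error.get("message", ""))
    return response.get("data", {})


# The launch response shown above, pasted in as a string.
launch_output = """
{
  "version": "1.0",
  "ok": true,
  "command": "launch",
  "data": {
    "app": "Finder",
    "pid": 1234,
    "window": {"id": "w-5678", "title": "Documents"}
  }
}
"""

data = parse_response(launch_output)
print(data["pid"], data["window"]["id"])  # prints: 1234 w-5678
```

Raising on "ok": false keeps the calling code focused on the success path, while the error code stays available on the exception for recovery logic.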

Capture the UI with snapshot

Get an accessibility tree of Finder with interactive elements:

agent-desktop snapshot --app Finder -i

The -i flag limits the snapshot to interactive elements (buttons, text fields, etc.) and assigns each one a ref such as @e1, @e2, or @e3.

Example output:

{
  "version": "1.0",
  "ok": true,
  "command": "snapshot",
  "data": {
    "app": "Finder",
    "window": {
      "id": "w-5678",
      "title": "Documents"
    },
    "ref_count": 12,
    "tree": {
      "role": "window",
      "name": "Documents",
      "children": [
        {
          "role": "button",
          "name": "New Folder",
          "ref": "@e1"
        },
        {
          "role": "textfield",
          "name": "Search",
          "ref": "@e2"
        }
      ]
    }
  }
}
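A script or agent usually needs to turn this tree into a ref it can act on. The following Python sketch walks a tree shaped like the example above and collects refs by role and name. It relies only on the fields shown in that example (role, name, ref, children); find_refs is a hypothetical helper name, not an agent-desktop API.

```python
def find_refs(node, role=None, name=None):
    """Depth-first walk of a snapshot tree; returns refs of nodes matching role and/or name."""
    matches = []
    role_ok = role in (None, node.get("role"))
    name_ok = name in (None, node.get("name"))
    if "ref" in node and role_ok and name_ok:
        matches.append(node["ref"])
    for child in node.get("children", []):
        matches.extend(find_refs(child, role, name))
    return matches


# The "tree" object from the example snapshot above.
tree = {
    "role": "window",
    "name": "Documents",
    "children": [
        {"role": "button", "name": "New Folder", "ref": "@e1"},
        {"role": "textfield", "name": "Search", "ref": "@e2"},
    ],
}

print(find_refs(tree, role="button", name="New Folder"))  # prints: ['@e1']
print(find_refs(tree))                                    # prints: ['@e1', '@e2']
```

Nodes without a ref, such as the window itself in this example, are skipped, so the result contains only elements a command can target.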

Click a button

Click the “New Folder” button using its ref:

agent-desktop click @e1

Response:

{
  "version": "1.0",
  "ok": true,
  "command": "click",
  "data": {
    "action": "click",
    "method": "ax_press"
  }
}

Type into a text field

After the UI updates, take a new snapshot to get fresh refs:

agent-desktop snapshot --app Finder -i

Find the new folder’s name field (e.g., @e5) and type a name:

agent-desktop type @e5 "Project Files"

Response:

{
  "version": "1.0",
  "ok": true,
  "command": "type",
  "data": {
    "action": "type",
    "text": "Project Files",
    "char_count": 13
  }
}

Press Return

Confirm the folder name by pressing Return:

agent-desktop press return

Response:

{
  "version": "1.0",
  "ok": true,
  "command": "press",
  "data": {
    "action": "press",
    "combo": "return"
  }
}

Understanding the Workflow

The key pattern in agent-desktop is:

snapshot → decide → act → snapshot → decide → act → ...

Snapshot

Capture the current UI state with refs

Decide

Your AI agent analyzes the snapshot and picks an action

Act

Execute a command using refs (click, type, etc.)

Re-snapshot

After UI changes, capture fresh state and repeat
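The four steps above can be sketched as a small driver loop in Python. This is an illustration under stated assumptions, not the tool's own implementation: run_cli assumes agent-desktop is on your PATH and prints its JSON response to standard output, and automate and decide are hypothetical names. The decide callable stands in for your agent: it receives the snapshot response and returns the next command as a list of arguments, or None when the task is finished.

```python
import json
import subprocess


def run_cli(args):
    """Run agent-desktop with the given arguments and parse stdout as JSON.

    Assumption: the binary is on PATH and writes its JSON response to stdout.
    """
    result = subprocess.run(["agent-desktop", *args], capture_output=True, text=True)
    return json.loads(result.stdout)


def automate(app, decide, run=run_cli, max_steps=10):
    """Loop snapshot -> decide -> act until decide returns None, a command fails,
    or max_steps is reached. Returns the list of action responses."""
    history = []
    for _ in range(max_steps):
        # Snapshot: capture fresh refs before every decision.
        snapshot = run(["snapshot", "--app", app, "-i"])
        # Decide: e.g. ["click", "@e1"], or None when there is nothing left to do.
        action = decide(snapshot)
        if action is None:
            break
        # Act: execute the chosen command and record its response.
        response = run(action)
        history.append(response)
        if not response.get("ok"):
            break
    return history
```

Taking a new snapshot at the top of every iteration means each decision is made against current refs, and passing run as a parameter lets you substitute a fake runner when testing the decision logic.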

Why Refs?

Refs (@e1, @e2, etc.) provide deterministic element selection:

Fast: No need to re-query the accessibility tree for each action
Reliable: Refs are stable within a snapshot
AI-friendly: Simple syntax that LLMs can easily generate and use

Refs are valid only until the next snapshot. If the UI changes, you must run snapshot again to get fresh refs.

Handling Stale Refs

If a ref becomes stale (UI changed), agent-desktop returns a STALE_REF error:

{
  "version": "1.0",
  "ok": false,
  "command": "click",
  "error": {
    "code": "STALE_REF",
    "message": "Element at @e7 no longer matches the last snapshot",
    "suggestion": "Run 'snapshot' to refresh refs, then retry"
  }
}

Recovery: Simply run snapshot again and use the new refs.
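In a script, that recovery step can be wrapped in a retry helper. The Python sketch below relies only on the error shape shown above ("ok": false with error.code set to STALE_REF). The names act_with_refresh, make_action, and run are hypothetical: run is any callable that executes one command (given as a list of arguments) and returns the parsed JSON response, and make_action rebuilds the command from a fresh snapshot, since the ref for the same element may differ between snapshots.

```python
def act_with_refresh(make_action, app, run, max_attempts=3):
    """Run an action; on STALE_REF, take a new snapshot and rebuild the action.

    make_action(snapshot) -> list of CLI arguments, e.g. ["click", "@e1"].
    run(args) -> parsed JSON response for one command.
    """
    snapshot = run(["snapshot", "--app", app, "-i"])
    response = None
    for _ in range(max_attempts):
        response = run(make_action(snapshot))
        if response.get("ok"):
            return response
        if response.get("error", {}).get("code") != "STALE_REF":
            return response  # other errors are returned to the caller unretried
        # The ref went stale: refresh refs before the next attempt.
        snapshot = run(["snapshot", "--app", app, "-i"])
    return response
```

The attempt limit keeps a constantly changing UI from causing an endless retry loop; the last response is returned so the caller can inspect the error.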

Common Patterns

Search and click

# Find a button by name
agent-desktop find --role button --name "Save" --app TextEdit

# Click the matching element by its ref (here, @e3)
agent-desktop click @e3

Fill a form

# Get form with refs
agent-desktop snapshot --app Safari -i

# Fill fields
agent-desktop type @e2 "[email protected]"
agent-desktop type @e3 "John Doe"
agent-desktop check @e5  # checkbox
agent-desktop click @e6  # submit button

Navigate menus

# Press menu shortcut
agent-desktop press cmd+shift+n

# Or right-click for context menu
agent-desktop right-click @e4

# Click the menu item (take a fresh snapshot first to get its ref)
agent-desktop click @e7

Wait for UI changes

# Wait for an element to appear
agent-desktop wait --element @e5 --timeout 5000

# Wait for text
agent-desktop wait --text "Loading complete" --app Safari

Next Steps

Core Concepts

Learn the snapshot-ref workflow in depth

Command Categories

Explore all 50+ commands by category

API Reference

Detailed documentation for every command

Error Handling

Handle errors and recover gracefully
