Interaction commands modify UI state by clicking buttons, typing text, toggling checkboxes, scrolling, and more. All interactions use AX-first strategies: they exhaust the accessibility APIs before falling back to mouse events.

Overview

Every interaction command targets an element by ref (e.g., @e3) obtained from a prior snapshot. This ensures deterministic, accessibility-native control.

click

Click element via accessibility press action

double-click

Double-click to open files or select words

triple-click

Triple-click to select lines or paragraphs

right-click

Right-click to open context menu

type

Focus element and type text

set-value

Set value directly via accessibility attribute

clear

Clear element value to empty string

focus

Set keyboard focus on element

select

Select option in dropdown or list

toggle

Toggle checkbox or switch

check

Set checkbox to checked (idempotent)

uncheck

Set checkbox to unchecked (idempotent)

expand

Expand disclosure or tree item

collapse

Collapse disclosure or tree item

scroll

Scroll element in a direction

scroll-to

Scroll element into visible area

AX-First Philosophy

Every action follows a 15-step accessibility activation chain before resorting to mouse events:
  1. Try AXPress action (native click)
  2. Try AXShowMenu action (for menu buttons)
  3. Try setting AXFocused + synthetic key press
  4. Try AXPerformAction with target element
  5. … (10 more strategies)
  6. Final fallback: mouse click at element center
This ensures compatibility with screen readers, keyboard navigation, and accessibility tools.
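The ordered-fallback idea behind this chain can be sketched in a few lines of Python. This is only an illustration of the control flow, not agent-desktop's implementation: the `activate` function and the lambda stand-ins are hypothetical, and the ten elided strategies are left out rather than guessed.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy is a (name, attempt) pair; attempt() returns True on success.
Strategy = Tuple[str, Callable[[], bool]]


def activate(strategies: Sequence[Strategy]) -> Optional[str]:
    """Try strategies in order and return the name of the first that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except Exception:
            # A strategy that raises is treated as "did not work"; keep going.
            continue
    return None


# Hypothetical stand-ins for the real strategies: here every accessibility
# attempt fails, so the chain ends at the mouse fallback.
chain = [
    ("AXPress", lambda: False),
    ("AXShowMenu", lambda: False),
    ("AXFocused + synthetic key press", lambda: False),
    ("mouse click at element center", lambda: True),
]

print(activate(chain))  # mouse click at element center
```

Because the first success short-circuits the loop, the mouse fallback only runs when every earlier strategy has failed.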

Common Patterns

Click a Button

agent-desktop click @e3
Uses kAXPressAction if available. If no accessibility strategy succeeds, the command falls back to a mouse click.

Type Into Text Field

agent-desktop type @e5 "hello world"
Focuses the element, then sends key events. Equivalent to a user typing.

Set Value Directly

For faster input without simulating keystrokes:
agent-desktop set-value @e5 "new value"
Sets kAXValueAttribute directly. Useful for bulk text replacement.

Clear Input Field

agent-desktop clear @e5
Sets value to empty string. Idempotent.

Toggle Checkbox

agent-desktop toggle @e12
Flips the checked state. Use check or uncheck when you need an idempotent operation:
agent-desktop check @e12    # always checked after
agent-desktop uncheck @e12  # always unchecked after
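Why idempotency matters shows up most clearly when a script is retried. The small Python model below is a sketch for illustration only; the `Checkbox` class is hypothetical and does not talk to agent-desktop.

```python
from dataclasses import dataclass


@dataclass
class Checkbox:
    """Toy model of a checkbox, used only to contrast the three commands."""
    checked: bool = False

    def toggle(self) -> None:
        # Result depends on the current state.
        self.checked = not self.checked

    def check(self) -> None:
        # Result is the same no matter how often it runs.
        self.checked = True

    def uncheck(self) -> None:
        self.checked = False


box = Checkbox()
box.toggle()
box.toggle()          # a retried toggle undoes the first one
print(box.checked)    # False

box.check()
box.check()           # a retried check changes nothing
print(box.checked)    # True
```

In a script that may run a step twice, check and uncheck leave the control in a known state, while toggle does not.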

Select Dropdown Option

agent-desktop select @e9 "Option B"
Finds the option by name and selects it via accessibility APIs.

Expand/Collapse Disclosure

agent-desktop expand @e15
agent-desktop collapse @e15
Works with disclosure triangles, tree items, and accordions.

Scroll Element

agent-desktop scroll @e1 down 3
agent-desktop scroll @e1 up 1
Uses a 10-step AX-first scroll chain before falling back to mouse wheel events.

Scroll Element Into View

agent-desktop scroll-to @e20
Ensures element is visible. Useful before clicking off-screen items.
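A scroll-then-click helper might look like the Python sketch below. It rests on an assumption the page does not confirm: that every command prints a JSON object with an "ok" field to stdout, as the right-click example does. The names `cli_runner` and `click_when_visible` are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable, Dict, Sequence

Runner = Callable[[Sequence[str]], Dict[str, Any]]


def cli_runner(args: Sequence[str]) -> Dict[str, Any]:
    """Run agent-desktop and parse stdout as JSON (assumed output format)."""
    proc = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def click_when_visible(ref: str, run: Runner = cli_runner) -> bool:
    """Scroll `ref` into view, then click it; skip the click if scrolling fails."""
    if not run(["scroll-to", ref]).get("ok"):
        return False
    return bool(run(["click", ref]).get("ok"))


# Usage (requires agent-desktop on PATH):
#   click_when_visible("@e20")
```

Passing the runner as a parameter keeps the helper testable without a live desktop session.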

Right-Click for Context Menu

agent-desktop right-click @e3
Returns the context menu tree inline:
{
  "ok": true,
  "data": {
    "menu": {
      "items": [
        {"ref": "@e1", "name": "Copy"},
        {"ref": "@e2", "name": "Paste"}
      ]
    }
  }
}
You can then click @e1 to select a menu item.
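To pick a menu item by name rather than hard-coding its ref, a script can parse the response. The Python sketch below uses the JSON shape shown above; the helper name `find_menu_ref` is hypothetical.

```python
import json
from typing import Optional


def find_menu_ref(response_text: str, item_name: str) -> Optional[str]:
    """Return the ref of the menu item called `item_name`, or None if absent."""
    response = json.loads(response_text)
    if not response.get("ok"):
        return None
    items = response.get("data", {}).get("menu", {}).get("items", [])
    for item in items:
        if item.get("name") == item_name:
            return item.get("ref")
    return None


# The sample response from the right-click example above.
sample = """
{
  "ok": true,
  "data": {
    "menu": {
      "items": [
        {"ref": "@e1", "name": "Copy"},
        {"ref": "@e2", "name": "Paste"}
      ]
    }
  }
}
"""

print(find_menu_ref(sample, "Copy"))  # @e1
```

The returned ref can then be passed to click.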

Examples

# Click a button
agent-desktop click @e3

# Double-click to open a file
agent-desktop double-click @e3

# Triple-click to select a line
agent-desktop triple-click @e3

# Right-click to open context menu
agent-desktop right-click @e3

# Type into a text field
agent-desktop type @e5 "hello world"

# Set value directly
agent-desktop set-value @e5 "new value"

# Clear text field
agent-desktop clear @e5

# Focus element
agent-desktop focus @e5

# Select dropdown option
agent-desktop select @e9 "Option B"

# Toggle checkbox
agent-desktop toggle @e12

# Check checkbox (idempotent)
agent-desktop check @e12

# Uncheck checkbox (idempotent)
agent-desktop uncheck @e12

# Expand disclosure
agent-desktop expand @e15

# Collapse disclosure
agent-desktop collapse @e15

# Scroll down 3 units
agent-desktop scroll @e1 down 3

# Scroll element into view
agent-desktop scroll-to @e20

Use Cases

Fill out forms using refs from snapshot:
agent-desktop type @e2 "user@example.com"
agent-desktop type @e3 "password123"
agent-desktop check @e5
agent-desktop click @e7

Navigate Finder, select files, trigger actions:
agent-desktop click @e3              # select file
agent-desktop right-click @e3        # open context menu
agent-desktop click @e1              # click "Copy"

Toggle preferences in System Settings:
agent-desktop check @e5              # enable option
agent-desktop select @e9 "Dark Mode" # set theme
agent-desktop click @e12             # apply

Manipulate documents in TextEdit or Word:
agent-desktop triple-click @e3       # select paragraph
agent-desktop type @e5 "replacement text"
agent-desktop press cmd+s            # save
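
Multi-step flows like these can be driven from a script that stops at the first failing command. The Python sketch below assumes every command prints a JSON object with an "ok" field, as the right-click example does; the page does not confirm this for all commands. The names `cli_runner` and `run_steps` are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Optional, Sequence

Runner = Callable[[Sequence[str]], Dict[str, Any]]


def cli_runner(args: Sequence[str]) -> Dict[str, Any]:
    """Run agent-desktop and parse stdout as JSON (assumed output format)."""
    proc = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def run_steps(
    steps: Sequence[Sequence[str]], run: Runner = cli_runner
) -> Optional[int]:
    """Run steps in order; return the index of the first failure, or None."""
    for index, step in enumerate(steps):
        if not run(step).get("ok"):
            return index
    return None


# The System Settings flow from above, expressed as argument lists.
settings_steps: List[List[str]] = [
    ["check", "@e5"],
    ["select", "@e9", "Dark Mode"],
    ["click", "@e12"],
]

# Usage (requires agent-desktop on PATH):
#   failed_at = run_steps(settings_steps)
```

Returning the failing index lets the caller take a fresh snapshot and resume from that step instead of replaying the whole flow.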

Error Handling

Common error codes:
  • STALE_REF: Element no longer matches the last snapshot. Run snapshot to refresh refs.
  • ACTION_FAILED: The OS rejected the action (element not actionable, wrong role, etc.).
  • ELEMENT_NOT_FOUND: The ref doesn’t exist or the element was removed.
  • ACTION_NOT_SUPPORTED: The element doesn’t support the requested action.
Recovery pattern:
agent-desktop click @e3
# → STALE_REF error
agent-desktop snapshot -i
# → get updated refs
agent-desktop click @e5  # retry with new ref
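
The same recovery pattern can be sketched in Python. Two things here are assumptions, not documented behavior: the error shape {"ok": false, "error": {"code": "STALE_REF"}}, and that commands print JSON to stdout. Because the snapshot format is not described on this page, locating the element in the fresh snapshot is left to a caller-supplied `find_ref` function; all names are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable, Dict, Optional, Sequence

Result = Dict[str, Any]
Runner = Callable[[Sequence[str]], Result]


def cli_runner(args: Sequence[str]) -> Result:
    """Run agent-desktop and parse stdout as JSON (assumed output format)."""
    proc = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def error_code(result: Result) -> Optional[str]:
    # Assumed error shape: {"ok": false, "error": {"code": "STALE_REF"}}
    if result.get("ok"):
        return None
    return result.get("error", {}).get("code")


def click_with_refresh(
    ref: str, find_ref: Callable[[Result], str], run: Runner = cli_runner
) -> Result:
    """Click `ref`; on STALE_REF, take a new snapshot and retry once."""
    result = run(["click", ref])
    if error_code(result) != "STALE_REF":
        return result  # success, or an error that a refresh will not fix
    snapshot = run(["snapshot", "-i"])
    return run(["click", find_ref(snapshot)])
```

The retry happens only once, so a ref that is stale again after the refresh surfaces as an error instead of looping.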
