Agent-native is designed around a simple workflow: snapshot the UI to get refs, interact with elements using those refs, then re-snapshot when the UI changes. This pattern works for any macOS app and handles dynamic interfaces gracefully.

The core pattern

Here’s the fundamental cycle:
1. Snapshot

Capture the current UI state and assign refs to elements:
$ agent-native snapshot Safari --interactive
Snapshot: Safari (pid 1234) -- 8 elements
AXWindow "Safari" [ref=n1]
  AXTextField [ref=n2]
  AXButton "Go" [AXPress] [ref=n3]

2. Interact

Use refs to click, type, or inspect elements:
$ agent-native click @n2
OK Clicked: AXTextField

$ agent-native type @n2 "example.com"
OK Typed 11 chars into AXTextField

$ agent-native click @n3
OK Clicked: AXButton title="Go"

3. Re-snapshot

After the page loads (or any UI change), take a fresh snapshot:
$ agent-native snapshot Safari --interactive
Snapshot: Safari (pid 1234) -- 15 elements
# New UI state, new refs
This pattern mirrors how humans use computers: observe → act → observe again. The snapshot gives you a moment-in-time view of what’s possible.
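
An agent usually needs the refs from a snapshot in structured form. The following is a minimal sketch, assuming the text format shown in the sample output above (role, optional quoted title, optional bracketed actions, then `[ref=...]`) and two spaces of indentation per nesting level. `Element` and `parse_snapshot` are hypothetical names for illustration, not part of agent-native.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shape, inferred from the sample output:
#   <indent><role> ["title"] [[action] ...] [ref=nN]
LINE_RE = re.compile(
    r'^(?P<indent>\s*)(?P<role>AX\w+)'
    r'(?:\s+"(?P<title>[^"]*)")?'
    r'.*?\[ref=(?P<ref>\w+)\]\s*$'
)


@dataclass
class Element:
    ref: str
    role: str
    title: Optional[str]
    depth: int  # nesting level, assuming two spaces per level


def parse_snapshot(text: str) -> List[Element]:
    """Extract elements from text snapshot output; other lines are skipped."""
    elements = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            elements.append(Element(
                ref=m.group("ref"),
                role=m.group("role"),
                title=m.group("title"),
                depth=len(m.group("indent")) // 2,
            ))
    return elements


SAMPLE = '''Snapshot: Safari (pid 1234) -- 8 elements
AXWindow "Safari" [ref=n1]
  AXTextField [ref=n2]
  AXButton "Go" [AXPress] [ref=n3]
'''

elements = parse_snapshot(SAMPLE)
go = next(e for e in elements if e.title == "Go")
print(f"@{go.ref}")  # prints "@n3"
```

Looking elements up by role or title after each snapshot, rather than hard-coding a ref, keeps the act step correct when refs are reassigned.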

When to re-snapshot

You need a fresh snapshot whenever the UI changes significantly:

After navigation

Browsers, settings panels, and multi-screen apps:
# Initial screen
$ agent-native snapshot "System Settings" --interactive
$ agent-native click @n5  # Click "Network"

# ⚠️ UI changed - new screen loaded
$ agent-native snapshot "System Settings" --interactive
$ agent-native click @n3  # Now this refers to a button on the Network pane

After dialogs open

Dialogs, sheets, and popovers add new elements:
$ agent-native snapshot Safari --interactive
$ agent-native key Cmd+O  # Open file dialog

# ⚠️ Dialog appeared
$ agent-native snapshot Safari --interactive
# Now refs include the file picker elements

After content loads

Dynamic content loading in browsers or document apps:
$ agent-native snapshot Safari --interactive
$ agent-native click @n3  # Click "Go"
$ agent-native wait 2      # Wait for page load

# ⚠️ Page content changed
$ agent-native snapshot Safari --interactive

When refs fail to resolve

If you get an error like this:
$ agent-native click @n8
Error: Could not re-resolve @n8. The UI may have changed -- run `snapshot` again.
Solution: Take a new snapshot. The element either moved, changed attributes, or was removed.
When in doubt, snapshot. Extra snapshots don’t hurt, but stale refs will cause errors.

Working with dynamic UIs

Dynamic interfaces require careful timing:

Wait for changes to settle

After triggering an action, wait before snapshotting:
$ agent-native click @n5
$ agent-native wait 1.5  # Wait for animation/load
$ agent-native snapshot MyApp --interactive
The wait command (from WaitCommand.swift) simply sleeps for the specified duration.
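
A fixed sleep can be too short or needlessly long. One alternative, sketched here under assumptions, is to poll until two consecutive snapshots are identical. `take_snapshot` is a hypothetical callable that returns the snapshot text (for example by shelling out to `agent-native snapshot`); the interval and timeout defaults are arbitrary, and this is not something agent-native provides itself.

```python
import time
from typing import Callable


def wait_until_stable(
    take_snapshot: Callable[[], str],
    interval: float = 0.5,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until two consecutive snapshots match, then return that snapshot.

    Raises TimeoutError if the UI is still changing when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    previous = take_snapshot()
    while time.monotonic() < deadline:
        sleep(interval)
        current = take_snapshot()
        if current == previous:
            return current  # two identical snapshots in a row: treat as settled
        previous = current
    raise TimeoutError("UI did not settle within the timeout")
```

This only works if repeated snapshots of an unchanged UI produce identical text. A UI that updates continuously (a clock, a progress bar) will never look stable, so keep the timeout and fall back to a plain `wait` in that case.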

Check element state

Use get to read element values without re-snapshotting:
$ agent-native get @n2 --attr value
{"value": "example.com"}
Or use is to check boolean conditions:
$ agent-native is @n4 --focused
true
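
When scripting these checks, the outputs shown above can be decoded directly. A minimal sketch, assuming `get` prints a JSON object keyed by attribute name and `is` prints `true` or `false`, as in the examples; both helper names are hypothetical.

```python
import json
from typing import Any


def parse_get_output(output: str, attr: str) -> Any:
    """Return one attribute from `get` output such as {"value": "example.com"}."""
    return json.loads(output).get(attr)


def parse_is_output(output: str) -> bool:
    """Map `is` output ("true" or "false") to a bool; reject anything else."""
    text = output.strip()
    if text not in ("true", "false"):
        raise ValueError(f"unexpected output: {text!r}")
    return text == "true"


print(parse_get_output('{"value": "example.com"}', "value"))  # prints "example.com"
print(parse_is_output("true\n"))  # prints "True"
```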

Use --interactive for cleaner snapshots

Large apps have hundreds of structural elements. Focus on what matters:
$ agent-native snapshot "Final Cut Pro" --interactive --compact
This filters to only interactive elements and removes empty containers. From SnapshotCommand.swift:66-74:
if interactive && !Self.interactiveRoles.contains(node.role) {
    continue  // Skip non-interactive elements
}
if compact && node.title == nil && node.label == nil
    && node.value == nil && node.actions.isEmpty
    && node.childCount > 0
{
    continue  // Skip empty structural containers
}
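
The same two checks can be mirrored in Python when post-processing nodes on the agent side. This is a sketch, not the tool's code: the `INTERACTIVE_ROLES` set below is an illustrative guess (the real `interactiveRoles` list is not shown in this guide), and a `childCount` key is assumed to be present on each node.

```python
from typing import Any, Dict, Iterable, List, Set

# Illustrative only: the actual set lives in SnapshotCommand.swift.
INTERACTIVE_ROLES: Set[str] = {"AXButton", "AXTextField", "AXMenuItem"}


def filter_nodes(
    nodes: Iterable[Dict[str, Any]],
    interactive: bool = False,
    compact: bool = False,
    roles: Set[str] = INTERACTIVE_ROLES,
) -> List[Dict[str, Any]]:
    """Apply the interactive and compact checks from the Swift excerpt."""
    kept = []
    for node in nodes:
        if interactive and node["role"] not in roles:
            continue  # skip non-interactive elements
        is_empty_container = (
            node.get("title") is None
            and node.get("label") is None
            and node.get("value") is None
            and not node.get("actions")
            and node.get("childCount", 0) > 0
        )
        if compact and is_empty_container:
            continue  # skip empty structural containers
        kept.append(node)
    return kept


nodes = [
    {"role": "AXGroup", "childCount": 3},
    {"role": "AXStaticText", "value": "Hello", "childCount": 0},
    {"role": "AXButton", "title": "Go", "actions": ["AXPress"], "childCount": 0},
]
print([n["role"] for n in filter_nodes(nodes, compact=True)])      # ['AXStaticText', 'AXButton']
print([n["role"] for n in filter_nodes(nodes, interactive=True)])  # ['AXButton']
```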

Element resolution strategies

You don’t always need refs. agent-native supports two resolution strategies:

Ref-based (snapshot first)

$ agent-native snapshot Safari --interactive
$ agent-native click @n3
Pros:
  • Fast interactions after initial snapshot
  • You see exactly what you’re clicking
  • Refs work across multiple commands
Cons:
  • Requires a snapshot step
  • Refs become stale when UI changes

Filter-based (no snapshot)

$ agent-native click Safari --role button --title "Back"
From ElementResolver.swift:6-52, this searches the tree on demand using filters:
static func resolve(
    app: String?,
    ref: String?,
    role: String?,
    title: String?,
    label: String?,
    identifier: String?,
    index: Int = 0
) throws -> (element: AXUIElement, node: AXNode, appName: String)
Pros:
  • No snapshot needed
  • Works with dynamic UIs where refs would be stale
  • Good for one-off commands
Cons:
  • Slower (searches the tree every time)
  • Less visibility into what exists
  • Ambiguous if multiple elements match
For agent workflows, prefer ref-based resolution. It’s faster and gives your agent visibility into the full UI context.
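
A wrapper can support both strategies behind one function. The sketch below builds the argument list for `click` from either a ref or an app name plus filters. It uses only the `--role` and `--title` flags that appear in the example above; `click_args` is a hypothetical helper, and other filters from the Swift signature are left out because their CLI spelling is not shown here.

```python
from typing import List, Optional


def click_args(
    ref: Optional[str] = None,
    app: Optional[str] = None,
    role: Optional[str] = None,
    title: Optional[str] = None,
) -> List[str]:
    """Build argv for `agent-native click` using a ref or app-scoped filters."""
    if ref is not None:
        # Ref-based: accept "n3" or "@n3"
        return ["agent-native", "click", ref if ref.startswith("@") else f"@{ref}"]
    if app is None or (role is None and title is None):
        raise ValueError("need a ref, or an app plus at least one filter")
    args = ["agent-native", "click", app]
    if role is not None:
        args += ["--role", role]
    if title is not None:
        args += ["--title", title]
    return args


print(click_args(ref="n3"))
print(click_args(app="Safari", role="button", title="Back"))
```

Pass the resulting list to `subprocess.run` to execute it. Keeping argv as a list avoids shell quoting problems with titles that contain spaces.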

Best practices

1. Start with an interactive snapshot

$ agent-native snapshot MyApp --interactive
This gives you a clean, actionable view of the UI.

2. Use wait after UI-changing actions

$ agent-native click @n7
$ agent-native wait 1  # Let the UI settle
$ agent-native snapshot MyApp --interactive
Don’t snapshot too soon: animations and loads need time to finish.

3. Re-snapshot liberally

Whenever you’re unsure if the UI has changed, just snapshot again:
$ agent-native snapshot MyApp --interactive
Snapshots are cheap and prevent ref resolution errors.

4. Use --compact for complex UIs

$ agent-native snapshot "Adobe Photoshop" --interactive --compact
This removes noise and focuses on actionable elements.

5. Use JSON output for parsing

$ agent-native snapshot Safari --interactive --json > snapshot.json
JSON output is structured and easy to parse for agents:
[
  {
    "ref": "n1",
    "role": "AXWindow",
    "title": "Safari",
    "enabled": true,
    "actions": ["AXRaise"],
    "depth": 0
  },
  {
    "ref": "n2",
    "role": "AXTextField",
    "value": "example.com",
    "enabled": true,
    "actions": ["AXConfirm"],
    "depth": 1
  }
]
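
A minimal sketch of consuming this output, assuming the flat list-of-objects shape shown above. `find_ref` is a hypothetical helper; it returns the first enabled element that matches a role and, optionally, a title.

```python
import json
from typing import Any, Dict, List, Optional

# Same data as the example above, inlined so the sketch is self-contained.
SNAPSHOT_JSON = """
[
  {"ref": "n1", "role": "AXWindow", "title": "Safari",
   "enabled": true, "actions": ["AXRaise"], "depth": 0},
  {"ref": "n2", "role": "AXTextField", "value": "example.com",
   "enabled": true, "actions": ["AXConfirm"], "depth": 1}
]
"""


def find_ref(
    nodes: List[Dict[str, Any]], role: str, title: Optional[str] = None
) -> Optional[str]:
    """Return "@ref" for the first enabled node matching role (and title)."""
    for node in nodes:
        if node.get("role") != role:
            continue
        if title is not None and node.get("title") != title:
            continue
        if not node.get("enabled", False):
            continue
        return "@" + node["ref"]
    return None


nodes = json.loads(SNAPSHOT_JSON)
by_ref = {node["ref"]: node for node in nodes}  # quick lookup by ref

print(find_ref(nodes, "AXTextField"))  # prints "@n2"
print(by_ref["n1"]["actions"])         # prints "['AXRaise']"
```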

6. Handle ref resolution failures gracefully

Always be ready to re-snapshot if a ref fails:
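import subprocess

result = subprocess.run(["agent-native", "click", "@n5"], capture_output=True, text=True)
# Match the stale-ref message; check both streams, since which one carries it is not documented here
if "Could not re-resolve" in result.stdout + result.stderr:
    # Ref is stale: refresh the snapshot
    snapshot = subprocess.run(
        ["agent-native", "snapshot", "MyApp", "--interactive"],
        capture_output=True, text=True,
    ).stdout
    # Parse the new snapshot to find the element again, then retry the click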

Example: Multi-step workflow

Here’s an end-to-end example that opens the file picker in TextEdit:
1. Open the app

$ agent-native open TextEdit
Opened: TextEdit (pid 1234)

2. Snapshot to see what's available

$ agent-native snapshot TextEdit --interactive
Snapshot: TextEdit (pid 1234) -- 6 elements
AXMenuBar [ref=n1]
  AXMenuBarItem "File" [ref=n2]
AXWindow "Untitled" [ref=n3]
  AXTextArea [ref=n4]

3. Click the File menu

$ agent-native click @n2
OK Clicked: AXMenuBarItem title="File"

4. Re-snapshot to see the menu

$ agent-native snapshot TextEdit --interactive
Snapshot: TextEdit (pid 1234) -- 12 elements
AXMenuBar [ref=n1]
  AXMenuBarItem "File" [ref=n2]
    AXMenu "File" [ref=n3]
      AXMenuItem "Open..." [ref=n4]
      AXMenuItem "Save" [ref=n5]

5. Click 'Open...'

$ agent-native click @n4
OK Clicked: AXMenuItem title="Open..."

6. Wait for the file dialog

$ agent-native wait 1

7. Snapshot the file picker

$ agent-native snapshot TextEdit --interactive
Snapshot: TextEdit (pid 1234) -- 23 elements
# Now includes file picker buttons, text fields, etc.
Notice the pattern: snapshot → interact → wait → snapshot → interact. This is how you navigate complex UIs reliably.
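
The walkthrough above can be scripted. This sketch takes a `run` callable (hypothetical: it receives an argv list and returns the command's text output) so the sequence can be dry-run with canned snapshots. Refs are looked up by title after each snapshot instead of being hard-coded, because refs are reassigned on every snapshot. The title matching assumes the text format shown in the steps.

```python
from typing import Callable, List

Runner = Callable[[List[str]], str]  # takes argv, returns the command's text output


def ref_for_title(snapshot_text: str, title: str) -> str:
    """Return "@ref" from the first snapshot line containing the quoted title."""
    needle = f'"{title}"'
    for line in snapshot_text.splitlines():
        if needle in line and "[ref=" in line:
            return "@" + line.split("[ref=")[1].split("]")[0]
    raise LookupError(f"no element titled {title!r}")


def open_file_picker(run: Runner, app: str = "TextEdit") -> str:
    """Drive the seven steps and return the final snapshot text."""
    run(["agent-native", "open", app])
    snap = run(["agent-native", "snapshot", app, "--interactive"])
    run(["agent-native", "click", ref_for_title(snap, "File")])
    snap = run(["agent-native", "snapshot", app, "--interactive"])
    run(["agent-native", "click", ref_for_title(snap, "Open...")])
    run(["agent-native", "wait", "1"])
    return run(["agent-native", "snapshot", app, "--interactive"])
```

For real use, `run` could wrap `subprocess.run(argv, capture_output=True, text=True).stdout`. In tests, pass a fake that records each argv and returns sample snapshot text.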

See also

Accessibility tree

Understanding the tree structure

Refs and snapshots

Deep dive into the ref system
