Workflow

Agent-native is designed around a simple workflow: snapshot the UI to get refs, interact with elements using those refs, then re-snapshot when the UI changes. This pattern works for any macOS app and handles dynamic interfaces gracefully.

The core pattern

Here’s the fundamental cycle:

Snapshot

Capture the current UI state and assign refs to elements:

$ agent-native snapshot Safari --interactive
Snapshot: Safari (pid 1234) -- 8 elements
AXWindow "Safari" [ref=n1]
  AXTextField [ref=n2]
  AXButton "Go" [AXPress] [ref=n3]

Interact

Use refs to click, type, or inspect elements:

$ agent-native click @n2
OK Clicked: AXTextField

$ agent-native type @n2 "example.com"
OK Typed 11 chars into AXTextField

$ agent-native click @n3
OK Clicked: AXButton title="Go"

Re-snapshot

After the page loads (or any UI change), take a fresh snapshot:

$ agent-native snapshot Safari --interactive
Snapshot: Safari (pid 1234) -- 15 elements
# New UI state, new refs

This pattern mirrors how humans use computers: observe → act → observe again. The snapshot gives you a moment-in-time view of what’s possible.
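For an agent, the "observe" step usually means turning the snapshot text into something it can look refs up in. The following is a minimal sketch of that, assuming the text output always has the shape of the sample above (two-space indentation, an optional quoted title, and a trailing `[ref=...]`). The names `SnapshotLine` and `parse_snapshot` are hypothetical helpers and not part of agent-native.

```python
import re
from typing import List, NamedTuple, Optional


class SnapshotLine(NamedTuple):
    ref: str              # e.g. "n3" (pass as "@n3" on the command line)
    role: str             # e.g. "AXButton"
    title: Optional[str]  # quoted title, if the line has one
    depth: int            # nesting level inferred from two-space indentation


# Assumed line shape, based only on the sample output:
#   <indent><role> ["<title>"] [<action>]... [ref=<id>]
_LINE = re.compile(
    r'^(?P<indent> *)(?P<role>\w+)(?: "(?P<title>[^"]*)")?.*\[ref=(?P<ref>\w+)\]\s*$'
)


def parse_snapshot(text: str) -> List[SnapshotLine]:
    """Extract ref-bearing lines from snapshot text; other lines are skipped."""
    nodes = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            depth = len(m["indent"]) // 2
            nodes.append(SnapshotLine(m["ref"], m["role"], m["title"], depth))
    return nodes
```

The header line (`Snapshot: Safari ...`) carries no ref, so it is ignored. If the text format proves unstable, parsing the JSON output is likely the safer route.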

When to re-snapshot

You need a fresh snapshot whenever the UI changes significantly:

After navigation

Browsers, settings panels, and multi-screen apps:

# Initial screen
$ agent-native snapshot "System Settings" --interactive
$ agent-native click @n5  # Click "Network"

# ⚠️ UI changed - new screen loaded
$ agent-native snapshot "System Settings" --interactive
$ agent-native click @n3  # Now this refers to a button on the Network pane

After dialogs open

Dialogs, sheets, and popovers add new elements:

$ agent-native snapshot Safari --interactive
$ agent-native key Cmd+O  # Open file dialog

# ⚠️ Dialog appeared
$ agent-native snapshot Safari --interactive
# Now refs include the file picker elements

After content loads

Dynamic content loading in browsers or document apps:

$ agent-native snapshot Safari --interactive
$ agent-native click @n3  # Click "Go"
$ agent-native wait 2      # Wait for page load

# ⚠️ Page content changed
$ agent-native snapshot Safari --interactive

When refs fail to resolve

If you get an error like this:

$ agent-native click @n8
Error: Could not re-resolve @n8. The UI may have changed -- run `snapshot` again.

Solution: Take a new snapshot. The element either moved, changed attributes, or was removed.

When in doubt, snapshot. Extra snapshots don’t hurt, but stale refs will cause errors.
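A script can apply the same rule automatically. This sketch detects the stale-ref error by looking for the phrase "Could not re-resolve" from the message shown above, which is an assumption about the wording staying stable. The `Runner` type, `cli_runner`, and `click_or_resnapshot` are hypothetical names introduced for illustration.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# A runner takes an argv list and returns (exit_code, stdout + stderr).
Runner = Callable[[Sequence[str]], Tuple[int, str]]

# Substring of the error message shown above; matching on it is an assumption.
STALE_MARKER = "Could not re-resolve"


def cli_runner(argv: Sequence[str]) -> Tuple[int, str]:
    """Invoke the real CLI and capture its output."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def click_or_resnapshot(ref: str, app: str, run: Runner = cli_runner) -> Optional[str]:
    """Click `ref`; return None if no stale-ref error was reported.

    If the ref was stale, take a fresh snapshot and return its text
    so the caller can choose a new ref from it.
    """
    _, output = run(["agent-native", "click", ref])
    if STALE_MARKER not in output:
        return None
    _, snapshot = run(["agent-native", "snapshot", app, "--interactive"])
    return snapshot
```

Passing the runner as a parameter keeps the retry logic testable without a real app on screen.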

Working with dynamic UIs

Dynamic interfaces require careful timing:

Wait for changes to settle

After triggering an action, wait before snapshotting:

$ agent-native click @n5
$ agent-native wait 1.5  # Wait for animation/load
$ agent-native snapshot MyApp --interactive

The wait command (from WaitCommand.swift) simply sleeps for the specified duration.
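Because a fixed sleep is either too short or wastefully long, a caller may prefer to poll until two consecutive snapshots match. This is a sketch of that alternative, not something agent-native does itself. It assumes an unchanged UI produces identical snapshot text on repeated calls, and `wait_until_stable` is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until_stable(
    take_snapshot: Callable[[], str],
    interval: float = 0.5,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return the first snapshot that equals the one taken just before it.

    Raises TimeoutError if the UI keeps changing until the deadline.
    """
    deadline = clock() + timeout
    previous = take_snapshot()
    while clock() < deadline:
        sleep(interval)
        current = take_snapshot()
        if current == previous:
            return current  # two identical snapshots in a row: treat as settled
        previous = current
    raise TimeoutError(f"UI did not settle within {timeout} seconds")
```

Here `take_snapshot` would typically wrap `agent-native snapshot MyApp --interactive` and return its output. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.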

Check element state

Use get to read element values without re-snapshotting:

$ agent-native get @n2 --attr value
{"value": "example.com"}

Or use is to check boolean conditions:

$ agent-native is @n4 --focused
true
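From a script, these two commands can be wrapped as small typed helpers. The sketch below assumes the output shapes shown above: `get` prints a JSON object keyed by the attribute name, and `is` prints `true` or `false`. `get_attr`, `is_focused`, and the `Run` callable are hypothetical names.

```python
import json
from typing import Any, Callable, Sequence

# A callable that runs an argv list and returns its stdout as text.
Run = Callable[[Sequence[str]], str]


def get_attr(ref: str, attr: str, run: Run) -> Any:
    """Read one attribute of an element via `get --attr`."""
    out = run(["agent-native", "get", ref, "--attr", attr])
    # Assumption: the JSON object is keyed by the attribute name.
    return json.loads(out)[attr]


def is_focused(ref: str, run: Run) -> bool:
    """Check focus via `is --focused`, assuming it prints 'true' or 'false'."""
    out = run(["agent-native", "is", ref, "--focused"])
    return out.strip() == "true"
```

A real `run` could be a thin wrapper around `subprocess.run(..., capture_output=True, text=True).stdout`.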

Use --interactive for cleaner snapshots

Large apps have hundreds of structural elements. Focus on what matters:

$ agent-native snapshot "Final Cut Pro" --interactive --compact

This filters to only interactive elements and removes empty containers. From SnapshotCommand.swift:66-74:

if interactive && !Self.interactiveRoles.contains(node.role) {
    continue  // Skip non-interactive elements
}
if compact && node.title == nil && node.label == nil
    && node.value == nil && node.actions.isEmpty
    && node.childCount > 0
{
    continue  // Skip empty structural containers
}
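The same two conditions can be restated in Python to make the filter easy to experiment with on parsed snapshot data. This is a sketch under stated assumptions: the contents of `INTERACTIVE_ROLES` are a hypothetical stand-in (the real set is `interactiveRoles` in SnapshotCommand.swift), nodes are plain dicts, and the list is flat, so it does not model how skipped nodes affect their children.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in; the real set is defined in SnapshotCommand.swift.
INTERACTIVE_ROLES: Set[str] = {"AXButton", "AXTextField", "AXCheckBox", "AXMenuItem"}


def keep_node(node: Dict, interactive: bool, compact: bool,
              roles: Set[str] = INTERACTIVE_ROLES) -> bool:
    """Mirror the two skip conditions from the Swift excerpt."""
    if interactive and node["role"] not in roles:
        return False  # non-interactive element
    is_empty_container = (
        node.get("title") is None
        and node.get("label") is None
        and node.get("value") is None
        and not node.get("actions")
        and node.get("child_count", 0) > 0
    )
    if compact and is_empty_container:
        return False  # structural container with nothing to show
    return True


def filter_nodes(nodes: Iterable[Dict], interactive: bool = False,
                 compact: bool = False) -> List[Dict]:
    return [n for n in nodes if keep_node(n, interactive, compact)]
```

Note that the compact rule only drops nodes that have children, so an unlabeled leaf element survives `--compact` unless `--interactive` removes it by role.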

Element resolution strategies

You don’t always need refs. agent-native supports two resolution strategies:

Ref-based (snapshot first)

$ agent-native snapshot Safari --interactive
$ agent-native click @n3

Pros:

Fast interactions after initial snapshot
You see exactly what you’re clicking
Refs work across multiple commands

Cons:

Requires a snapshot step
Refs become stale when UI changes

Filter-based (direct search)

$ agent-native click Safari --role button --title "Back"

From ElementResolver.swift:6-52, this searches the tree on-demand using filters:

static func resolve(
    app: String?,
    ref: String?,
    role: String?,
    title: String?,
    label: String?,
    identifier: String?,
    index: Int = 0
) throws -> (element: AXUIElement, node: AXNode, appName: String)

Pros:

No snapshot needed
Works with dynamic UIs where refs would be stale
Good for one-off commands

Cons:

Slower (searches the tree every time)
Less visibility into what exists
Ambiguous if multiple elements match

For agent workflows, prefer ref-based resolution. It’s faster and gives your agent visibility into the full UI context.
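To make the ambiguity trade-off concrete, here is a sketch of filter matching over already-parsed nodes. It is not the logic in ElementResolver.swift: it assumes exact string equality on each field and reuses the `index` idea from the signature above to choose among several matches. `find_matches` and `pick` are hypothetical names.

```python
from typing import Dict, List, Optional


def find_matches(
    nodes: List[Dict],
    role: Optional[str] = None,
    title: Optional[str] = None,
    label: Optional[str] = None,
    identifier: Optional[str] = None,
) -> List[Dict]:
    """Return every node whose fields equal all of the filters that were given."""
    wanted = {"role": role, "title": title, "label": label, "identifier": identifier}
    return [
        n for n in nodes
        if all(v is None or n.get(k) == v for k, v in wanted.items())
    ]


def pick(matches: List[Dict], index: int = 0) -> Dict:
    """Select one match by position; fail loudly when the index is out of range."""
    if not 0 <= index < len(matches):
        raise LookupError(f"index {index} out of range for {len(matches)} match(es)")
    return matches[index]
```

Checking `len(find_matches(...))` before calling `pick` lets a caller notice an ambiguous filter instead of silently acting on the first hit.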

Best practices

1. Start with an interactive snapshot

$ agent-native snapshot MyApp --interactive

This gives you a clean, actionable view of the UI.

2. Use wait after UI-changing actions

$ agent-native click @n7
$ agent-native wait 1  # Let the UI settle
$ agent-native snapshot MyApp --interactive

Don’t snapshot too soon: animations and loads need time to finish.

3. Re-snapshot liberally

Whenever you’re unsure if the UI has changed, just snapshot again:

$ agent-native snapshot MyApp --interactive

Snapshots are cheap and prevent ref resolution errors.

4. Use --compact for complex UIs

$ agent-native snapshot "Adobe Photoshop" --interactive --compact

This removes noise and focuses on actionable elements.

5. Check snapshots in JSON for parsing

$ agent-native snapshot Safari --interactive --json > snapshot.json

JSON output is structured and easy to parse for agents:

[
  {
    "ref": "n1",
    "role": "AXWindow",
    "title": "Safari",
    "enabled": true,
    "actions": ["AXRaise"],
    "depth": 0
  },
  {
    "ref": "n2",
    "role": "AXTextField",
    "value": "example.com",
    "enabled": true,
    "actions": ["AXConfirm"],
    "depth": 1
  }
]
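Once the JSON is saved, an agent can index it by ref and query it. The sketch below relies only on the fields visible in the sample output (`ref`, `role`, `actions`); `index_by_ref` and `refs_with_action` are hypothetical helper names.

```python
import json
from typing import Dict, List


def index_by_ref(snapshot_json: str) -> Dict[str, dict]:
    """Map each ref (e.g. "n2") to its node from the snapshot's JSON array."""
    return {node["ref"]: node for node in json.loads(snapshot_json)}


def refs_with_action(nodes_by_ref: Dict[str, dict], action: str) -> List[str]:
    """Return command-line refs ("@n1", ...) for nodes that support `action`."""
    return [
        "@" + ref
        for ref, node in nodes_by_ref.items()
        if action in node.get("actions", [])
    ]
```

For the sample above, `refs_with_action(index_by_ref(text), "AXConfirm")` yields the text field's ref, ready to pass to `click` or `type`.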

6. Handle ref resolution failures gracefully

Always be ready to re-snapshot if a ref fails:

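import subprocess

result = subprocess.run(["agent-native", "click", "@n5"], capture_output=True, text=True)
if "Could not re-resolve" in result.stdout + result.stderr:
    # Ref is stale: refresh the snapshot
    subprocess.run(["agent-native", "snapshot", "MyApp", "--interactive"], check=True)
    # Parse the new snapshot to find the element again, then retry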
Example: Multi-step workflow

Here’s a complete example that navigates to the Open dialog in TextEdit:

Open the app

$ agent-native open TextEdit
Opened: TextEdit (pid 1234)

Snapshot to see what's available

$ agent-native snapshot TextEdit --interactive
Snapshot: TextEdit (pid 1234) -- 6 elements
AXMenuBar [ref=n1]
  AXMenuBarItem "File" [ref=n2]
AXWindow "Untitled" [ref=n3]
  AXTextArea [ref=n4]

Click the File menu

$ agent-native click @n2
OK Clicked: AXMenuBarItem title="File"

Re-snapshot to see the menu

$ agent-native snapshot TextEdit --interactive
Snapshot: TextEdit (pid 1234) -- 12 elements
AXMenuBar [ref=n1]
  AXMenuBarItem "File" [ref=n2]
    AXMenu "File" [ref=n3]
      AXMenuItem "Open..." [ref=n4]
      AXMenuItem "Save" [ref=n5]

Click 'Open...'

$ agent-native click @n4
OK Clicked: AXMenuItem title="Open..."

Wait for the file dialog

$ agent-native wait 1

Snapshot the file picker

$ agent-native snapshot TextEdit --interactive
Snapshot: TextEdit (pid 1234) -- 23 elements
# Now includes file picker buttons, text fields, etc.

Notice the pattern: snapshot → interact → wait → snapshot → interact. This is how you navigate complex UIs reliably.
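The walkthrough above can be sketched as a script that looks up refs by title after each snapshot instead of hard-coding them. It assumes the text output format shown in the example; `ref_for_title`, `open_file_picker`, and the `Run` callable are hypothetical names, and the command runner is injected so the sequence can be checked without launching TextEdit.

```python
import re
from typing import Callable, Sequence

# A callable that runs an argv list and returns its stdout as text.
Run = Callable[[Sequence[str]], str]


def ref_for_title(snapshot: str, title: str) -> str:
    """Return the "@ref" of the first snapshot line containing the quoted title."""
    m = re.search(rf'"{re.escape(title)}".*\[ref=(\w+)\]', snapshot)
    if m is None:
        raise LookupError(f"no element titled {title!r} in snapshot")
    return "@" + m.group(1)


def open_file_picker(run: Run, app: str = "TextEdit") -> str:
    """Drive snapshot -> interact -> wait -> snapshot; return the final snapshot."""
    snapshot_cmd = ["agent-native", "snapshot", app, "--interactive"]
    run(["agent-native", "open", app])
    snap = run(snapshot_cmd)
    run(["agent-native", "click", ref_for_title(snap, "File")])
    snap = run(snapshot_cmd)  # the menu is open now, so refs are reassigned
    run(["agent-native", "click", ref_for_title(snap, "Open...")])
    run(["agent-native", "wait", "1"])  # let the dialog appear
    return run(snapshot_cmd)
```

Looking refs up by title after every snapshot is what keeps the script correct even though `@n3` and `@n4` point at different elements before and after the menu opens.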

See also

Accessibility tree

Understanding the tree structure

Refs and snapshots

Deep dive into the ref system
