Core principles

When building AI agents that use agent-native, follow these principles:
  1. Always re-snapshot after UI changes
  2. Use interactive mode by default (-i flag)
  3. Handle errors gracefully and retry with context
  4. Fall back to keyboard when AX tree is sparse
  5. Verify state after critical operations
The most common mistake is forgetting to re-snapshot after clicking, navigating, or changing state. Old refs may not resolve after UI structure changes.

Always re-snapshot after UI changes

Why re-snapshotting matters

Refs from a snapshot identify elements in the UI structure as it existed when the snapshot was taken. They stay valid only while that structure is unchanged. When the UI changes:
  • New elements appear (modals, sheets, panels)
  • Elements are removed (closed dialogs, hidden sections)
  • Element hierarchy changes (expanded trees, navigated views)
Old refs may:
  • Point to elements that no longer exist
  • Point to elements with different attributes
  • Fail to resolve entirely
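
The failure modes above can be made concrete by comparing refs across two snapshots. This is a minimal sketch, assuming each snapshot is a list of dicts with "ref", "role", and "title" keys (the shape the other snippets on this page rely on). stale_refs is a hypothetical helper, not part of agent-native.

```python
def stale_refs(old_snapshot: list[dict], new_snapshot: list[dict]) -> dict[str, str]:
    """Classify each ref from old_snapshot against new_snapshot.

    Returns a mapping of ref -> "missing" (no longer present) or
    "changed" (present, but role or title differs). Refs that still
    point at an equivalent element are omitted.
    """
    new_by_ref = {el["ref"]: el for el in new_snapshot}
    problems: dict[str, str] = {}
    for el in old_snapshot:
        current = new_by_ref.get(el["ref"])
        if current is None:
            problems[el["ref"]] = "missing"
        elif (current.get("role"), current.get("title")) != (el.get("role"), el.get("title")):
            problems[el["ref"]] = "changed"
    return problems


# Illustrative data only: a view before and after navigation.
before = [
    {"ref": "@n1", "role": "AXButton", "title": "General"},
    {"ref": "@n2", "role": "AXButton", "title": "Wi-Fi"},
    {"ref": "@n3", "role": "AXCheckBox", "title": "Wi-Fi"},
]
after = [
    {"ref": "@n1", "role": "AXButton", "title": "General"},
    {"ref": "@n2", "role": "AXTextField", "title": "Search"},
]
print(stale_refs(before, after))  # {'@n2': 'changed', '@n3': 'missing'}
```

The "changed" case is the dangerous one: the ref still resolves, but to a different element than the agent intended.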

When to re-snapshot

Re-snapshot after any action that changes the UI:

1. After navigation

# Click a navigation button
run_command(["agent-native", "click", "@n3"])
time.sleep(0.5)  # Wait for transition

# Re-snapshot to get new view's elements
new_snapshot = snapshot_app("System Settings", interactive_only=True)

2. After opening dialogs/sheets

# Click button that opens a dialog
run_command(["agent-native", "click", "@n5"])

# Wait for dialog to appear
run_command(["agent-native", "wait", "System Settings", "--role", "AXSheet", "--timeout", "5"])

# Re-snapshot to get dialog elements
snapshot = snapshot_app("System Settings", interactive_only=True)

3. After state changes

# Toggle a setting
run_command(["agent-native", "check", "@n7"])

# Re-snapshot to see if new options appeared
snapshot = snapshot_app("System Settings", interactive_only=True)

# Some settings reveal additional controls when enabled

4. After form submission

# Fill form fields
run_command(["agent-native", "fill", "@n2", "username"])
run_command(["agent-native", "fill", "@n3", "password"])

# Submit
run_command(["agent-native", "click", "@n4"])

# Wait and re-snapshot for success/error state
time.sleep(1)
snapshot = snapshot_app("MyApp", interactive_only=True)
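
The snippets on this page call run_command and snapshot_app without defining them. Below is a minimal sketch of what they might look like, inferred from how they are used here: string output, subprocess.CalledProcessError with stderr on failure, and a list of element dicts from a snapshot. The real helpers may differ. The assumption that snapshot --json prints a JSON array of elements should be checked against the JSON output reference.

```python
import json
import subprocess


def run_command(cmd: list[str]) -> str:
    """Run a command and return its stdout as text.

    check=True raises subprocess.CalledProcessError on a non-zero exit;
    capture_output=True makes stderr available on that exception.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def snapshot_app(app: str, interactive_only: bool = True, runner=run_command) -> list[dict]:
    """Snapshot an app and parse the JSON output.

    Assumption: `agent-native snapshot <app> --json` prints a JSON array
    of element objects. `runner` is injectable so the parsing can be
    exercised without the CLI installed.
    """
    cmd = ["agent-native", "snapshot", app]
    if interactive_only:
        cmd.append("-i")
    cmd.append("--json")
    return json.loads(runner(cmd))
```

Passing a fake runner (any callable that takes the argv list and returns a JSON string) makes it possible to unit-test agent logic without driving a real app.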

When NOT to re-snapshot

You can skip re-snapshotting for:
  • Reading state without changing it (get text, is enabled)
  • Multiple interactions on the same view without navigation
  • Typing text in a single field
  • Taking screenshots
# These don't require re-snapshot
text = run_command(["agent-native", "get", "text", "@n1"])
enabled = run_command(["agent-native", "is", "enabled", "@n2"])
run_command(["agent-native", "screenshot", "MyApp", "/tmp/screen.png"])

# This sequence doesn't change UI structure
run_command(["agent-native", "fill", "@n3", "first part"])
run_command(["agent-native", "type", "@n3", " second part"])  # Appends to same field
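
The rule of thumb above can be encoded as a small lookup so an agent loop decides mechanically when to refresh its refs. The command sets below are assumptions drawn from the examples on this page, not an official classification. Unknown subcommands are treated as UI-changing to stay on the safe side.

```python
# Subcommands that only read state, per the list above.
READ_ONLY = {"get", "is", "screenshot"}
# Subcommands that edit a field's text without changing UI structure.
TEXT_ENTRY = {"fill", "type"}


def needs_resnapshot(cmd: list[str]) -> bool:
    """Return True if refs should be refreshed after running cmd.

    cmd is an argv list such as ["agent-native", "click", "@n3"].
    Anything not known to be read-only or plain text entry is treated
    as potentially UI-changing.
    """
    if len(cmd) < 2:
        return True
    subcommand = cmd[1]
    return subcommand not in READ_ONLY and subcommand not in TEXT_ENTRY
```

A fill can still trigger UI changes in some apps (for example, live validation messages), so treat a False result as a default rather than a guarantee.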

Use interactive mode by default

Why -i matters

The full AX tree contains hundreds of structural elements (groups, static text, images) that aren’t interactive. For AI agents:
  • Too much noise makes LLMs less effective at finding targets
  • Longer context consumes more tokens
  • Slower processing from parsing large trees
The -i flag filters to only interactive elements:
  • Buttons, text fields, checkboxes, links, sliders, etc.
  • Elements that have actions like AXPress, AXConfirm
# ❌ Without -i: 200+ elements including static text, images, groups
agent-native snapshot "System Settings" --json

# ✅ With -i: 15-20 interactive elements
agent-native snapshot "System Settings" -i --json

Always use -i unless…

Only omit -i when:
  • Debugging why an element isn’t appearing
  • Reading static content like labels or error messages
  • Exploring an unfamiliar app’s structure
For production agent workflows, always use -i.

Combine with -c for even cleaner output

agent-native snapshot "System Settings" -i -c --json
The -c (compact) flag removes empty structural elements that have no content or actions.
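
When snapshot commands are assembled in code, a small builder keeps the flag combinations consistent. This sketch uses only the flags that appear on this page (-i, -c, -d, --json). snapshot_args is a hypothetical helper name.

```python
from typing import Optional


def snapshot_args(
    app: str,
    interactive: bool = True,
    compact: bool = False,
    depth: Optional[int] = None,
) -> list[str]:
    """Build the argv list for an agent-native snapshot call.

    interactive defaults to True, matching the advice to use -i for
    agent workflows; depth=None leaves the CLI's default depth in place.
    """
    args = ["agent-native", "snapshot", app]
    if interactive:
        args.append("-i")
    if compact:
        args.append("-c")
    if depth is not None:
        args += ["-d", str(depth)]
    args.append("--json")
    return args


print(snapshot_args("System Settings", compact=True))
```

Making -i the default means a caller has to opt out explicitly (interactive=False) when debugging or reading static content.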

Handle errors gracefully

Common error scenarios

Cause: UI structure changed, element removed, or never snapshotted.
Solution: Re-snapshot and find element by attributes.
try:
    run_command(["agent-native", "click", "@n5"])
except subprocess.CalledProcessError:
    # Ref no longer valid, re-snapshot
    snapshot = snapshot_app("MyApp", interactive_only=True)
    
    # Find element by attributes instead
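    target = next(
        (
            el for el in snapshot
            if el["role"] == "AXButton" and "Submit" in el.get("title", "")
        ),
        None,  # Avoid StopIteration when nothing matches
    )
    if target is None:
        raise RuntimeError("Submit button not found after re-snapshot")

    run_command(["agent-native", "click", target["ref"]])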

Cause: Element is disabled (grayed out) or not yet ready.
Solution: Wait or check prerequisites.
# Check if enabled first
enabled = run_command(["agent-native", "is", "enabled", "@n3"])
if enabled.strip() == "false":
    print("Element is disabled, checking prerequisites...")
    # Maybe another field needs to be filled first

Cause: App not running, wrong name, or not launched yet.
Solution: Open the app first, retry with fuzzy matching.
try:
    snapshot = snapshot_app("System Settings", interactive_only=True)
except subprocess.CalledProcessError as e:
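    # stderr is None unless the helper captures it, so guard before searching
    if "not found" not in (e.stderr or ""):
        raise  # Unrelated failure: don't mask it
    # Launch the app
    run_command(["agent-native", "open", "System Settings"])
    time.sleep(2)
    # Retry
    snapshot = snapshot_app("System Settings", interactive_only=True)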

Cause: Element took longer to appear than expected.
Solution: Increase timeout, check if navigation succeeded.
try:
    run_command([
        "agent-native", "wait", "MyApp",
        "--role", "AXButton",
        "--title", "OK",
        "--timeout", "5"
    ])
except subprocess.CalledProcessError:
    # Element didn't appear, check why
    snapshot = snapshot_app("MyApp", interactive_only=True)
    # Maybe there's an error dialog instead?
    error_msg = next(
        (el for el in snapshot if "error" in el.get("title", "").lower()),
        None
    )
    if error_msg:
        print(f"Error occurred: {error_msg['title']}")
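
The stale-ref recovery pattern above can be wrapped into one reusable function. This is a sketch under stated assumptions: snapshots are lists of dicts with "ref", "role", and "title" keys, and a failed click raises subprocess.CalledProcessError. find_by_attributes and click_with_recovery are hypothetical names. The command runner and snapshot function are passed in so the logic can be tested without the CLI.

```python
import subprocess
from typing import Callable, Optional


def find_by_attributes(snapshot: list[dict], role: str, title_contains: str) -> Optional[dict]:
    """Return the first element with the given role whose title contains the text."""
    return next(
        (
            el for el in snapshot
            if el.get("role") == role and title_contains in (el.get("title") or "")
        ),
        None,
    )


def click_with_recovery(
    ref: str,
    role: str,
    title_contains: str,
    run: Callable[[list[str]], str],
    take_snapshot: Callable[[], list[dict]],
) -> str:
    """Click ref; if that fails, re-snapshot once and click the matching element.

    Returns the ref that was actually clicked.
    """
    try:
        run(["agent-native", "click", ref])
        return ref
    except subprocess.CalledProcessError:
        target = find_by_attributes(take_snapshot(), role, title_contains)
        if target is None:
            raise LookupError(f"No {role} titled like {title_contains!r} after re-snapshot")
        run(["agent-native", "click", target["ref"]])
        return target["ref"]
```

With the helpers used elsewhere on this page, a call might look like click_with_recovery("@n5", "AXButton", "Submit", run_command, lambda: snapshot_app("MyApp", interactive_only=True)).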

Retry with exponential backoff

For transient failures (network, slow UI), retry with increasing delays:
import subprocess
import time

def retry_command(cmd: list[str], max_attempts: int = 3) -> str:
    """Retry command with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return run_command(cmd)
        except subprocess.CalledProcessError as e:
            if attempt == max_attempts - 1:
                raise
            delay = 2 ** attempt  # 1s, 2s, ... (the final attempt re-raises instead of sleeping)
            print(f"Attempt {attempt + 1} failed, retrying in {delay}s...")
            time.sleep(delay)
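
Because retry_command sleeps only between attempts, the number of delays is one less than max_attempts. The sketch below makes the schedule explicit. backoff_delays is a hypothetical helper, and the base parameter is an addition for illustration.

```python
def backoff_delays(max_attempts: int, base: float = 1.0) -> list[float]:
    """Delays slept between attempts: base * 2**attempt.

    No delay follows the final attempt, so the list has
    max_attempts - 1 entries.
    """
    return [base * 2 ** attempt for attempt in range(max_attempts - 1)]


print(backoff_delays(3))  # [1.0, 2.0]
print(backoff_delays(4))  # [1.0, 2.0, 4.0]
```

With the default max_attempts=3, the worst case adds 3 seconds of waiting on top of the time the three attempts themselves take.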

When to use keyboard vs AX tree

Prefer AX tree when possible

The AX tree is more reliable and semantic:
  • Semantic understanding: Know what element you’re interacting with
  • State validation: Check if element is enabled, focused, etc.
  • Precise targeting: No guessing about key sequences
  • Cross-version compatibility: Less brittle than keyboard shortcuts

Use keyboard for Electron apps

Electron apps (Slack, Discord, VS Code, etc.) expose minimal AX trees. When snapshot -i returns very few elements, fall back to keyboard shortcuts:
def interact_with_slack(channel: str, message: str):
    """Post message to Slack channel using keyboard shortcuts."""
    
    # Try AX tree first
    snapshot = snapshot_app("Slack", interactive_only=True)
    
    if len(snapshot) < 5:
        print("Sparse AX tree, using keyboard shortcuts")
        
        # Open quick switcher (Cmd+K)
        run_command(["agent-native", "key", "Slack", "cmd+k"])
        time.sleep(0.3)
        
        # Type channel name and press Enter
        run_command(["agent-native", "key", "Slack", channel, "return"])
        time.sleep(0.5)
        
        # Type message and send
        run_command(["agent-native", "key", "Slack", message, "return"])
    else:
        # Use AX tree if available
        ...

Common keyboard patterns

Slack

cmd+k        # Quick switcher
cmd+n        # New DM
cmd+u        # Upload file
cmd+shift+a  # All unreads

VS Code

cmd+p        # Quick open
cmd+shift+p  # Command palette
cmd+b        # Toggle sidebar
cmd+j        # Toggle panel

Discord

cmd+k        # Quick switcher
cmd+/        # Search
cmd+i        # Toggle inbox

Safari

cmd+l        # Focus address bar
cmd+t        # New tab
cmd+w        # Close tab
cmd+r        # Reload
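
The shortcut lists above can be kept in a lookup table so an agent refers to actions by name rather than hard-coding key combos. This is a sketch: the action names are hypothetical, the app names are assumed to match what agent-native expects (which may differ, especially for VS Code), and the command form follows the agent-native key usage shown earlier on this page.

```python
# Shortcut table transcribed from the lists above; action names are illustrative.
SHORTCUTS: dict[str, dict[str, str]] = {
    "Slack": {
        "quick_switcher": "cmd+k",
        "new_dm": "cmd+n",
        "upload_file": "cmd+u",
        "all_unreads": "cmd+shift+a",
    },
    "VS Code": {
        "quick_open": "cmd+p",
        "command_palette": "cmd+shift+p",
        "toggle_sidebar": "cmd+b",
        "toggle_panel": "cmd+j",
    },
    "Discord": {
        "quick_switcher": "cmd+k",
        "search": "cmd+/",
        "toggle_inbox": "cmd+i",
    },
    "Safari": {
        "focus_address_bar": "cmd+l",
        "new_tab": "cmd+t",
        "close_tab": "cmd+w",
        "reload": "cmd+r",
    },
}


def key_command(app: str, action: str) -> list[str]:
    """Build the argv list that sends the shortcut for action to app."""
    try:
        combo = SHORTCUTS[app][action]
    except KeyError:
        raise KeyError(f"No shortcut recorded for {action!r} in {app!r}") from None
    return ["agent-native", "key", app, combo]


print(key_command("Slack", "quick_switcher"))
```

Failing loudly on an unknown action is deliberate: sending a guessed key combo to the wrong app can have side effects that are hard to undo.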

Combine both approaches

def automate_app(app_name: str, task: str):
    """Adaptive approach: try AX tree, fall back to keyboard."""
    
    snapshot = snapshot_app(app_name, interactive_only=True)
    
    # If tree is rich enough, use it
    if len(snapshot) > 10:
        print("Using AX tree for semantic interaction")
        return use_ax_tree(app_name, task, snapshot)
    
    # Otherwise, fall back to keyboard
    print("Sparse AX tree, falling back to keyboard shortcuts")
    return use_keyboard_shortcuts(app_name, task)

Performance tips

Limit snapshot depth

Deeper trees take longer to walk and parse:
# Default depth (8) is usually enough
agent-native snapshot "MyApp" -i --json

# Increase for web content
agent-native snapshot Safari -i -d 12 --json

# Decrease for faster snapshots
agent-native snapshot "MyApp" -i -d 5 --json
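
One way to balance speed against coverage is to start shallow and deepen only when the target is missing. The sketch below is an assumption-based illustration, not a documented agent-native feature: take is any callable that returns a snapshot for a given depth (for example, a wrapper that passes -d), and the depth ladder (5, 8, 12) simply reuses the values shown above.

```python
from typing import Callable, Optional


def snapshot_until_found(
    take: Callable[[int], list[dict]],
    matches: Callable[[dict], bool],
    depths: tuple[int, ...] = (5, 8, 12),
) -> tuple[Optional[dict], int]:
    """Snapshot at increasing depths until an element satisfies matches.

    Returns (element, depth_used). If nothing matches at any depth,
    returns (None, deepest depth tried).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    depth_used = depths[0]
    for depth in depths:
        depth_used = depth
        found = next((el for el in take(depth) if matches(el)), None)
        if found is not None:
            return found, depth
    return None, depth_used
```

This trades extra snapshots in the slow case for faster ones in the common case. Whether that pays off depends on how often the target sits deep in the tree.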

Use wait instead of sleep

# ❌ Fixed sleep wastes time
run_command(["agent-native", "click", "@n3"])
time.sleep(2)  # Might be too short or too long

# ✅ Wait for specific element
run_command(["agent-native", "click", "@n3"])
run_command([
    "agent-native", "wait", "MyApp",
    "--role", "AXSheet",
    "--timeout", "5"
])
# Proceeds as soon as element appears
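
The click-then-wait pair can be folded into one helper that reports whether the expected element appeared. This sketch assumes wait exits non-zero on timeout, which is what the timeout-handling example earlier on this page implies. click_and_wait is a hypothetical name, and the runner is injected so the flow can be tested without the CLI.

```python
import subprocess
from typing import Callable


def click_and_wait(
    run: Callable[[list[str]], str],
    ref: str,
    app: str,
    role: str,
    timeout: int = 5,
) -> bool:
    """Click ref, then wait for an element with the given role in app.

    Returns True if the wait command succeeded, False if it failed
    (assumed to mean the element did not appear within the timeout).
    A failing click is not caught and propagates to the caller.
    """
    run(["agent-native", "click", ref])
    try:
        run(["agent-native", "wait", app, "--role", role, "--timeout", str(timeout)])
    except subprocess.CalledProcessError:
        return False
    return True
```

Returning a boolean instead of raising lets the caller decide what to do next, such as re-snapshotting to look for an error dialog.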

Cache snapshots when possible

When you make multiple queries against the same view, take one snapshot and reuse it:
# ❌ Multiple snapshots
for task in tasks:
    snapshot = snapshot_app("MyApp", interactive_only=True)
    element = find_element(snapshot, task)
    interact(element)

# ✅ Cache and reuse
snapshot = snapshot_app("MyApp", interactive_only=True)
for task in tasks:
    element = find_element(snapshot, task)
    interact(element)
    # Only re-snapshot if interaction changes UI
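
The "only re-snapshot if the interaction changes the UI" step can be made explicit with a small cache that is invalidated on demand. This is a minimal sketch: SnapshotCache is a hypothetical class, and take is any zero-argument callable that returns a fresh snapshot.

```python
from typing import Callable, Optional


class SnapshotCache:
    """Hold one snapshot and refresh it lazily after invalidation."""

    def __init__(self, take: Callable[[], list[dict]]):
        self._take = take
        self._snapshot: Optional[list[dict]] = None

    def get(self) -> list[dict]:
        """Return the cached snapshot, taking a new one only if needed."""
        if self._snapshot is None:
            self._snapshot = self._take()
        return self._snapshot

    def invalidate(self) -> None:
        """Mark the cached snapshot stale, e.g. after a click or navigation."""
        self._snapshot = None
```

A typical loop would call cache.get() before each lookup and cache.invalidate() after any action that may have changed the UI structure, so the snapshot cost is paid only when refs could actually be stale.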

Batch independent operations

# ❌ Sequential with unnecessary waits
run_command(["agent-native", "fill", "@n1", "value1"])
time.sleep(0.1)
run_command(["agent-native", "fill", "@n2", "value2"])
time.sleep(0.1)
run_command(["agent-native", "fill", "@n3", "value3"])

# ✅ Batch without delays
for ref, value in [("@n1", "value1"), ("@n2", "value2"), ("@n3", "value3")]:
    run_command(["agent-native", "fill", ref, value])

Verify state after critical operations

For important operations, verify success:
def toggle_wifi(enable: bool):
    """Toggle Wi-Fi and verify state."""
    
    # Navigate to Wi-Fi settings
    run_command(["agent-native", "open", "System Settings"])
    time.sleep(1)
    
    snapshot = snapshot_app("System Settings", interactive_only=True)
    wifi_button = next(el for el in snapshot if "Wi-Fi" in el.get("title", ""))
    run_command(["agent-native", "click", wifi_button["ref"]])
    
    time.sleep(1)
    
    # Find and toggle
    snapshot = snapshot_app("System Settings", interactive_only=True)
    wifi_toggle = next(
        el for el in snapshot
        if el["role"] == "AXCheckBox" and el.get("title") == "Wi-Fi"
    )
    
    if enable:
        run_command(["agent-native", "check", wifi_toggle["ref"]])
    else:
        run_command(["agent-native", "uncheck", wifi_toggle["ref"]])
    
    # Verify state
    time.sleep(0.5)
    result = run_command(["agent-native", "get", "value", wifi_toggle["ref"]])
    actual_state = result.strip() == "1"
    
    if actual_state != enable:
        raise ValueError(f"Failed to {'enable' if enable else 'disable'} Wi-Fi")
    
    print(f"✓ Wi-Fi {'enabled' if enable else 'disabled'}")
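
The verification step above uses a fixed half-second sleep before reading the value. As an alternative sketch, the check can poll until the expected state shows up or a limit is reached. verify_state is a hypothetical helper, read_state is any zero-argument callable (for example, one that runs agent-native get value and normalizes the result), and the attempt count and interval are arbitrary defaults.

```python
import time
from typing import Any, Callable


def verify_state(
    read_state: Callable[[], Any],
    expected: Any,
    attempts: int = 5,
    interval: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll read_state until it returns expected or attempts run out.

    Sleeps for interval seconds between reads, but not after the
    final read. Returns True on a match, False otherwise.
    """
    for attempt in range(attempts):
        if read_state() == expected:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Polling returns as soon as the state settles, so the common case is faster than a fixed sleep while slow transitions still get up to attempts reads.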

Use screenshots for visual context

When the AX tree doesn’t provide enough information, capture a screenshot alongside it:
def diagnose_ui_state(app_name: str):
    """Capture both AX tree and screenshot for debugging."""
    
    # Get structured data
    snapshot = snapshot_app(app_name, interactive_only=True)
    
    # Get visual context
    screenshot_result = json.loads(
        run_command(["agent-native", "screenshot", app_name, "--json"])
    )
    
    # If using vision model
    if vision_model_available():
        with open(screenshot_result["path"], "rb") as f:
            image_bytes = f.read()
        
        # Send to vision model for analysis
        visual_analysis = analyze_screenshot(image_bytes)
        
        # Combine AX tree + visual analysis
        return {
            "ax_tree": snapshot,
            "visual": visual_analysis,
            "screenshot": screenshot_result["path"]
        }
    
    return {"ax_tree": snapshot, "screenshot": screenshot_result["path"]}
Screenshots are especially useful for Electron apps, custom controls, and visual confirmation of state.

Next steps

JSON output reference

Complete reference for all JSON output formats

OpenCode skill

Install the pre-built skill for instant integration
