When building AI agents that use agent-native, follow these principles:
Always re-snapshot after UI changes
Use interactive mode by default (-i flag)
Handle errors gracefully and retry with context
Fall back to keyboard when AX tree is sparse
Verify state after critical operations
The most common mistake is forgetting to re-snapshot after clicking, navigating, or changing state. Old refs may not resolve after UI structure changes.
    # Click a navigation button
    run_command(["agent-native", "click", "@n3"])
    time.sleep(0.5)  # Wait for transition

    # Re-snapshot to get new view's elements
    new_snapshot = snapshot_app("System Settings", interactive_only=True)
After opening dialogs/sheets
    # Click button that opens a dialog
    run_command(["agent-native", "click", "@n5"])

    # Wait for dialog to appear
    run_command(["agent-native", "wait", "System Settings", "--role", "AXSheet", "--timeout", "5"])

    # Re-snapshot to get dialog elements
    snapshot = snapshot_app("System Settings", interactive_only=True)
After state changes
    # Toggle a setting
    run_command(["agent-native", "check", "@n7"])

    # Re-snapshot to see if new options appeared
    snapshot = snapshot_app("System Settings", interactive_only=True)
    # Some settings reveal additional controls when enabled
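To see which controls a state change revealed, the snapshots taken before and after the toggle can be compared. The sketch below is only an illustration: `new_elements` is a hypothetical helper, the sample data is invented, and it assumes a snapshot is a list of dicts with `ref`, `role`, and `title` keys, as the other examples in this guide suggest. It compares on `(role, title)` rather than `ref`, on the assumption that refs may be renumbered between snapshots.

```python
def new_elements(before: list[dict], after: list[dict]) -> list[dict]:
    """Return elements in `after` whose (role, title) pair is absent from `before`."""
    seen = {(el.get("role"), el.get("title")) for el in before}
    return [el for el in after if (el.get("role"), el.get("title")) not in seen]


# Hypothetical snapshots taken before and after toggling a setting.
before = [
    {"ref": "@n1", "role": "AXCheckBox", "title": "Enable schedule"},
]
after = [
    {"ref": "@n1", "role": "AXCheckBox", "title": "Enable schedule"},
    {"ref": "@n2", "role": "AXPopUpButton", "title": "Start time"},
]

for el in new_elements(before, after):
    print(el["ref"], el["role"], el["title"])
```

Two controls that share the same role and title are indistinguishable to this comparison, so it suits a quick check rather than an exact diff.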
After form submission
    # Fill form fields
    run_command(["agent-native", "fill", "@n2", "username"])
    run_command(["agent-native", "fill", "@n3", "password"])

    # Submit
    run_command(["agent-native", "click", "@n4"])

    # Wait and re-snapshot for success/error state
    time.sleep(1)
    snapshot = snapshot_app("MyApp", interactive_only=True)
Cause: UI structure changed, element removed, or never snapshotted.
Solution: Re-snapshot and find element by attributes.
    try:
        run_command(["agent-native", "click", "@n5"])
    except subprocess.CalledProcessError:
        # Ref no longer valid, re-snapshot
        snapshot = snapshot_app("MyApp", interactive_only=True)

        # Find element by attributes instead
        target = next(
            el for el in snapshot
            if el["role"] == "AXButton" and "Submit" in el.get("title", "")
        )
        run_command(["agent-native", "click", target["ref"]])
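The `next(...)` lookup above raises `StopIteration` when nothing matches. A more forgiving variant might return `None` instead, so the caller can decide what to do. The helper below is a sketch, not part of agent-native: `find_element` is a hypothetical name, the sample data is invented, and the matching is case-insensitive by choice (the example above is case-sensitive).

```python
from typing import Optional


def find_element(snapshot: list[dict], role: str, title_contains: str = "") -> Optional[dict]:
    """Return the first element matching role and title substring, or None."""
    needle = title_contains.lower()
    for el in snapshot:
        # Elements may omit "title" or carry a null value; treat both as "".
        title = (el.get("title") or "").lower()
        if el.get("role") == role and needle in title:
            return el
    return None


# Hypothetical snapshot data in the shape the examples in this guide assume.
sample = [
    {"ref": "@n1", "role": "AXTextField", "title": "Email"},
    {"ref": "@n2", "role": "AXButton", "title": None},
    {"ref": "@n3", "role": "AXButton", "title": "Submit form"},
]

match = find_element(sample, "AXButton", "submit")
print(match["ref"] if match else "no match")
```

When the result is `None`, the agent can log the fresh snapshot and choose another strategy rather than crash inside the exception handler.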
Element not enabled
Cause: Element is disabled (grayed out) or not yet ready.
Solution: Wait or check prerequisites.
    # Check if enabled first
    enabled = run_command(["agent-native", "is", "enabled", "@n3"])
    if enabled.strip() == "false":
        print("Element is disabled, checking prerequisites...")
        # Maybe another field needs to be filled first
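For the "wait" half of the solution, one option is to poll the check until it passes or a deadline expires. The sketch below uses a generic, hypothetical `wait_until` helper written in plain Python. It takes the check as a callable, so it runs here against a stand-in function; in practice the callable would wrap the `is enabled` command shown above.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 5.0, interval: float = 0.25) -> bool:
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in check that becomes true on the third poll (no CLI required).
calls = {"n": 0}


def fake_is_enabled() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_is_enabled, timeout=2.0, interval=0.01))

# With the real CLI and this guide's run_command helper, the check might be:
# wait_until(lambda: run_command(["agent-native", "is", "enabled", "@n3"]).strip() != "false")
```

If the element is still disabled when the deadline passes, that usually points to an unmet prerequisite rather than slow rendering.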
App not found
Cause: App not running, wrong name, or not launched yet.
Solution: Open the app first, retry with fuzzy matching.
    try:
        snapshot = snapshot_app("System Settings", interactive_only=True)
    except subprocess.CalledProcessError as e:
        # e.stderr is None unless the command's output was captured
        if "not found" in (e.stderr or ""):
            # Launch the app
            run_command(["agent-native", "open", "System Settings"])
            time.sleep(2)

            # Retry
            snapshot = snapshot_app("System Settings", interactive_only=True)
        else:
            # Unrelated failure: let it propagate
            raise
Timeout waiting for element
Cause: Element took longer to appear than expected.
Solution: Increase timeout, check if navigation succeeded.
    try:
        run_command([
            "agent-native", "wait", "MyApp",
            "--role", "AXButton",
            "--title", "OK",
            "--timeout", "5"
        ])
    except subprocess.CalledProcessError:
        # Element didn't appear, check why
        snapshot = snapshot_app("MyApp", interactive_only=True)

        # Maybe there's an error dialog instead?
        error_msg = next(
            (el for el in snapshot if "error" in el.get("title", "").lower()),
            None
        )
        if error_msg:
            print(f"Error occurred: {error_msg['title']}")
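The "increase timeout" part of the solution could be sketched as a retry loop that repeats the same `wait` command with progressively longer timeouts. The names `wait_with_escalation` and `fake_run` are hypothetical, and the timeout schedule is an arbitrary choice. The runner is passed in as a parameter, so the example executes against a stand-in instead of the real CLI.

```python
import subprocess
from typing import Callable, Sequence


def wait_with_escalation(
    run: Callable[[list[str]], str],
    base_args: list[str],
    timeouts: Sequence[int] = (5, 10, 20),
) -> bool:
    """Retry a wait command with progressively longer timeouts.

    Returns True as soon as one attempt succeeds, False if all attempts fail.
    """
    for seconds in timeouts:
        try:
            run(base_args + ["--timeout", str(seconds)])
            return True
        except subprocess.CalledProcessError:
            continue  # Element did not appear in time; try a longer wait.
    return False


# Stand-in runner: fails unless the timeout is at least 10 seconds.
def fake_run(args: list[str]) -> str:
    timeout = int(args[args.index("--timeout") + 1])
    if timeout < 10:
        raise subprocess.CalledProcessError(returncode=1, cmd=args)
    return ""


ok = wait_with_escalation(
    fake_run,
    ["agent-native", "wait", "MyApp", "--role", "AXButton", "--title", "OK"],
)
print(ok)
```

When every attempt fails, re-snapshotting and looking for an error dialog, as in the example above, remains the next step.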
Electron apps (Slack, Discord, VS Code, etc.) expose minimal AX trees. When snapshot -i returns very few elements, drive the app with keyboard shortcuts instead:
    def interact_with_slack(channel: str, message: str):
        """Post message to Slack channel using keyboard shortcuts."""
        # Try AX tree first
        snapshot = snapshot_app("Slack", interactive_only=True)

        if len(snapshot) < 5:
            print("Sparse AX tree, using keyboard shortcuts")

            # Open quick switcher (Cmd+K)
            run_command(["agent-native", "key", "Slack", "cmd+k"])
            time.sleep(0.3)

            # Type channel name and press Enter
            run_command(["agent-native", "key", "Slack", channel, "return"])
            time.sleep(0.5)

            # Type message and send
            run_command(["agent-native", "key", "Slack", message, "return"])
        else:
            # Use AX tree if available
            # ...
            pass
    def automate_app(app_name: str, task: str):
        """Adaptive approach: try AX tree, fall back to keyboard."""
        snapshot = snapshot_app(app_name, interactive_only=True)

        # If tree is rich enough, use it
        if len(snapshot) > 10:
            print("Using AX tree for semantic interaction")
            return use_ax_tree(app_name, task, snapshot)

        # Otherwise, fall back to keyboard
        print("Sparse AX tree, falling back to keyboard shortcuts")
        return use_keyboard_shortcuts(app_name, task)
    # ❌ Fixed sleep wastes time
    run_command(["agent-native", "click", "@n3"])
    time.sleep(2)  # Might be too short or too long

    # ✅ Wait for specific element
    run_command(["agent-native", "click", "@n3"])
    run_command([
        "agent-native", "wait", "MyApp",
        "--role", "AXSheet",
        "--timeout", "5"
    ])
    # Proceeds as soon as element appears
When the AX tree doesn’t provide enough information, capture a screenshot alongside it:
    def diagnose_ui_state(app_name: str):
        """Capture both AX tree and screenshot for debugging."""
        # Get structured data
        snapshot = snapshot_app(app_name, interactive_only=True)

        # Get visual context
        screenshot_result = json.loads(
            run_command(["agent-native", "screenshot", app_name, "--json"])
        )

        # If using vision model
        if vision_model_available():
            with open(screenshot_result["path"], "rb") as f:
                image_bytes = f.read()

            # Send to vision model for analysis
            visual_analysis = analyze_screenshot(image_bytes)

            # Combine AX tree + visual analysis
            return {
                "ax_tree": snapshot,
                "visual": visual_analysis,
                "screenshot": screenshot_result["path"]
            }

        return {"ax_tree": snapshot, "screenshot": screenshot_result["path"]}
Screenshots are especially useful for Electron apps, custom controls, and visual confirmation of state.