For agents with shell access (like OpenCode, Aider, Claude Code):
Instructions
Use the `agent-native` CLI to control macOS apps.

Workflow:

1. `agent-native open <app>` - Launch the app
2. `agent-native snapshot <app> -i --json` - Get interactive elements
3. Parse the JSON to find target elements by their `ref` field
4. `agent-native click @ref` or `agent-native fill @ref "text"` - Interact
5. Re-snapshot after UI changes

Always use the `--json` flag for structured output.
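
Step 3 of the workflow, finding a target by its `ref`, can be sketched as a small helper. This is a minimal sketch, assuming the `--json` snapshot parses to a flat list of dicts with `ref`, `role`, and `title` keys, as the examples on this page suggest. `find_ref` is a hypothetical name, not part of the CLI, and the sample refs are made up for illustration.

```python
from typing import Optional


def find_ref(elements: list[dict], role: str, title_contains: str) -> Optional[str]:
    """Return the ref of the first element matching role and title, else None."""
    needle = title_contains.lower()
    for el in elements:
        # Titles may be missing or null in a snapshot, so default to "".
        title = (el.get("title") or "").lower()
        if el.get("role") == role and needle in title:
            return el.get("ref")
    return None


# Hand-written sample standing in for a parsed snapshot (assumed shape).
sample = [
    {"ref": "n1", "role": "AXButton", "title": "Bluetooth"},
    {"ref": "n2", "role": "AXButton", "title": "Wi-Fi"},
    {"ref": "n3", "role": "AXCheckBox", "title": None},
]
print(find_ref(sample, "AXButton", "wi-fi"))
```

Returning `None` instead of raising lets the agent decide whether to re-snapshot, go deeper, or fall back to another strategy.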

    import time

    # run_command, snapshot_app, and click_element are helper functions
    # assumed to be defined elsewhere.

    # 1. Open the app
    run_command(["agent-native", "open", "System Settings"])

    # 2. Get interactive elements
    snapshot = snapshot_app("System Settings", interactive_only=True)

    # 3. Find the Wi-Fi button
    wifi_button = next(
        el for el in snapshot
        if el["role"] == "AXButton" and "Wi-Fi" in el.get("title", "")
    )

    # 4. Click to navigate to Wi-Fi pane
    click_element(wifi_button["ref"])

    # 5. Wait for pane to load
    time.sleep(1)

    # 6. Re-snapshot to get Wi-Fi toggle
    snapshot = snapshot_app("System Settings", interactive_only=True)

    # 7. Find the Wi-Fi checkbox
    wifi_toggle = next(
        el for el in snapshot
        if el["role"] == "AXCheckBox" and el.get("title") == "Wi-Fi"
    )

    # 8. Toggle it
    if wifi_toggle.get("value") == "1":
        run_command(["agent-native", "uncheck", wifi_toggle["ref"]])
    else:
        run_command(["agent-native", "check", wifi_toggle["ref"]])

    # For Electron apps like Slack, use keyboard shortcuts
    # since the AX tree is often sparse

    # 1. Open Slack
    run_command(["agent-native", "open", "Slack"])

    # 2. Open quick switcher (Cmd+K)
    run_command(["agent-native", "key", "Slack", "cmd+k"])

    # 3. Type channel name and press Enter
    run_command(["agent-native", "key", "Slack", "general", "return"])

    # 4. Type message
    run_command(["agent-native", "key", "Slack", "Hello from agent-native!"])

    # 5. Send (Enter)
    run_command(["agent-native", "key", "Slack", "return"])

    import json
    import time

    # agent-native can interact with web content through the AX tree

    # 1. Open Safari
    run_command(["agent-native", "open", "Safari"])

    # 2. Snapshot with increased depth to reach web content
    snapshot = json.loads(
        run_command(["agent-native", "snapshot", "Safari", "-i", "--json", "-d", "10"])
    )

    # 3. Find address bar
    address_bar = next(
        el for el in snapshot
        if el["role"] == "AXTextField" and "address" in el.get("label", "").lower()
    )

    # 4. Navigate to URL
    run_command(["agent-native", "fill", address_bar["ref"], "https://example.com"])
    run_command(["agent-native", "key", "Safari", "return"])

    # 5. Wait for page load
    time.sleep(2)

    # 6. Re-snapshot to get web form elements
    snapshot = json.loads(
        run_command(["agent-native", "snapshot", "Safari", "-i", "--json", "-d", "12"])
    )

    # 7. Find and fill form fields
    email_field = next(
        el for el in snapshot
        if el["role"] == "AXTextField" and "email" in el.get("label", "").lower()
    )
    run_command(["agent-native", "fill", email_field["ref"], "[email protected]"])
Always re-snapshot after UI navigation or state changes. Refs from old snapshots may not resolve correctly after the UI structure changes.
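
Fixed `time.sleep` calls can be too short or too long, so one option is to re-snapshot in a loop until the expected element appears. The following is a minimal sketch, not part of `agent-native`: `wait_for_element` is a hypothetical helper, and it assumes a snapshot is a list of dicts. The snapshot function is passed in, so any wrapper around the CLI can be used.

```python
import time
from typing import Callable, Optional


def wait_for_element(
    take_snapshot: Callable[[], list[dict]],
    matches: Callable[[dict], bool],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> Optional[dict]:
    """Re-snapshot until an element satisfies `matches` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        # Take a fresh snapshot on every attempt; never reuse refs from an old one.
        for el in take_snapshot():
            if matches(el):
                return el
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With the helpers used on this page, a call might look like `wait_for_element(lambda: snapshot_app("System Settings", interactive_only=True), lambda el: el.get("title") == "Wi-Fi")`. The returned element comes from the most recent snapshot, so its `ref` is current.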
AI agents excel at breaking down complex tasks into steps:

    import time


    def automate_system_settings_change(setting_path: list[str], value: str):
        """
        Navigate System Settings hierarchy and change a value.

        Args:
            setting_path: List of navigation steps, e.g. ["Wi-Fi", "Advanced"]
            value: The value to set
        """
        # Open System Settings
        run_command(["agent-native", "open", "System Settings"])
        time.sleep(1)

        # Navigate through each level
        for step in setting_path:
            snapshot = snapshot_app("System Settings", interactive_only=True)

            # Find button or link with matching title
            target = next(
                (el for el in snapshot
                 if step.lower() in el.get("title", "").lower()),
                None
            )

            if not target:
                raise ValueError(f"Could not find '{step}' in current view")

            click_element(target["ref"])
            time.sleep(1)

        # Now we're at the target pane, find the setting and change it
        snapshot = snapshot_app("System Settings", interactive_only=True)
        # ... interact with the setting
The LLM can reason about the navigation hierarchy and dynamically adjust the path if the UI doesn’t match expectations.
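
When a navigation step is not found, it can help to hand the LLM the closest titles that are actually on screen instead of a bare error. The following is a minimal sketch using the standard library's `difflib`; `suggest_targets` is a hypothetical helper, and the 0.5 cutoff is an arbitrary starting point rather than a tuned value.

```python
import difflib


def suggest_targets(step: str, elements: list[dict], limit: int = 3) -> list[str]:
    """Return titles from the current snapshot that most resemble `step`."""
    titles = [el["title"] for el in elements if el.get("title")]
    # Compare case-insensitively, but return titles in their original casing.
    lowered = {t.lower(): t for t in titles}
    close = difflib.get_close_matches(
        step.lower(), list(lowered), n=limit, cutoff=0.5
    )
    return [lowered[c] for c in close]


# Hand-written sample standing in for a parsed snapshot (assumed shape).
sample = [{"title": "Wi-Fi"}, {"title": "Bluetooth"}, {"title": "Network"}]
print(suggest_targets("Wifi", sample))
```

The suggestions can be included in the error message or fed back to the model so it can pick a corrected step.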
AI agents should handle cases where the AX tree doesn’t provide enough information:

    def interact_with_app(app_name: str, task: str):
        """Try AX tree first, fall back to keyboard/screenshot."""
        # 1. Try snapshot
        snapshot = snapshot_app(app_name, interactive_only=True)

        # 2. Check if we got useful elements
        if len(snapshot) < 3:  # Very sparse tree
            print(f"Sparse AX tree for {app_name}, using keyboard shortcuts")
            # Fall back to keyboard commands
            # Use known shortcuts or ask user for guidance
            return use_keyboard_shortcuts(app_name, task)

        # 3. If needed, take a screenshot for visual context
        screenshot_path = run_command(
            ["agent-native", "screenshot", app_name, "--json"]
        )
        # Send screenshot to vision model for additional context
        # ...
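
A raw element count can overstate how usable a snapshot is, since unlabeled elements are hard to target. One possible refinement is to count only elements that have a ref and some label. This is a minimal sketch, assuming elements are dicts with optional `ref`, `title`, and `label` keys; `choose_strategy` and its threshold are hypothetical.

```python
def choose_strategy(snapshot: list[dict], min_labeled: int = 3) -> str:
    """Return "ax" when the snapshot has enough labeled elements, else "keyboard"."""
    labeled = [
        el for el in snapshot
        # An element is targetable only if it has a ref and some readable label.
        if el.get("ref") and (el.get("title") or el.get("label"))
    ]
    return "ax" if len(labeled) >= min_labeled else "keyboard"
```

The returned string can drive the same branch as the `len(snapshot) < 3` check, with the threshold adjusted per app.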