
Why agent-native is built for AI

agent-native is specifically designed for AI agent workflows. Unlike traditional automation tools built for human use, agent-native provides:
  • Structured, parseable output with JSON format for every command
  • Stable refs (@n1, @n2) that AI agents can track between operations
  • Self-contained commands that work in stateless environments
  • Rich element metadata including roles, labels, actions, and accessibility attributes
  • Predictable workflow patterns that match how LLMs reason about UI automation
Inspired by agent-browser from Vercel Labs, agent-native brings the same agent-first design philosophy to macOS native applications.
The core workflow (snapshot → interact → re-snapshot) mirrors how AI agents naturally break down UI automation tasks.

Integration patterns

There are three main ways to integrate agent-native with AI agents:

1. Tool calling / function calling

Map each agent-native command to a function/tool in your LLM framework:
Python (OpenAI)
import subprocess
import json

def snapshot_app(app_name: str, interactive_only: bool = True) -> list:
    """Get interactive elements from an app with refs."""
    cmd = ["agent-native", "snapshot", app_name, "--json"]
    if interactive_only:
        cmd.append("-i")
    # check=True raises on a non-zero exit instead of passing empty stdout to json.loads
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

def click_element(ref: str) -> dict:
    """Click an element by ref."""
    cmd = ["agent-native", "click", ref, "--json"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Register as tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "snapshot_app",
            "description": "Get interactive elements from a macOS app",
            "parameters": {
                "type": "object",
                "properties": {
                    "app_name": {"type": "string"},
                    "interactive_only": {"type": "boolean"}
                },
                "required": ["app_name"]
            }
        }
    }
]
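When the model returns a tool call, it arrives as a name plus a JSON-encoded arguments string. A minimal, framework-agnostic dispatcher might look like the sketch below; the handler registry is an assumption of this example, mapping tool names to functions such as `snapshot_app` above.

```python
import json

def dispatch_tool_call(name: str, arguments_json: str, handlers: dict) -> object:
    """Parse the model's JSON arguments and invoke the matching handler."""
    if name not in handlers:
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json)
    return handlers[name](**args)

# Usage with a stub handler (a real one would shell out to agent-native):
handlers = {
    "snapshot_app": lambda app_name, interactive_only=True: [app_name, interactive_only],
}
result = dispatch_tool_call("snapshot_app", '{"app_name": "Finder"}', handlers)
```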
TypeScript (Vercel AI SDK)
import { tool } from 'ai';
import { z } from 'zod';
import { execFileSync } from 'child_process';

const snapshotApp = tool({
  description: 'Get interactive elements from a macOS app',
  parameters: z.object({
    appName: z.string().describe('The app name'),
    interactiveOnly: z.boolean().default(true),
  }),
  execute: async ({ appName, interactiveOnly }) => {
    const args = ['snapshot', appName, '--json'];
    if (interactiveOnly) args.push('-i');
    // execFileSync passes args directly (no shell), so app names
    // with spaces like "System Settings" need no quoting
    const output = execFileSync('agent-native', args).toString();
    return JSON.parse(output);
  },
});

2. Direct shell commands

For agents with shell access (like OpenCode, Aider, Claude Code):
Instructions
Use the `agent-native` CLI to control macOS apps.

Workflow:
1. `agent-native open <app>` - Launch the app
2. `agent-native snapshot <app> -i --json` - Get interactive elements
3. Parse the JSON to find target elements by their `ref` field
4. `agent-native click @ref` or `agent-native fill @ref "text"` - Interact
5. Re-snapshot after UI changes

Always use `--json` flag for structured output.
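In a shell-driven session the agent still has to parse step 2's JSON itself. One convenient sketch is to index the snapshot by role so candidate elements can be scanned quickly; this assumes the flat element-list output shape shown in the other examples on this page.

```python
import json
from collections import defaultdict

def index_by_role(snapshot_json: str) -> dict[str, list[dict]]:
    """Group snapshot elements by AX role for quick candidate lookup."""
    by_role: dict[str, list[dict]] = defaultdict(list)
    for el in json.loads(snapshot_json):
        by_role[el.get("role", "unknown")].append(el)
    return dict(by_role)

# Usage with a sample snapshot payload:
raw = '[{"ref": "@n1", "role": "AXButton", "title": "OK"}, {"ref": "@n2", "role": "AXTextField", "label": "Email"}]'
index = index_by_role(raw)
```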

3. MCP (Model Context Protocol) server

Create an MCP server that wraps agent-native commands:
MCP server example
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import { execSync } from 'child_process';

const server = new Server(
  {
    name: 'agent-native-mcp',
    version: '0.1.0',
  },
  {
    // Advertise tool support so clients know to call tools/list and tools/call
    capabilities: { tools: {} },
  }
);

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  
  if (name === 'snapshot') {
    const cmd = `agent-native snapshot "${args.app}" -i --json`;
    const output = execSync(cmd).toString();
    return {
      content: [{ type: 'text', text: output }],
    };
  }
  
  if (name === 'click') {
    const cmd = `agent-native click ${args.ref} --json`;
    const output = execSync(cmd).toString();
    return {
      content: [{ type: 'text', text: output }],
    };
  }
  
  // ... other tools
});

// Connect over stdio so an MCP client can spawn and talk to this server
const transport = new StdioServerTransport();
await server.connect(transport);

Example workflows

Toggle Wi-Fi in System Settings

# 1. Open the app
run_command(["agent-native", "open", "System Settings"])

# 2. Get interactive elements
snapshot = snapshot_app("System Settings", interactive_only=True)

# 3. Find the Wi-Fi button
wifi_button = next(
    el for el in snapshot 
    if el["role"] == "AXButton" and "Wi-Fi" in el.get("title", "")
)

# 4. Click to navigate to Wi-Fi pane
click_element(wifi_button["ref"])

# 5. Wait for pane to load
time.sleep(1)

# 6. Re-snapshot to get Wi-Fi toggle
snapshot = snapshot_app("System Settings", interactive_only=True)

# 7. Find the Wi-Fi checkbox
wifi_toggle = next(
    el for el in snapshot
    if el["role"] == "AXCheckBox" and el.get("title") == "Wi-Fi"
)

# 8. Toggle it
if wifi_toggle.get("value") == "1":
    run_command(["agent-native", "uncheck", wifi_toggle["ref"]])
else:
    run_command(["agent-native", "check", wifi_toggle["ref"]])

Search and message in Slack

# For Electron apps like Slack, use keyboard shortcuts
# since the AX tree is often sparse

# 1. Open Slack
run_command(["agent-native", "open", "Slack"])

# 2. Open quick switcher (Cmd+K)
run_command(["agent-native", "key", "Slack", "cmd+k"])

# 3. Type channel name and press Enter
run_command(["agent-native", "key", "Slack", "general", "return"])

# 4. Type message
run_command(["agent-native", "key", "Slack", "Hello from agent-native!"])

# 5. Send (Enter)
run_command(["agent-native", "key", "Slack", "return"])
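The shortcut-driven steps above can also be expressed as data, which lets an agent generate and inspect the whole plan before executing anything. `key_commands` is an illustrative helper (one `key` invocation per step, following the command shape used above):

```python
def key_commands(app: str, steps: list[str]) -> list[list[str]]:
    """Turn a list of shortcut/text steps into agent-native argv lists."""
    return [["agent-native", "key", app, step] for step in steps]

# Each entry can be passed to subprocess.run(...) in order
plan = key_commands("Slack", ["cmd+k", "general", "return"])
```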

Fill a web form in Safari

# agent-native can interact with web content through the AX tree

# 1. Open Safari
run_command(["agent-native", "open", "Safari"])

# 2. Snapshot with increased depth to reach web content
snapshot = json.loads(
    run_command(["agent-native", "snapshot", "Safari", "-i", "--json", "-d", "10"])
)

# 3. Find address bar
address_bar = next(
    el for el in snapshot
    if el["role"] == "AXTextField" and "address" in el.get("label", "").lower()
)

# 4. Navigate to URL
run_command(["agent-native", "fill", address_bar["ref"], "https://example.com"])
run_command(["agent-native", "key", "Safari", "return"])

# 5. Wait for page load
time.sleep(2)

# 6. Re-snapshot to get web form elements
snapshot = json.loads(
    run_command(["agent-native", "snapshot", "Safari", "-i", "--json", "-d", "12"])
)

# 7. Find and fill form fields
email_field = next(
    el for el in snapshot
    if el["role"] == "AXTextField" and "email" in el.get("label", "").lower()
)
run_command(["agent-native", "fill", email_field["ref"], "[email protected]"])
Always re-snapshot after UI navigation or state changes. Refs from old snapshots may not resolve correctly after the UI structure changes.
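Because refs go stale, a retry wrapper that re-snapshots between attempts is a useful pattern. In this sketch the snapshot function is injected as a callable, so the same logic works against the real CLI or a stub; `find_with_refresh` is an illustrative helper, not an agent-native command.

```python
import time
from typing import Callable, Optional

def find_with_refresh(
    take_snapshot: Callable[[], list],
    matches: Callable[[dict], bool],
    attempts: int = 3,
    delay: float = 1.0,
) -> Optional[dict]:
    """Re-snapshot up to `attempts` times until an element matches the predicate."""
    for attempt in range(attempts):
        for el in take_snapshot():
            if matches(el):
                return el
        if attempt < attempts - 1:
            time.sleep(delay)  # give the UI time to settle before retrying
    return None

# Usage against the real CLI might look like:
# find_with_refresh(lambda: snapshot_app("Safari"), lambda el: el.get("role") == "AXTextField")
```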

Multi-step automation

AI agents excel at breaking down complex tasks into steps:
def automate_system_settings_change(setting_path: list[str], value: str):
    """
    Navigate System Settings hierarchy and change a value.
    
    Args:
        setting_path: List of navigation steps, e.g. ["Wi-Fi", "Advanced"]
        value: The value to set
    """
    # Open System Settings
    run_command(["agent-native", "open", "System Settings"])
    time.sleep(1)
    
    # Navigate through each level
    for step in setting_path:
        snapshot = snapshot_app("System Settings", interactive_only=True)
        
        # Find button or link with matching title
        target = next(
            (el for el in snapshot if step.lower() in el.get("title", "").lower()),
            None
        )
        
        if not target:
            raise ValueError(f"Could not find '{step}' in current view")
        
        click_element(target["ref"])
        time.sleep(1)
    
    # Now we're at the target pane, find the setting and change it
    snapshot = snapshot_app("System Settings", interactive_only=True)
    # ... interact with the setting
The LLM can reason about the navigation hierarchy and dynamically adjust the path if the UI doesn’t match expectations.

Handling uncertainty

AI agents should handle cases where the AX tree doesn’t provide enough information:
def interact_with_app(app_name: str, task: str):
    """Try AX tree first, fall back to keyboard/screenshot."""
    
    # 1. Try snapshot
    snapshot = snapshot_app(app_name, interactive_only=True)
    
    # 2. Check if we got useful elements
    if len(snapshot) < 3:  # Very sparse tree
        print(f"Sparse AX tree for {app_name}, using keyboard shortcuts")
        # Fall back to keyboard commands
        # Use known shortcuts or ask user for guidance
        return use_keyboard_shortcuts(app_name, task)
    
    # 3. If needed, take a screenshot for visual context
    screenshot_path = run_command(
        ["agent-native", "screenshot", app_name, "--json"]
    )
    # Send screenshot to vision model for additional context
    # ...

Best practices

For detailed guidance on using agent-native effectively with AI agents, see:

JSON output mode

Learn about structured output formats for each command

Best practices

Essential patterns for reliable AI automation

OpenCode skill

Install the pre-built OpenCode skill for instant integration

Build docs developers (and LLMs) love