
Why agent-native is built for AI

agent-native is specifically designed for AI agent workflows. Unlike traditional automation tools built for human use, agent-native provides:
  • Structured, parseable output with JSON format for every command
  • Stable refs (@n1, @n2) that AI agents can track between operations
  • Self-contained commands that work in stateless environments
  • Rich element metadata including roles, labels, actions, and accessibility attributes
  • Predictable workflow patterns that match how LLMs reason about UI automation
Inspired by agent-browser from Vercel Labs, agent-native brings the same agent-first design philosophy to macOS native applications.
The core workflow (snapshot → interact → re-snapshot) mirrors how AI agents naturally break down UI automation tasks.

Integration patterns

There are three main ways to integrate agent-native with AI agents:

1. Tool calling / function calling

Map each agent-native command to a function/tool in your LLM framework:
Python (OpenAI)
import subprocess
import json

def snapshot_app(app_name: str, interactive_only: bool = True) -> list:
    """Get interactive elements from an app with refs."""
    cmd = ["agent-native", "snapshot", app_name, "--json"]
    if interactive_only:
        cmd.append("-i")
    # check=True raises on a non-zero exit instead of passing empty stdout to json.loads
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

def click_element(ref: str) -> dict:
    """Click an element by ref."""
    cmd = ["agent-native", "click", ref, "--json"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Register as tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "snapshot_app",
            "description": "Get interactive elements from a macOS app",
            "parameters": {
                "type": "object",
                "properties": {
                    "app_name": {"type": "string"},
                    "interactive_only": {"type": "boolean"}
                },
                "required": ["app_name"]
            }
        }
    }
]
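When the model returns a tool call, it arrives as a name plus a JSON-encoded arguments string. A minimal, framework-agnostic dispatcher might look like the sketch below; the handler registry is an assumption of this example, mapping tool names to functions such as `snapshot_app` above.

```python
import json

def dispatch_tool_call(name: str, arguments_json: str, handlers: dict) -> object:
    """Parse the model's JSON arguments and invoke the matching handler."""
    if name not in handlers:
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json)
    return handlers[name](**args)

# Usage with a stub handler (a real one would shell out to agent-native):
handlers = {
    "snapshot_app": lambda app_name, interactive_only=True: [app_name, interactive_only],
}
result = dispatch_tool_call("snapshot_app", '{"app_name": "Finder"}', handlers)
```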
TypeScript (Vercel AI SDK)
import { tool } from 'ai';
import { z } from 'zod';
import { execFileSync } from 'child_process';

const snapshotApp = tool({
  description: 'Get interactive elements from a macOS app',
  parameters: z.object({
    appName: z.string().describe('The app name'),
    interactiveOnly: z.boolean().default(true),
  }),
  execute: async ({ appName, interactiveOnly }) => {
    const args = ['snapshot', appName, '--json'];
    if (interactiveOnly) args.push('-i');
    // execFileSync passes args directly (no shell), so app names
    // with spaces like "System Settings" need no quoting
    const output = execFileSync('agent-native', args).toString();
    return JSON.parse(output);
  },
});

2. Direct shell commands

For agents with shell access (like OpenCode, Aider, Claude Code):
Instructions
Use the `agent-native` CLI to control macOS apps.

Workflow:
1. `agent-native open <app>` - Launch the app
2. `agent-native snapshot <app> -i --json` - Get interactive elements
3. Parse the JSON to find target elements by their `ref` field
4. `agent-native click @ref` or `agent-native fill @ref "text"` - Interact
5. Re-snapshot after UI changes

Always use `--json` flag for structured output.
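In a shell-driven session the agent still has to parse step 2's JSON itself. One convenient sketch is to index the snapshot by role so candidate elements can be scanned quickly; this assumes the flat element-list output shape shown in the other examples on this page.

```python
import json
from collections import defaultdict

def index_by_role(snapshot_json: str) -> dict[str, list[dict]]:
    """Group snapshot elements by AX role for quick candidate lookup."""
    by_role: dict[str, list[dict]] = defaultdict(list)
    for el in json.loads(snapshot_json):
        by_role[el.get("role", "unknown")].append(el)
    return dict(by_role)

# Usage with a sample snapshot payload:
raw = '[{"ref": "@n1", "role": "AXButton", "title": "OK"}, {"ref": "@n2", "role": "AXTextField", "label": "Email"}]'
index = index_by_role(raw)
```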

3. MCP (Model Context Protocol) server

Create an MCP server that wraps agent-native commands:
MCP server example
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import { execSync } from 'child_process';

const server = new Server(
  {
    name: 'agent-native-mcp',
    version: '0.1.0',
  },
  {
    // Advertise tool support so clients know to call tools/list and tools/call
    capabilities: { tools: {} },
  }
);

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  
  if (name === 'snapshot') {
    const cmd = `agent-native snapshot "${args.app}" -i --json`;
    const output = execSync(cmd).toString();
    return {
      content: [{ type: 'text', text: output }],
    };
  }
  
  if (name === 'click') {
    const cmd = `agent-native click ${args.ref} --json`;
    const output = execSync(cmd).toString();
    return {
      content: [{ type: 'text', text: output }],
    };
  }
  
  // ... other tools
});

// Connect over stdio so an MCP client can spawn and talk to this server
const transport = new StdioServerTransport();
await server.connect(transport);

Example workflows

Toggle Wi-Fi in System Settings

# 1. Open the app
run_command(["agent-native", "open", "System Settings"])

# 2. Get interactive elements
snapshot = snapshot_app("System Settings", interactive_only=True)

# 3. Find the Wi-Fi button
wifi_button = next(
    el for el in snapshot 
    if el["role"] == "AXButton" and "Wi-Fi" in el.get("title", "")
)

# 4. Click to navigate to Wi-Fi pane
click_element(wifi_button["ref"])

# 5. Wait for pane to load
time.sleep(1)

# 6. Re-snapshot to get Wi-Fi toggle
snapshot = snapshot_app("System Settings", interactive_only=True)

# 7. Find the Wi-Fi checkbox
wifi_toggle = next(
    el for el in snapshot
    if el["role"] == "AXCheckBox" and el.get("title") == "Wi-Fi"
)

# 8. Toggle it
if wifi_toggle.get("value") == "1":
    run_command(["agent-native", "uncheck", wifi_toggle["ref"]])
else:
    run_command(["agent-native", "check", wifi_toggle["ref"]])

Search and message in Slack

# For Electron apps like Slack, use keyboard shortcuts
# since the AX tree is often sparse

# 1. Open Slack
run_command(["agent-native", "open", "Slack"])

# 2. Open quick switcher (Cmd+K)
run_command(["agent-native", "key", "Slack", "cmd+k"])

# 3. Type channel name and press Enter
run_command(["agent-native", "key", "Slack", "general", "return"])

# 4. Type message
run_command(["agent-native", "key", "Slack", "Hello from agent-native!"])

# 5. Send (Enter)
run_command(["agent-native", "key", "Slack", "return"])
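The shortcut-driven steps above can also be expressed as data, which lets an agent generate and inspect the whole plan before executing anything. `key_commands` is an illustrative helper (one `key` invocation per step, following the command shape used above):

```python
def key_commands(app: str, steps: list[str]) -> list[list[str]]:
    """Turn a list of shortcut/text steps into agent-native argv lists."""
    return [["agent-native", "key", app, step] for step in steps]

# Each entry can be passed to subprocess.run(...) in order
plan = key_commands("Slack", ["cmd+k", "general", "return"])
```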

Fill a web form in Safari

# agent-native can interact with web content through the AX tree

# 1. Open Safari
run_command(["agent-native", "open", "Safari"])

# 2. Snapshot with increased depth to reach web content
snapshot = json.loads(
    run_command(["agent-native", "snapshot", "Safari", "-i", "--json", "-d", "10"])
)

# 3. Find address bar
address_bar = next(
    el for el in snapshot
    if el["role"] == "AXTextField" and "address" in el.get("label", "").lower()
)

# 4. Navigate to URL
run_command(["agent-native", "fill", address_bar["ref"], "https://example.com"])
run_command(["agent-native", "key", "Safari", "return"])

# 5. Wait for page load
time.sleep(2)

# 6. Re-snapshot to get web form elements
snapshot = json.loads(
    run_command(["agent-native", "snapshot", "Safari", "-i", "--json", "-d", "12"])
)

# 7. Find and fill form fields
email_field = next(
    el for el in snapshot
    if el["role"] == "AXTextField" and "email" in el.get("label", "").lower()
)
run_command(["agent-native", "fill", email_field["ref"], "[email protected]"])
Always re-snapshot after UI navigation or state changes. Refs from old snapshots may not resolve correctly after the UI structure changes.
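Because refs go stale, a retry wrapper that re-snapshots between attempts is a useful pattern. In this sketch the snapshot function is injected as a callable, so the same logic works against the real CLI or a stub; `find_with_refresh` is an illustrative helper, not an agent-native command.

```python
import time
from typing import Callable, Optional

def find_with_refresh(
    take_snapshot: Callable[[], list],
    matches: Callable[[dict], bool],
    attempts: int = 3,
    delay: float = 1.0,
) -> Optional[dict]:
    """Re-snapshot up to `attempts` times until an element matches the predicate."""
    for attempt in range(attempts):
        for el in take_snapshot():
            if matches(el):
                return el
        if attempt < attempts - 1:
            time.sleep(delay)  # give the UI time to settle before retrying
    return None

# Usage against the real CLI might look like:
# find_with_refresh(lambda: snapshot_app("Safari"), lambda el: el.get("role") == "AXTextField")
```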

Multi-step automation

AI agents excel at breaking down complex tasks into steps:
def automate_system_settings_change(setting_path: list[str], value: str):
    """
    Navigate System Settings hierarchy and change a value.
    
    Args:
        setting_path: List of navigation steps, e.g. ["Wi-Fi", "Advanced"]
        value: The value to set
    """
    # Open System Settings
    run_command(["agent-native", "open", "System Settings"])
    time.sleep(1)
    
    # Navigate through each level
    for step in setting_path:
        snapshot = snapshot_app("System Settings", interactive_only=True)
        
        # Find button or link with matching title
        target = next(
            (el for el in snapshot if step.lower() in el.get("title", "").lower()),
            None
        )
        
        if not target:
            raise ValueError(f"Could not find '{step}' in current view")
        
        click_element(target["ref"])
        time.sleep(1)
    
    # Now we're at the target pane, find the setting and change it
    snapshot = snapshot_app("System Settings", interactive_only=True)
    # ... interact with the setting
The LLM can reason about the navigation hierarchy and dynamically adjust the path if the UI doesn’t match expectations.

Handling uncertainty

AI agents should handle cases where the AX tree doesn’t provide enough information:
def interact_with_app(app_name: str, task: str):
    """Try AX tree first, fall back to keyboard/screenshot."""
    
    # 1. Try snapshot
    snapshot = snapshot_app(app_name, interactive_only=True)
    
    # 2. Check if we got useful elements
    if len(snapshot) < 3:  # Very sparse tree
        print(f"Sparse AX tree for {app_name}, using keyboard shortcuts")
        # Fall back to keyboard commands
        # Use known shortcuts or ask user for guidance
        return use_keyboard_shortcuts(app_name, task)
    
    # 3. If needed, take a screenshot for visual context
    screenshot_path = run_command(
        ["agent-native", "screenshot", app_name, "--json"]
    )
    # Send screenshot to vision model for additional context
    # ...

Best practices

For detailed guidance on using agent-native effectively with AI agents, see:

JSON output mode

Learn about structured output formats for each command

Best practices

Essential patterns for reliable AI automation

OpenCode skill

Install the pre-built OpenCode skill for instant integration

Build docs developers (and LLMs) love