Skip to main content
Safari exposes web content through the Accessibility tree, allowing you to interact with web pages using the same commands as native apps.

Overview

Safari is unique among browsers because it exposes web content as Accessibility nodes. This means you can use agent-native to:
  • Navigate to URLs
  • Click links and buttons
  • Fill forms
  • Read page content
  • Interact with web applications
AX tree peers into browsers — Safari and Chrome expose web content as AX nodes, making web automation possible without browser-specific APIs.See SKILL.md:166 for details.

Core workflow

1

Open Safari

agent-native open Safari
2

Navigate to URL

Safari’s address bar is an AXTextField. Find it and fill it:
agent-native snapshot Safari -i | grep -i "address\|search"
AXTextField "Address and Search" [ref=n5]
agent-native fill @n5 "https://github.com"
agent-native key Safari return
sleep 2  # Wait for page load
3

Re-snapshot to see page content

agent-native snapshot Safari -i
Web elements appear with their semantic roles: AXButton, AXLink, AXTextField, etc.
4

Interact with page elements

# Click a link
agent-native click @n42

# Fill a form field
agent-native fill @n15 "search query"

# Submit form
agent-native click @n20

Safari’s Accessibility tree

Safari represents web pages with semantic HTML roles mapped to AX roles:
HTML ElementAX RoleNotes
<button>AXButtonClickable with click
<a>AXLinkClickable with click
<input type="text">AXTextFieldFillable with fill
<input type="checkbox">AXCheckBoxUse check/uncheck
<input type="radio">AXRadioButtonUse click
<select>AXPopUpButtonUse select
<h1> - <h6>AXHeadingHas level attribute
<div>, <span>AXGroupStructural containers
<p>AXStaticTextReadable with get text
See AXEngine.swift:196-234 for how nodes are built from AX attributes.

Entering URLs

# Method 1: Fill address bar by ref
agent-native snapshot Safari -i > /tmp/snap.txt
ADDRESS_REF=$(grep -i "Address" /tmp/snap.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)

agent-native fill "@$ADDRESS_REF" "https://example.com"
agent-native key Safari return

# Method 2: Use keyboard shortcut
agent-native key Safari cmd+l              # Focus address bar
agent-native key Safari "https://example.com" return

# Method 3: Filter-based (no snapshot needed)
agent-native fill Safari "https://example.com" --label "Address"
agent-native key Safari return
Keyboard shortcuts (cmd+l) are fastest for repetitive tasks.
# Use keyboard shortcuts
agent-native key Safari cmd+[
# Navigate forward
agent-native key Safari cmd+]

# Or click toolbar buttons
agent-native snapshot Safari -i | grep -i "back\|forward"
agent-native click @n3  # Back button

Reloading pages

agent-native key Safari cmd+r

Opening new tabs

agent-native key Safari cmd+t

Form filling

Text inputs

# Find text fields
agent-native snapshot Safari -i | grep "AXTextField"

# Fill by ref
agent-native fill @n10 "[email protected]"

# Fill by label
agent-native fill Safari "[email protected]" --label "Email"

Checkboxes

# Find checkboxes
agent-native snapshot Safari -i | grep "AXCheckBox"

# Check
agent-native check @n12

# Uncheck
agent-native uncheck @n12

# Check by title
agent-native check Safari --title "I agree to terms"
# Find popup buttons (select elements)
agent-native snapshot Safari -i | grep "AXPopUpButton"

# Select option
agent-native select @n15 "Option A"

# Select by label
agent-native select Safari "United States" --label "Country"
See InteractionCommands.swift for the select implementation.

Radio buttons

# Find radio buttons
agent-native snapshot Safari -i | grep "AXRadioButton"

# Select by clicking
agent-native click @n18

# Click by title
agent-native click Safari --title "Credit Card"

Submit buttons

# Find submit buttons
agent-native snapshot Safari -i | grep -i "submit\|sign in\|login"

# Click
agent-native click @n20

# Or use Enter key
agent-native key Safari return

Example: GitHub login

Complete workflow for logging into GitHub:
#!/usr/bin/env bash
set -euo pipefail

# Open Safari
agent-native open Safari
sleep 1

# Navigate to GitHub login
agent-native key Safari cmd+l
agent-native key Safari "https://github.com/login" return
sleep 3  # Wait for page load

# Snapshot the login page
agent-native snapshot Safari -i > /tmp/github-login.txt

# Find username field
USERNAME_REF=$(grep -i "username\|email" /tmp/github-login.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)

if [[ -n "$USERNAME_REF" ]]; then
  agent-native fill "@$USERNAME_REF" "your-username"
  echo "Filled username"
else
  echo "Username field not found"
  exit 1
fi

# Find password field
PASSWORD_REF=$(grep -i "password" /tmp/github-login.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)

if [[ -n "$PASSWORD_REF" ]]; then
  agent-native fill "@$PASSWORD_REF" "your-password"
  echo "Filled password"
else
  echo "Password field not found"
  exit 1
fi

# Find and click Sign In button
SIGNIN_REF=$(grep -i "sign in" /tmp/github-login.txt | \
  grep "AXButton" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)

if [[ -n "$SIGNIN_REF" ]]; then
  agent-native click "@$SIGNIN_REF"
  echo "Clicked Sign In"
else
  echo "Sign In button not found"
  exit 1
fi

sleep 3  # Wait for login

# Verify login by checking page title
title=$(agent-native get title Safari)
if [[ "$title" == *"GitHub"* ]] && [[ "$title" != *"Login"* ]]; then
  echo "Successfully logged in!"
else
  echo "Login may have failed. Current title: $title"
fi
#!/usr/bin/env bash
set -euo pipefail

# Navigate to Google
agent-native open Safari
agent-native key Safari cmd+l
agent-native key Safari "https://google.com" return
sleep 2

# Find search box
agent-native snapshot Safari -i > /tmp/google.txt
SEARCH_REF=$(grep -i "search" /tmp/google.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)

# Perform search
agent-native fill "@$SEARCH_REF" "agent-native macOS"
agent-native key Safari return
sleep 2

# Snapshot search results
agent-native snapshot Safari -i > /tmp/results.txt

# Find and click first result link
FIRST_LINK=$(grep "AXLink" /tmp/results.txt | \
  grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)

if [[ -n "$FIRST_LINK" ]]; then
  agent-native click "@$FIRST_LINK"
  echo "Clicked first result"
fi

Reading page content

Get text from elements

# Snapshot page
agent-native snapshot Safari -i > /tmp/page.txt

# Read specific element
agent-native get text @n25

# Find headings
grep "AXHeading" /tmp/page.txt

# Extract all text from headings
for ref in $(grep "AXHeading" /tmp/page.txt | grep -o '@n[0-9]*'); do
  agent-native get text "$ref"
done

Get page title

agent-native get title Safari
GitHub: Let's build from here · GitHub

Check element states

# Check if button is enabled
agent-native is enabled @n10

# Check if checkbox is checked
agent-native get value @n15  # Returns "1" if checked, "0" if unchecked

Working with dynamic content

Waiting for elements

Use wait to wait for elements to appear:
# Wait for a button to appear
agent-native wait Safari --title "Submit" --timeout 5

# Wait for any button
agent-native wait Safari --role AXButton --timeout 10
See WaitCommand.swift for implementation.

Handling page loads

# Navigate
agent-native key Safari cmd+l
agent-native key Safari "https://example.com" return

# Wait for page title to change
old_title=$(agent-native get title Safari)
sleep 2
new_title=$(agent-native get title Safari)

if [[ "$old_title" != "$new_title" ]]; then
  echo "Page loaded: $new_title"
fi

# Or use fixed delay
sleep 3  # Wait 3 seconds for page load

Re-snapshotting after interactions

# Click a button that loads new content
agent-native click @n10
sleep 1

# Re-snapshot to see new content
agent-native snapshot Safari -i > /tmp/updated.txt

Common Safari keyboard shortcuts

ShortcutAction
cmd+lFocus address bar
cmd+tNew tab
cmd+wClose tab
cmd+rReload page
cmd+[Back
cmd+]Forward
cmd+fFind in page
cmd+plusZoom in
cmd+minusZoom out
cmd+0Reset zoom

Best practices

Wait for page loads

Always add delays after navigation:
agent-native key Safari return
sleep 3  # Wait for page load

Use keyboard shortcuts

Shortcuts are more reliable than clicking toolbar buttons:
agent-native key Safari cmd+l  # Focus address bar
agent-native key Safari cmd+r  # Reload

Check page titles

Verify navigation by checking titles:
title=$(agent-native get title Safari)
echo "Current page: $title"

Handle dynamic content

Use wait for elements that load asynchronously:
agent-native wait Safari --title "Submit" --timeout 5

Troubleshooting

Possible causes:
  • Page not fully loaded
  • Content is in an iframe
  • Elements are dynamically rendered
Solutions:
  1. Wait longer for page load:
    sleep 5
    agent-native snapshot Safari -i
    
  2. Check full tree without -i flag:
    agent-native snapshot Safari > full-tree.txt
    
  3. Try increasing depth:
    agent-native snapshot Safari -i -d 15
    
Solutions:
  1. Use keyboard shortcut instead:
    agent-native key Safari cmd+l
    agent-native key Safari "https://example.com" return
    
  2. Search for “Address” or “Search”:
    agent-native find Safari --title "Address"
    
Possible causes:
  • Submit button requires specific event
  • Form validation failing
Solutions:
  1. Try pressing Enter instead of clicking:
    agent-native key Safari return
    
  2. Check if validation errors appear:
    agent-native snapshot Safari -i | grep -i "error\|invalid"
    
  3. Verify all required fields are filled
Problem: Multiple elements with similar titles.Solutions:
  1. Use more specific filters:
    agent-native click Safari --title "Submit" --role AXButton
    
  2. Use index to select specific match:
    # Uses filter-based resolution with index
    # (Not exposed in CLI but available in ElementResolver.swift:13)
    
  3. Inspect elements to verify:
    agent-native find Safari --title "Submit"
    agent-native inspect @n10
    
Problem: Snapshot shows old content after interaction.Cause: Need to wait for page update.Solutions:
  1. Add delay before re-snapshotting:
    agent-native click @n10
    sleep 2
    agent-native snapshot Safari -i
    
  2. Use wait for specific element:
    agent-native wait Safari --title "Success" --timeout 5
    

Limitations

Safari AX limitations:
  • Some JavaScript-heavy SPAs may have incomplete AX trees
  • Canvas-based content (games, visual editors) is not accessible
  • Shadow DOM elements may be hidden
  • Some modern frameworks render minimal semantic HTML
For complex web automation, consider:
  • Using simpler, more accessible websites
  • Combining with screenshots for visual verification
  • Using keyboard shortcuts when AX tree is sparse
  • Testing with Safari’s Accessibility Inspector first

Next steps

Electron apps

Automate Slack, Discord, and VS Code

System Settings

Configure macOS system settings

Troubleshooting

Fix common issues and errors

API reference

Explore all commands

Build docs developers (and LLMs) love