Safari exposes web content through the Accessibility tree, allowing you to interact with web pages using the same commands as native apps.
Overview
Safari exposes web content as Accessibility (AX) nodes. This means you can use agent-native to:
Navigate to URLs
Click links and buttons
Fill forms
Read page content
Interact with web applications
The AX tree peers into browsers: Safari and Chrome expose web content as AX nodes, making web automation possible without browser-specific APIs. See SKILL.md:166 for details.
Core workflow
Navigate to URL
Safari’s address bar is an AXTextField. Find it and fill it:
agent-native snapshot Safari -i | grep -i "address\|search"
AXTextField "Address and Search" [ref=n5]
agent-native fill @n5 "https://github.com"
agent-native key Safari return
sleep 2 # Wait for page load
Re-snapshot to see page content
agent-native snapshot Safari -i
Web elements appear with their semantic roles: AXButton, AXLink, AXTextField, etc.
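The find-then-act step above can be wrapped in a small helper. This is a minimal sketch, assuming snapshot lines look like the `AXTextField "Address and Search" [ref=n5]` sample shown earlier; `first_ref` is a hypothetical name introduced here, not part of agent-native.

```shell
# first_ref ROLE PATTERN
# Reads snapshot text on stdin and prints the first ref (for example "n5")
# from a line that contains ROLE and matches PATTERN case-insensitively.
# Assumption: every element line carries a "[ref=nN]" marker.
first_ref() {
  grep "$1" | grep -i "$2" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -n 1
}

# Possible usage (requires agent-native and a running Safari):
#   ref=$(agent-native snapshot Safari -i | first_ref AXTextField "address")
#   agent-native fill "@$ref" "https://github.com"
```

The helper prints nothing when no line matches, so callers should check for an empty result before building `@$ref`.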
Interact with page elements
# Click a link
agent-native click @n42
# Fill a form field
agent-native fill @n15 "search query"
# Submit form
agent-native click @n20
Safari’s Accessibility tree
Safari represents web pages with semantic HTML roles mapped to AX roles:
| HTML Element | AX Role | Notes |
| --- | --- | --- |
| <button> | AXButton | Clickable with click |
| <a> | AXLink | Clickable with click |
| <input type="text"> | AXTextField | Fillable with fill |
| <input type="checkbox"> | AXCheckBox | Use check/uncheck |
| <input type="radio"> | AXRadioButton | Use click |
| <select> | AXPopUpButton | Use select |
| <h1> - <h6> | AXHeading | Has level attribute |
| <div>, <span> | AXGroup | Structural containers |
| <p> | AXStaticText | Readable with get text |
See AXEngine.swift:196-234 for how nodes are built from AX attributes.
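When scripting against a snapshot, the role-to-command mapping in the table above can be encoded as a lookup. The function below is only an illustrative sketch; `command_for_role` is a hypothetical helper, and mapping AXHeading to `get text` is an assumption based on the heading-reading example later in this guide rather than on the table itself.

```shell
# command_for_role ROLE
# Prints the agent-native subcommand suggested for an AX role.
# Returns 1 for structural or unlisted roles (for example AXGroup).
command_for_role() {
  case "$1" in
    AXButton|AXLink|AXRadioButton) echo "click" ;;
    AXTextField)                   echo "fill" ;;
    AXCheckBox)                    echo "check" ;;
    AXPopUpButton)                 echo "select" ;;
    AXStaticText|AXHeading)        echo "get text" ;;  # AXHeading: assumption
    *)                             return 1 ;;
  esac
}
```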
Navigation patterns
Entering URLs
# Method 1: Fill address bar by ref
agent-native snapshot Safari -i > /tmp/snap.txt
ADDRESS_REF=$(grep -i "Address" /tmp/snap.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)
agent-native fill "@$ADDRESS_REF" "https://example.com"
agent-native key Safari return
# Method 2: Use keyboard shortcut
agent-native key Safari cmd+l # Focus address bar
agent-native key Safari "https://example.com" return
# Method 3: Filter-based (no snapshot needed)
agent-native fill Safari "https://example.com" --label "Address"
agent-native key Safari return
Keyboard shortcuts (cmd+l) are fastest for repetitive tasks.
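For repetitive navigation, Method 2 can be folded into a function. This is a sketch under stated assumptions: `navigate_to` is a hypothetical name, and the `AGENT_NATIVE` variable is an override invented here so the function can be dry-run with another command such as `echo`.

```shell
# navigate_to APP URL
# Focuses the address bar with cmd+l, then sends the URL followed by return,
# mirroring Method 2 above. The second command runs only if the first succeeds.
# AGENT_NATIVE (hypothetical) selects the binary; it defaults to agent-native.
navigate_to() {
  "${AGENT_NATIVE:-agent-native}" key "$1" cmd+l &&
  "${AGENT_NATIVE:-agent-native}" key "$1" "$2" return
}

# Possible usage:
#   navigate_to Safari "https://example.com"
#   sleep 2   # still wait for the page to load before snapshotting
```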
Navigating back/forward
# Navigate back (keyboard shortcut)
agent-native key Safari cmd+[
# Navigate forward
agent-native key Safari cmd+]
# Or click toolbar buttons
agent-native snapshot Safari -i | grep -i "back\|forward"
agent-native click @n3 # Back button
Reloading pages
agent-native key Safari cmd+r
Opening new tabs
agent-native key Safari cmd+t
Text inputs
# Find text fields
agent-native snapshot Safari -i | grep "AXTextField"
# Fill by ref
agent-native fill @n10 "user@example.com"
# Fill by label
agent-native fill Safari "user@example.com" --label "Email"
Checkboxes
# Find checkboxes
agent-native snapshot Safari -i | grep "AXCheckBox"
# Check
agent-native check @n12
# Uncheck
agent-native uncheck @n12
# Check by title
agent-native check Safari --title "I agree to terms"
Dropdowns/Select elements
# Find popup buttons (select elements)
agent-native snapshot Safari -i | grep "AXPopUpButton"
# Select option
agent-native select @n15 "Option A"
# Select by label
agent-native select Safari "United States" --label "Country"
See InteractionCommands.swift for the select implementation.
Radio buttons
# Find radio buttons
agent-native snapshot Safari -i | grep "AXRadioButton"
# Select by clicking
agent-native click @n18
# Click by title
agent-native click Safari --title "Credit Card"
Submitting forms
# Find submit buttons
agent-native snapshot Safari -i | grep -i "submit\|sign in\|login"
# Click
agent-native click @n20
# Or use Enter key
agent-native key Safari return
Example: GitHub login
Complete workflow for logging into GitHub:
#!/usr/bin/env bash
set -euo pipefail
# Open Safari
agent-native open Safari
sleep 1
# Navigate to GitHub login
agent-native key Safari cmd+l
agent-native key Safari "https://github.com/login" return
sleep 3 # Wait for page load
# Snapshot the login page
agent-native snapshot Safari -i > /tmp/github-login.txt
# Find username field
# "|| true" keeps set -e/pipefail from aborting before the check below
USERNAME_REF=$(grep -i "username\|email" /tmp/github-login.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1 || true)
if [[ -n "$USERNAME_REF" ]]; then
  agent-native fill "@$USERNAME_REF" "your-username"
echo "Filled username"
else
echo "Username field not found"
exit 1
fi
# Find password field
PASSWORD_REF=$(grep -i "password" /tmp/github-login.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1 || true)
if [[ -n "$PASSWORD_REF" ]]; then
  agent-native fill "@$PASSWORD_REF" "your-password"
echo "Filled password"
else
echo "Password field not found"
exit 1
fi
# Find and click Sign In button
SIGNIN_REF=$(grep -i "sign in" /tmp/github-login.txt | \
  grep "AXButton" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1 || true)
if [[ -n "$SIGNIN_REF" ]]; then
  agent-native click "@$SIGNIN_REF"
echo "Clicked Sign In"
else
echo "Sign In button not found"
exit 1
fi
sleep 3 # Wait for login
# Verify login by checking page title
title=$(agent-native get title Safari)
if [[ "$title" == *"GitHub"* ]] && [[ "$title" != *"Login"* ]]; then
  echo "Successfully logged in!"
else
  echo "Login may have failed. Current title: $title"
fi
Example: Searching and clicking links
#!/usr/bin/env bash
set -euo pipefail
# Navigate to Google
agent-native open Safari
agent-native key Safari cmd+l
agent-native key Safari "https://google.com" return
sleep 2
# Find search box
agent-native snapshot Safari -i > /tmp/google.txt
SEARCH_REF=$(grep -i "search" /tmp/google.txt | \
  grep "AXTextField" | grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1)
# Perform search
agent-native fill "@$SEARCH_REF" "agent-native macOS"
agent-native key Safari return
sleep 2
# Snapshot search results
agent-native snapshot Safari -i > /tmp/results.txt
# Find and click first result link
FIRST_LINK=$(grep "AXLink" /tmp/results.txt | \
  grep -o 'ref=n[0-9]*' | sed 's/ref=//' | head -1 || true)
if [[ -n "$FIRST_LINK" ]]; then
  agent-native click "@$FIRST_LINK"
echo "Clicked first result"
fi
Reading page content
Get text from elements
# Snapshot page
agent-native snapshot Safari -i > /tmp/page.txt
# Read specific element
agent-native get text @n25
# Find headings
grep "AXHeading" /tmp/page.txt
# Extract all text from headings
# Snapshot lines carry "[ref=nN]", so convert each ref to the "@nN" form
for ref in $(grep "AXHeading" /tmp/page.txt | grep -o 'ref=n[0-9]*' | sed 's/ref=/@/'); do
  agent-native get text "$ref"
done
Get page title
agent-native get title Safari
GitHub: Let's build from here · GitHub
Check element states
# Check if button is enabled
agent-native is enabled @n10
# Check if checkbox is checked
agent-native get value @n15 # Returns "1" if checked, "0" if unchecked
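A state check like the one above can guard a write so that re-running a script does not toggle a box twice. The following is a minimal sketch, assuming `get value` prints exactly `1` for a checked box as described above; `ensure_checked` and the `AGENT_NATIVE` override are hypothetical names introduced for illustration.

```shell
# ensure_checked REF
# Reads the element's value and calls `check` only when it is not already "1".
# AGENT_NATIVE (hypothetical) selects the binary; it defaults to agent-native.
ensure_checked() {
  local state
  state=$("${AGENT_NATIVE:-agent-native}" get value "$1") || return 1
  if [ "$state" != "1" ]; then
    "${AGENT_NATIVE:-agent-native}" check "$1"
  fi
}

# Possible usage:
#   ensure_checked @n15
```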
Working with dynamic content
Waiting for elements
Use wait to wait for elements to appear:
# Wait for a button to appear
agent-native wait Safari --title "Submit" --timeout 5
# Wait for any button
agent-native wait Safari --role AXButton --timeout 10
See WaitCommand.swift for implementation.
Handling page loads
# Record the current title before navigating
old_title=$(agent-native get title Safari)
# Navigate
agent-native key Safari cmd+l
agent-native key Safari "https://example.com" return
# Wait, then check whether the page title changed
sleep 2
new_title=$(agent-native get title Safari)
if [[ "$old_title" != "$new_title" ]]; then
  echo "Page loaded: $new_title"
fi
# Or use fixed delay
sleep 3 # Wait 3 seconds for page load
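Fixed delays either waste time or fire too early. One alternative is to poll a condition until it holds. The sketch below is not part of agent-native: `poll_until`, `title_differs_from`, and the `AGENT_NATIVE` override are hypothetical helpers, and only the `get title` command comes from this guide.

```shell
# poll_until ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping one second between tries.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
poll_until() {
  local remaining="$1"
  shift
  while ! "$@"; do
    remaining=$((remaining - 1))
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
  done
}

# title_differs_from OLD_TITLE
# Succeeds once Safari's current title is no longer OLD_TITLE.
title_differs_from() {
  [ "$("${AGENT_NATIVE:-agent-native}" get title Safari)" != "$1" ]
}

# Possible usage: capture the title, navigate, then poll for a change.
#   old_title=$(agent-native get title Safari)
#   ...navigate...
#   poll_until 10 title_differs_from "$old_title" || echo "Page did not change"
```

A title change is only a proxy for page load; two pages with the same title would never satisfy this check, so a fallback delay may still be needed.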
Re-snapshotting after interactions
# Click a button that loads new content
agent-native click @n10
sleep 1
# Re-snapshot to see new content
agent-native snapshot Safari -i > /tmp/updated.txt
Common Safari keyboard shortcuts
| Shortcut | Action |
| --- | --- |
| cmd+l | Focus address bar |
| cmd+t | New tab |
| cmd+w | Close tab |
| cmd+r | Reload page |
| cmd+[ | Back |
| cmd+] | Forward |
| cmd+f | Find in page |
| cmd+plus | Zoom in |
| cmd+minus | Zoom out |
| cmd+0 | Reset zoom |
Best practices
Wait for page loads
Always add delays after navigation:
agent-native key Safari return
sleep 3 # Wait for page load
Use keyboard shortcuts
Shortcuts are more reliable than clicking toolbar buttons:
agent-native key Safari cmd+l # Focus address bar
agent-native key Safari cmd+r # Reload
Check page titles
Verify navigation by checking titles:
title=$(agent-native get title Safari)
echo "Current page: $title"
Handle dynamic content
Use wait for elements that load asynchronously:
agent-native wait Safari --title "Submit" --timeout 5
Troubleshooting
Web elements not appearing in snapshot
Possible causes:
Page not fully loaded
Content is in an iframe
Elements are dynamically rendered
Solutions:
Wait longer for page load:
sleep 5
agent-native snapshot Safari -i
Check full tree without -i flag:
agent-native snapshot Safari > full-tree.txt
Try increasing depth:
agent-native snapshot Safari -i -d 15
Address bar not found
Solutions:
Use keyboard shortcut instead:
agent-native key Safari cmd+l
agent-native key Safari "https://example.com" return
Search for “Address” or “Search”:
agent-native find Safari --title "Address"
Form submission not working
Problem: Multiple elements with similar titles.
Solutions:
Use more specific filters:
agent-native click Safari --title "Submit" --role AXButton
Use index to select specific match:
# Uses filter-based resolution with index
# (Not exposed in CLI but available in ElementResolver.swift:13)
Inspect elements to verify:
agent-native find Safari --title "Submit"
agent-native inspect @n10
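Before clicking by title, it can help to confirm how many elements actually match. The helper below is a rough sketch that counts matching lines in saved snapshot text. It assumes the `ROLE "Title" [ref=nN]` line format shown in the address bar sample, and `count_matches` is a hypothetical name.

```shell
# count_matches ROLE TITLE
# Reads snapshot text on stdin and prints how many lines contain ROLE and
# the exact quoted TITLE. Prints 0 (and still exits 0) when nothing matches.
count_matches() {
  grep -F "$1" | grep -c -F "\"$2\"" || true
}

# Possible usage:
#   agent-native snapshot Safari -i | count_matches AXButton "Submit"
```

A count above 1 suggests narrowing the filter (for example by adding --role) or targeting a specific ref instead of a title.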
Page content not updating
Problem: Snapshot shows old content after interaction.
Cause: Need to wait for page update.
Solutions:
Add delay before re-snapshotting:
agent-native click @n10
sleep 2
agent-native snapshot Safari -i
Use wait for specific element:
agent-native wait Safari --title "Success" --timeout 5
Limitations
Safari AX limitations:
Some JavaScript-heavy SPAs may have incomplete AX trees
Canvas-based content (games, visual editors) is not accessible
Shadow DOM elements may be hidden
Some modern frameworks render minimal semantic HTML
For complex web automation, consider:
Using simpler, more accessible websites
Combining with screenshots for visual verification
Using keyboard shortcuts when AX tree is sparse
Testing with Safari’s Accessibility Inspector first
Next steps
Electron apps: Automate Slack, Discord, and VS Code
System Settings: Configure macOS system settings
Troubleshooting: Fix common issues and errors
API reference: Explore all commands