The macOS Accessibility (AX) tree is a hierarchical representation of every UI element in an application. It’s how agent-native “sees” and interacts with apps—similar to how a DOM represents a web page, the AX tree represents native macOS interfaces.

What is the Accessibility tree?

The Accessibility tree is provided by macOS through the Accessibility APIs (the C-based AXUIElement API in the ApplicationServices framework). Every running application exposes its UI hierarchy through this system, which was originally designed for assistive technologies like VoiceOver. Each node in the tree represents a UI element such as a button, text field, window, or menu. The tree structure mirrors the visual hierarchy: windows contain groups, groups contain buttons, and so on.
agent-native builds its view of the tree dynamically by querying these macOS APIs. Reading the tree does not modify it: a snapshot is a read-only view of the application’s current UI state.

AX roles

Every element has a role that describes what kind of UI component it is:
  • AXWindow - Application windows
  • AXButton - Clickable buttons
  • AXTextField - Text input fields
  • AXStaticText - Read-only text labels
  • AXGroup - Container elements
  • AXMenuBar - Menu bars
  • AXMenuItem - Individual menu items
  • AXCheckBox, AXRadioButton, AXSlider, etc.
See the complete list in SnapshotCommand.swift:26-33.

AX attributes

Beyond role, elements have attributes that provide additional information:
// From AXNode.swift:4-19
struct AXNode {
    let role: String          // AXButton, AXTextField, etc.
    let subrole: String?      // More specific role variants
    let title: String?        // AXTitle - button text, window title
    let value: String?        // AXValue - text field contents, slider values
    let label: String?        // AXDescription - accessibility label
    let identifier: String?   // AXIdentifier - programmatic ID
    let enabled: Bool         // AXEnabled - can the user interact?
    let focused: Bool         // AXFocused - has keyboard focus?
    let x, y: Double?         // AXPosition - screen coordinates
    let width, height: Double? // AXSize - element dimensions
    let actions: [String]     // Available actions (AXPress, etc.)
}
Not all attributes exist on every element. A button might have title but no value, while a text field has value but might not have title.
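Because most attributes are optional, code that needs a human-readable name for an element usually has to fall back from one attribute to another. The following is a minimal sketch of that idea, not agent-native's implementation: `SketchNode` is a hypothetical, trimmed-down stand-in for `AXNode`, and the fallback order (title, then label, then value, then role) is an assumption chosen for illustration.

```swift
// Hypothetical, trimmed-down stand-in for the AXNode struct shown above.
struct SketchNode {
    let role: String
    let title: String?
    let value: String?
    let label: String?
}

// Return the first non-empty text attribute, falling back to the role.
// The order title -> label -> value is an assumption, not agent-native's rule.
func displayName(_ node: SketchNode) -> String {
    let candidates = [node.title, node.label, node.value]
    for case let text? in candidates where !text.isEmpty {
        return text
    }
    return node.role
}

let button = SketchNode(role: "AXButton", title: "OK", value: nil, label: nil)
let field = SketchNode(role: "AXTextField", title: nil, value: "hello", label: nil)
let group = SketchNode(role: "AXGroup", title: nil, value: nil, label: nil)

print(displayName(button)) // OK
print(displayName(field))  // hello
print(displayName(group))  // AXGroup
```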

AX actions

Elements expose actions that can be performed on them:
  • AXPress - Click a button
  • AXConfirm - Activate/confirm
  • AXCancel - Cancel/dismiss
  • AXRaise - Bring window to front
  • AXShowMenu - Show context menu
From AXEngine.swift:188-193, agent-native queries available actions:
static func actions(_ element: AXUIElement) -> [String] {
    var names: CFArray?
    let result = AXUIElementCopyActionNames(element, &names)
    guard result == .success, let arr = names as? [String] else { return [] }
    return arr
}
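The returned action names can be used to decide whether an element is worth acting on at all. A small sketch of that check follows; `clickAction(from:)` is a hypothetical helper, and preferring AXPress over AXConfirm is an assumption made for illustration rather than documented agent-native behavior.

```swift
// Hypothetical helper, not part of agent-native: choose an action for a "click"
// from the list an element reports.
func clickAction(from actions: [String]) -> String? {
    // Assumed preference order: AXPress first, then AXConfirm.
    let preferred = ["AXPress", "AXConfirm"]
    return preferred.first(where: { actions.contains($0) })
}

let buttonActions = ["AXPress", "AXShowMenu"]
let labelActions: [String] = []

print(clickAction(from: buttonActions) ?? "none") // AXPress
print(clickAction(from: labelActions) ?? "none")  // none
```

An element that reports no matching action (a static label, for example) yields `nil`, so the caller can skip it instead of attempting an action that would fail.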

Tree structure and traversal

The tree is built by recursively walking from the application root element down through children. From AXEngine.swift:236-275:
static func walkTree(
    _ element: AXUIElement,
    path: String = "",
    depth: Int = 0,
    maxDepth: Int = 5
) -> [(node: AXNode, depth: Int)] {
    // Build path like /AXWindow[@title="Settings"]/AXButton[@title="OK"].
    // `segment` is this element's own path component; its computation is omitted from this excerpt.
    let fullPath = path.isEmpty ? "/\(segment)" : "\(path)/\(segment)"
    let node = nodeFrom(element, path: fullPath)
    var results: [(AXNode, Int)] = [(node, depth)]
    
    // Recurse through children
    guard depth < maxDepth else { return results }
    for child in children(element) {
        results += walkTree(child, path: fullPath, depth: depth + 1, maxDepth: maxDepth)
    }
    return results
}
Deep trees can be expensive to traverse. Use --depth to limit how far agent-native walks. The default is 5-8 levels depending on the command.
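To see how the depth limit prunes the result, here is a self-contained sketch that applies the same recursion shape to an in-memory tree. `MockElement` and `walk` are hypothetical stand-ins for `AXUIElement` and `walkTree`; no Accessibility APIs are called.

```swift
// Hypothetical in-memory element standing in for AXUIElement and its children.
struct MockElement {
    let role: String
    var children: [MockElement] = []
}

// Same recursion shape as walkTree: record the node, then descend
// only while depth < maxDepth.
func walk(_ element: MockElement, depth: Int = 0, maxDepth: Int) -> [(role: String, depth: Int)] {
    var results = [(role: element.role, depth: depth)]
    guard depth < maxDepth else { return results }
    for child in element.children {
        results += walk(child, depth: depth + 1, maxDepth: maxDepth)
    }
    return results
}

let app = MockElement(role: "AXApplication", children: [
    MockElement(role: "AXWindow", children: [
        MockElement(role: "AXGroup", children: [
            MockElement(role: "AXButton"),
            MockElement(role: "AXTextField"),
        ]),
    ]),
])

let shallow = walk(app, maxDepth: 1)
let full = walk(app, maxDepth: 5)

print(shallow.map { $0.role }) // ["AXApplication", "AXWindow"]
print(full.count)              // 5
```

With `maxDepth: 1` the walk stops at the window, so the group and its controls never appear in the result. A limit that is too low can therefore hide the element you are looking for.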

How agent-native uses the AX tree

agent-native accesses the tree through these operations:
1. Create application element

Connect to a running app by PID:
// AXEngine.swift:138-140
static func appElement(pid: Int32) -> AXUIElement {
    AXUIElementCreateApplication(pid)
}
2. Read attributes

Query element properties:
// AXEngine.swift:142-158
let title = stringAttr(element, kAXTitleAttribute)
let enabled = boolAttr(element, kAXEnabledAttribute)
let origin = position(element)  // CGPoint
3. Find elements

Search the tree by role, title, label, or identifier (see AXEngine.swift:279-356).
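Conceptually, a search is a filter over the nodes produced by a tree walk. The sketch below illustrates that idea with exact-match criteria. `FoundNode` and `find` are hypothetical names, and exact matching is an assumption; the real search in AXEngine.swift may match differently.

```swift
// Hypothetical, trimmed-down node type for illustration only.
struct FoundNode {
    let role: String
    let title: String?
    let identifier: String?
}

// Keep nodes that match every criterion that was supplied; nil criteria are ignored.
// Exact string matching is an assumption made for this sketch.
func find(
    in nodes: [FoundNode],
    role: String? = nil,
    title: String? = nil,
    identifier: String? = nil
) -> [FoundNode] {
    return nodes.filter { node in
        if let role = role, node.role != role { return false }
        if let title = title, node.title != title { return false }
        if let identifier = identifier, node.identifier != identifier { return false }
        return true
    }
}

let nodes = [
    FoundNode(role: "AXWindow", title: "Settings", identifier: nil),
    FoundNode(role: "AXButton", title: "OK", identifier: "confirm"),
    FoundNode(role: "AXButton", title: "Cancel", identifier: nil),
]

let buttons = find(in: nodes, role: "AXButton")
let ok = find(in: nodes, role: "AXButton", title: "OK")

print(buttons.count)              // 2
print(ok.first?.identifier ?? "") // confirm
```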
4. Perform actions

Execute actions like AXPress:
// AXEngine.swift:361-363
static func performAction(_ element: AXUIElement, action: String) -> Bool {
    AXUIElementPerformAction(element, action as CFString) == .success
}

Permissions required

Accessing the Accessibility tree requires explicit user permission. agent-native checks and requests access:
// AXEngine.swift:379-386
static func checkAccess() -> Bool {
    AXIsProcessTrusted()
}

static func requestAccess() {
    // The key is a global constant, so it is taken unretained.
    let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue(): true] as CFDictionary
    AXIsProcessTrustedWithOptions(options)
}
You must grant Accessibility permissions in System Settings > Privacy & Security > Accessibility. Without this, all commands will fail with accessDenied.
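A command would typically run the check before touching the tree and fail early when access is missing. The sketch below shows one possible shape for that guard. `ensureAccess` and `AccessError` are hypothetical names, and the check and request steps are passed in as closures so the example runs without the Accessibility APIs; on macOS they would presumably be `AXEngine.checkAccess` and `AXEngine.requestAccess`. The sketch also assumes that the system prompt is shown asynchronously, so the current call still fails after requesting access.

```swift
// Hypothetical error type; the source only states that commands fail with accessDenied.
enum AccessError: Error {
    case accessDenied
}

// Hypothetical guard: succeed if access is already granted; otherwise trigger
// the prompt and fail, because the user has not granted access yet.
func ensureAccess(check: () -> Bool, request: () -> Void) throws {
    if check() { return }
    request()
    throw AccessError.accessDenied
}

// Stand-in closures simulating a process that has not been granted access.
var prompted = false
var outcome = ""
do {
    try ensureAccess(check: { false }, request: { prompted = true })
    outcome = "granted"
} catch {
    outcome = "denied"
}

print(outcome, prompted) // denied true
```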

See also

  • Refs and snapshots - How agent-native assigns refs to tree elements
  • Workflow - The snapshot → interact → re-snapshot pattern
