Overview

Sets keyboard focus on an element, making it the active target for keyboard input. Unlike click, focus does not trigger the element's action, which makes it useful for preparing an element for keyboard navigation or text input.

Syntax

agent-desktop focus <ref>

Parameters

ref
string
required
Element reference from snapshot (@e1, @e2, etc.)

Response

{
  "version": "1.0",
  "ok": true,
  "command": "focus",
  "data": {
    "action": "set_focus",
    "ref_id": "@e4",
    "post_state": {
      "role": "textfield",
      "states": ["enabled", "focused"]
    }
  }
}

Response Fields

action
string
The action performed (set_focus)
ref_id
string
The element reference that received focus
post_state
object
Element state after focusing
  • role (string): Element role
  • states (string[]): Should include “focused”
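
As a rough sketch of how a script might act on these fields, the helper below reads a focus response on stdin and succeeds only when ok is true and post_state.states contains "focused". The function name focus_succeeded is hypothetical, and the text match is a dependency-free approximation; a JSON-aware tool such as jq would be more robust.

```shell
# Hypothetical helper: exit status 0 if the focus response on stdin reports success.
focus_succeeded() {
  # Strip whitespace so the match does not depend on how the JSON is laid out.
  compact=$(tr -d ' \t\r\n')

  # The top-level "ok" flag must be true.
  case $compact in
    *'"ok":true'*) ;;
    *) return 1 ;;
  esac

  # The states array must list "focused" (crude text match, not a real JSON parse).
  printf '%s' "$compact" | grep -q '"states":\[[^]]*"focused"'
}
```

It could be used as: agent-desktop focus @e4 | focus_succeeded && agent-desktop press cmd+a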

AX-First Strategy

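The focus command tries the following strategies in order, moving to the next one only if the previous one fails:
  1. Set kAXFocusedAttribute to true through the accessibility (AX) API
  2. Navigate to the element with Tab
  3. Click the center of the element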
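
The ordered fallback above can be pictured with the following sketch. It illustrates the try-in-order pattern only and is not agent-desktop's actual implementation; the three try_* functions are hypothetical stand-ins whose results are hard-coded for the example.

```shell
# Hypothetical stand-ins for the three strategies; each returns 0 on success.
try_ax_focus()    { return 1; }  # pretend the AX attribute could not be set
try_tab_focus()   { return 1; }  # pretend tab navigation did not reach the element
try_click_focus() { return 0; }  # pretend clicking the element center worked

# Run each strategy in order and stop at the first one that succeeds.
focus_with_fallback() {
  for strategy in try_ax_focus try_tab_focus try_click_focus; do
    if "$strategy" "$1"; then
      echo "$strategy"   # report which strategy set focus
      return 0
    fi
  done
  return 1               # every strategy failed
}
```

Calling focus_with_fallback @e4 with these stubs prints try_click_focus, since the first two stand-ins are set to fail.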

Usage Examples

Focus Text Field Before Typing

agent-desktop snapshot --app TextEdit -i
agent-desktop focus @e3
agent-desktop press cmd+a  # Select all in focused field
agent-desktop type @e3 "new content"

Focus Window Element

agent-desktop snapshot --app Finder -i
agent-desktop focus @e5  # Focus specific list item
agent-desktop press space  # Select item

Keyboard Navigation Setup

# Focus first field in form
agent-desktop focus @e2

# Tab to next field
agent-desktop press tab

# Tab to next field
agent-desktop press tab

# Type into the now-focused field (@e4 in this snapshot)
agent-desktop type @e4 "value"

Focus for Accessibility

# Focus element to trigger screen reader announcement
agent-desktop focus @e10
agent-desktop wait 500  # Allow screen reader to announce

Focus vs Click

Aspect | focus | click
Sets focus | Yes | Yes (as side effect)
Triggers action | No | Yes
Activates button | No | Yes
Opens menu | No | May
Use case | Prepare for input | Activate element

Use focus when:
  • Preparing for keyboard input
  • Setting up navigation context
  • Triggering focus events without click
  • Controlling focus ring visibility

Use click when:
  • Activating buttons, links, menus
  • Triggering element actions
  • Simulating user interaction

Common Use Cases

  • Form Navigation: Focus fields before typing
  • Keyboard Control: Set focus before sending keys
  • Accessibility: Trigger focus-based announcements
  • List Navigation: Focus items before using arrow keys
  • Modal Dialogs: Focus first input field

Focus Scope

Element Type | Focus Behavior
Text fields | Cursor appears, ready for input
Buttons | Focus ring appears, pressable with space/return
List items | Selection highlight, navigable with arrows
Links | Focus ring, activatable with return
Tabs | Focus, switchable with arrows

Error Cases

Error Code | Cause | Recovery
ELEMENT_NOT_FOUND | Ref doesn’t exist in current refmap | Run snapshot to refresh
STALE_REF | Element no longer matches saved ref | Run snapshot and use new ref
ACTION_FAILED | Element cannot receive focus | Element may not be focusable
ACTION_NOT_SUPPORTED | Element type doesn’t support focus | Try clicking instead
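
A script could map these error codes to a next step along the following lines. This is only a sketch: the function name and the returned labels are hypothetical, and it assumes the error code appears verbatim somewhere in the command's output, which this page does not confirm.

```shell
# Hypothetical helper: print a suggested next step for a failed focus call.
# ASSUMPTION: the error code string appears verbatim in the output passed as $1.
suggest_recovery() {
  case $1 in
    *ELEMENT_NOT_FOUND*|*STALE_REF*)
      echo "resnapshot" ;;   # run snapshot again and pick the new ref
    *ACTION_NOT_SUPPORTED*)
      echo "click" ;;        # element type does not support focus, try click
    *ACTION_FAILED*)
      echo "skip" ;;         # element may simply not be focusable
    *)
      echo "unknown" ;;      # not one of the documented codes
  esac
}
```

For the resnapshot case the old ref should not be reused; the table above says to take the ref from the fresh snapshot.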

Verifying Focus

To confirm an element has focus:
# Set focus
agent-desktop focus @e5

# Verify focus state
agent-desktop is @e5 focused

# Or check in snapshot
agent-desktop snapshot -i | jq '.data.tree'
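
Because focus can move when the app's state changes, a script may want to set focus and confirm it in one step. The sketch below retries a few times; the function name focus_and_verify and the AGENT_DESKTOP variable are hypothetical, and it assumes that the is command exits with status 0 when the element is focused, which this page does not state.

```shell
# Command to run; overridable so the helper can be exercised with a stub.
AGENT_DESKTOP=${AGENT_DESKTOP:-agent-desktop}

# Hypothetical helper: focus a ref, then confirm it, retrying up to N times (default 3).
focus_and_verify() {
  ref=$1
  attempts=${2:-3}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$AGENT_DESKTOP" focus "$ref" >/dev/null
    # ASSUMPTION: `is <ref> focused` exits 0 when the element has focus.
    if "$AGENT_DESKTOP" is "$ref" focused >/dev/null; then
      return 0
    fi
    i=$((i + 1))
  done
  return 1   # focus could not be confirmed
}
```

A call such as focus_and_verify @e5 && agent-desktop press space would then send the key press only after focus is confirmed.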

Notes

  • The type command sets focus automatically
  • Some elements cannot receive focus (static text, containers)
  • Focus may move when the app's state changes
  • Only one element per application can have focus at a time
  • Focus ring visibility depends on system accessibility settings
  • Use agent-desktop is @e5 focused to verify focus state
  • Focus may trigger JavaScript focus events in web views
