What Are Refs?
Refs (references) are deterministic identifiers assigned to interactive elements during a snapshot command. They follow the format @e{n} where n is a sequential integer.
```json
{
  "ref_id": "@e1",
  "role": "button",
  "name": "Save"
}
```
Refs serve as stable pointers within a snapshot session, allowing agents to target specific UI elements without fragile text-based selectors.
Allocation Rules
Sequential Assignment
Refs are allocated in depth-first document order:
```bash
agent-desktop snapshot --app TextEdit -i
```
```json
{
  "tree": {
    "role": "window",
    "children": [
      { "ref_id": "@e1", "role": "button", "name": "Close" },
      { "ref_id": "@e2", "role": "button", "name": "Minimize" },
      {
        "role": "group",
        "children": [
          { "ref_id": "@e3", "role": "button", "name": "Bold" },
          { "ref_id": "@e4", "role": "button", "name": "Italic" }
        ]
      },
      { "ref_id": "@e5", "role": "textfield", "value": "" }
    ]
  }
}
```
Traversal order:
1. Window (no ref, structural element)
2. Close button → @e1
3. Minimize button → @e2
4. Group (no ref, container)
5. Bold button → @e3
6. Italic button → @e4
7. Text field → @e5
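The traversal above amounts to a pre-order depth-first walk that numbers interactive nodes and skips structural ones. This is a minimal sketch, not the code in crates/core: `Node`, `assign_refs`, and the shortened `INTERACTIVE` list are hypothetical simplifications.

```rust
// Hypothetical, simplified tree node; the real snapshot node type is not shown here.
struct Node {
    role: &'static str,
    name: &'static str,
    ref_id: Option<String>,
    children: Vec<Node>,
}

// Shortened subset of the interactive roles listed under "Interactive Elements Only".
const INTERACTIVE: &[&str] = &["button", "textfield", "checkbox", "link", "menuitem"];

fn node(role: &'static str, name: &'static str, children: Vec<Node>) -> Node {
    Node { role, name, ref_id: None, children }
}

// Pre-order walk: a node is numbered before any of its descendants, and
// structural nodes (window, group, ...) are visited but never numbered.
fn assign_refs(n: &mut Node, counter: &mut u32, out: &mut Vec<(String, &'static str)>) {
    if INTERACTIVE.contains(&n.role) {
        *counter += 1;
        let id = format!("@e{}", counter);
        out.push((id.clone(), n.name));
        n.ref_id = Some(id);
    }
    for child in n.children.iter_mut() {
        assign_refs(child, counter, out);
    }
}

// The TextEdit tree from the snapshot example above.
fn textedit_tree() -> Node {
    node("window", "", vec![
        node("button", "Close", vec![]),
        node("button", "Minimize", vec![]),
        node("group", "", vec![
            node("button", "Bold", vec![]),
            node("button", "Italic", vec![]),
        ]),
        node("textfield", "", vec![]),
    ])
}

fn main() {
    let mut tree = textedit_tree();
    let mut counter = 0;
    let mut refs = Vec::new();
    assign_refs(&mut tree, &mut counter, &mut refs);
    for (id, name) in &refs {
        println!("{id} {name}");
    }
    println!("text field -> {:?}", tree.children[3].ref_id);
}
```

Run on the TextEdit tree, it assigns @e1 through @e4 to the four buttons and @e5 to the text field, while the window and group get none. Because numbering is purely positional, inserting one extra button before Close would shift every later ID by one, which is why refs from an older snapshot cannot be trusted.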
Ref IDs may change between snapshots if elements are added/removed/reordered, even if the target element is unchanged. Always use the most recent snapshot’s refs.
Interactive Elements Only
Refs are assigned only to interactive roles:

| Receives Refs | No Refs (Structural) |
|---|---|
| button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell, radiobutton, incrementor, menubutton, switch, colorwell, dockitem | window, group, statictext / label, separator, toolbar, scrollbar, image, container |
Why? Agents interact with buttons, fields, and controls — not labels or layout groups. Structural elements appear in the tree for context but can’t be acted upon.
Example: Mixed Tree
```json
{
  "role": "group",
  "name": "Login Form",
  "children": [
    {
      "role": "statictext",
      "value": "Username:"
    },
    {
      "ref_id": "@e1",
      "role": "textfield",
      "name": "Username",
      "value": ""
    },
    {
      "role": "statictext",
      "value": "Password:"
    },
    {
      "ref_id": "@e2",
      "role": "textfield",
      "name": "Password",
      "value": "",
      "states": ["secure"]
    },
    {
      "ref_id": "@e3",
      "role": "button",
      "name": "Log In"
    }
  ]
}
```
Only 3 refs assigned (2 text fields + 1 button). Labels have no refs but provide context for agents to understand what each field represents.
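The role split in the table reduces to a membership test. A sketch, assuming roles arrive as the lowercase strings shown in the JSON; `receives_ref` is a hypothetical name, not part of agent-desktop's API. Applied to the Login Form group, it reproduces the count of 3.

```rust
// Hypothetical predicate mirroring the "Receives Refs" column of the role table.
fn receives_ref(role: &str) -> bool {
    matches!(
        role,
        "button" | "textfield" | "checkbox" | "link" | "menuitem" | "tab"
            | "slider" | "combobox" | "treeitem" | "cell" | "radiobutton"
            | "incrementor" | "menubutton" | "switch" | "colorwell" | "dockitem"
    )
}

fn main() {
    // Roles of the Login Form group and its children, in document order.
    let roles = ["group", "statictext", "textfield", "statictext", "textfield", "button"];
    // Structural roles (group, statictext) are skipped; only actionable ones count.
    let ref_count = roles.iter().filter(|&&r| receives_ref(r)).count();
    println!("ref_count = {ref_count}");
}
```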
Ref Lifecycle
Creation
Refs are created during snapshot execution:
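```rust
// Source: crates/core/src/refs.rs
pub fn allocate(&mut self, entry: RefEntry) -> String {
    // IDs are 1-based: the first ref in a snapshot is @e1.
    self.counter += 1;
    let ref_id = format!("@e{}", self.counter);
    // Store the entry under its new ID and hand the ID back to the caller.
    self.inner.insert(ref_id.clone(), entry);
    ref_id
}
```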
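The excerpt is a method on RefMap, whose full definition is not shown. A runnable sketch, assuming `inner` is a `HashMap<String, RefEntry>`, `counter` is the u32 described under "Ref ID Limits", and RefEntry is trimmed to two fields (all three are assumptions, not the real definitions):

```rust
use std::collections::HashMap;

// Trimmed-down stand-in for RefEntry; the real struct carries more fields.
#[derive(Clone, Debug)]
struct RefEntry {
    role: String,
    name: Option<String>,
}

// Assumed shape of RefMap: an ID-to-entry map plus a monotonically increasing counter.
#[derive(Default)]
struct RefMap {
    inner: HashMap<String, RefEntry>,
    counter: u32,
}

impl RefMap {
    // Same logic as the excerpt: bump the counter, format the ID, store the entry.
    fn allocate(&mut self, entry: RefEntry) -> String {
        self.counter += 1;
        let ref_id = format!("@e{}", self.counter);
        self.inner.insert(ref_id.clone(), entry);
        ref_id
    }

    fn get(&self, ref_id: &str) -> Option<&RefEntry> {
        self.inner.get(ref_id)
    }
}

fn main() {
    let mut map = RefMap::default();
    let save = map.allocate(RefEntry { role: "button".into(), name: Some("Save".into()) });
    let field = map.allocate(RefEntry { role: "textfield".into(), name: None });
    println!("{save} -> {:?}", map.get(&save));
    println!("{field} -> {:?}", map.get(&field));
}
```

A fresh map starts at @e1, which matches the examples on this page; the sketch assumes each snapshot builds a new RefMap rather than continuing the previous counter.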
Each ref is stored with metadata in a RefEntry:
```rust
// Source: crates/core/src/refs.rs
pub struct RefEntry {
    pub pid: i32,                       // Process ID
    pub role: String,                   // button, textfield, etc.
    pub name: Option<String>,           // Accessibility name
    pub value: Option<String>,          // Current value
    pub states: Vec<String>,            // enabled, focused, checked, etc.
    pub bounds: Option<Rect>,           // Pixel coordinates
    pub bounds_hash: Option<u64>,       // Hash for fast re-identification
    pub available_actions: Vec<String>, // Click, SetValue, etc.
    pub source_app: Option<String>,     // Application name
}
```
Persistence
The refmap is saved to disk after every snapshot:
```text
~/.agent-desktop/last_refmap.json
```
Permissions:
- Directory: 0o700 (user read/write/execute only)
- File: 0o600 (user read/write only)
This prevents other users from reading potentially sensitive UI state (window titles, field values).
The refmap file is replaced atomically on every snapshot. Old refs are not preserved.
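The page states that the refmap is replaced atomically with 0o700/0o600 permissions but does not show how. A common way to get both properties on Unix is to write a temporary file with the right mode and rename it into place; the sketch below assumes that approach, and `save_refmap` and the `.tmp` file name are hypothetical.

```rust
use std::fs::{self, DirBuilder, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt, PermissionsExt};
use std::path::Path;

// Hypothetical helper (Unix only): write the new refmap to a temporary file in
// the same directory, then rename it over last_refmap.json. rename(2) replaces
// the target atomically, so a reader sees either the old map or the new one.
fn save_refmap(dir: &Path, json: &str) -> io::Result<()> {
    // Create the directory as 0o700 if missing. The mode is not changed if the
    // directory already exists, and it is subject to the process umask.
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;

    let tmp = dir.join("last_refmap.json.tmp");
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // owner read/write; applied when the file is created
        .open(&tmp)?;
    file.write_all(json.as_bytes())?;
    file.sync_all()?; // flush contents before the rename makes them visible

    fs::rename(&tmp, dir.join("last_refmap.json"))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("agent-desktop-refmap-demo");
    save_refmap(&dir, r#"{"inner":{},"counter":0}"#)?;
    let mode = fs::metadata(dir.join("last_refmap.json"))?.permissions().mode() & 0o777;
    println!("refmap mode: {:o}", mode);
    Ok(())
}
```

Because the old file is replaced rather than merged, nothing from the previous snapshot survives, which matches the "old refs are not preserved" behavior described above.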
Expiration
Refs are valid only until the next snapshot, and UI changes can make them stale even sooner. Any of the following invalidates the current refmap:
- Window opened or closed
- Dialog appeared or dismissed
- Element added, removed, or moved
- Application restarted
- New snapshot executed
Ref Resolution
Fast Path: Optimistic Re-identification
When an action command (e.g., click @e3) executes:
```rust
// Source: crates/core/src/commands/helpers.rs
pub fn resolve_ref(
    ref_id: &str,
    adapter: &dyn PlatformAdapter,
) -> Result<(RefEntry, NativeHandle), AppError> {
    let refmap = RefMap::load()?;
    let entry = refmap
        .get(ref_id)
        .ok_or_else(|| AppError::stale_ref(ref_id))?;
    let handle = adapter.resolve_element(entry)?;
    Ok((entry.clone(), handle))
}
```
The platform adapter uses signature matching:
```text
Match criteria:
- pid (process ID)
- role (button, textfield, etc.)
- name (accessibility label)
- bounds_hash (position/size fingerprint)
```
If any criterion fails, the command returns a STALE_REF error.
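Signature matching can be pictured as scanning the live elements for one whose four criteria all equal the stored values. A sketch under assumptions: `Signature` and `resolve` are hypothetical, how an adapter enumerates live elements is platform-specific and omitted, and the error string only imitates the STALE_REF response.

```rust
// Hypothetical signature: the four match criteria copied out of a RefEntry.
#[derive(Debug, Clone, PartialEq)]
struct Signature {
    pid: i32,
    role: String,
    name: Option<String>,
    bounds_hash: Option<u64>,
}

impl Signature {
    // All four criteria must agree; there is no partial or fuzzy match.
    fn matches(&self, live: &Signature) -> bool {
        self.pid == live.pid
            && self.role == live.role
            && self.name == live.name
            && self.bounds_hash == live.bounds_hash
    }
}

// Scan the live elements for the one whose signature matches the stored ref.
fn resolve<'a>(ref_id: &str, stored: &Signature, live: &'a [Signature]) -> Result<&'a Signature, String> {
    live.iter()
        .find(|el| stored.matches(el))
        .ok_or_else(|| format!("STALE_REF: {ref_id} no longer matches a live element"))
}

fn sig(pid: i32, role: &str, name: &str, hash: u64) -> Signature {
    Signature { pid, role: role.into(), name: Some(name.into()), bounds_hash: Some(hash) }
}

fn main() {
    let stored = sig(1234, "button", "Save", 42);
    let live = vec![sig(1234, "button", "Cancel", 7), sig(1234, "button", "Save", 42)];
    println!("{:?}", resolve("@e1", &stored, &live));
    // Same button after a window resize: bounds_hash differs, so resolution fails.
    let moved = vec![sig(1234, "button", "Save", 99)];
    println!("{:?}", resolve("@e1", &stored, &moved));
}
```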
Error Response
```json
{
  "version": "1.0",
  "ok": false,
  "command": "click",
  "error": {
    "code": "STALE_REF",
    "message": "@e7 not found in current RefMap",
    "suggestion": "Run 'snapshot' to refresh, then retry with updated ref"
  }
}
```
Exit code: 1 (structured error)
Recovery Pattern
```bash
# Attempt action with potentially stale ref
agent-desktop click @e5
# → STALE_REF

# Re-snapshot to get fresh refs
agent-desktop snapshot --app Safari -i
# → New refmap with updated IDs

# Retry with new ref
agent-desktop click @e4  # Note: ID may have changed
# → Success
```
Do not blindly retry with the same ref ID. Always re-snapshot and use the new ref from the updated tree.
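The recovery pattern can be expressed as a small driver that refreshes once on STALE_REF and retries with whatever ref the new snapshot assigns. A sketch with hypothetical names throughout: `ActionError` stands in for the CLI's structured error codes, and the two closures stand in for running `snapshot` and the action command.

```rust
// Hypothetical error type standing in for the CLI's structured error codes.
#[derive(Debug, PartialEq)]
enum ActionError {
    StaleRef(String),
    Other(String),
}

// Hypothetical driver: `find_ref` takes a fresh snapshot and returns the
// target's current ref, `act` performs the action. On STALE_REF it
// re-snapshots once and retries with the ref from the new snapshot instead
// of replaying the old ID; any other error is returned unchanged.
fn act_with_refresh<F, A>(mut find_ref: F, mut act: A) -> Result<String, ActionError>
where
    F: FnMut() -> Option<String>,
    A: FnMut(&str) -> Result<(), ActionError>,
{
    let first = find_ref().ok_or_else(|| ActionError::Other("target not found".into()))?;
    match act(&first) {
        Ok(()) => Ok(first),
        Err(ActionError::StaleRef(_)) => {
            let fresh = find_ref()
                .ok_or_else(|| ActionError::Other("target gone after re-snapshot".into()))?;
            act(&fresh)?;
            Ok(fresh)
        }
        Err(other) => Err(other),
    }
}

fn main() {
    // Simulated snapshots, popped from the end: the first returns @e5, the second @e4.
    let mut snapshots = vec!["@e4", "@e5"];
    let result = act_with_refresh(
        || snapshots.pop().map(String::from),
        // Simulated click: @e5 went stale between the snapshot and the action.
        |r| if r == "@e5" { Err(ActionError::StaleRef(r.into())) } else { Ok(()) },
    );
    println!("{result:?}");
}
```

The retry is deliberately bounded to one refresh: if the element is still stale after a fresh snapshot, the error is surfaced instead of looping.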
Ref Count (ref_count)
The snapshot response includes a ref_count field:
```json
{
  "app": "Finder",
  "window": { "id": "w-4521", "title": "Documents" },
  "ref_count": 14,
  "tree": { ... }
}
```
This indicates how many interactive elements received refs. Use it to:
- Validate snapshot completeness (0 refs = likely permission issue)
- Estimate UI complexity
- Debug ref allocation
```rust
// Source: crates/core/src/commands/snapshot.rs
let ref_count = result.refmap.len();
```
Debugging Refs
View Stored Refmap
```bash
jq . ~/.agent-desktop/last_refmap.json
```
```json
{
  "inner": {
    "@e1": {
      "pid": 1234,
      "role": "button",
      "name": "Save",
      "value": null,
      "states": ["enabled"],
      "bounds": { "x": 100.0, "y": 50.0, "width": 80.0, "height": 30.0 },
      "bounds_hash": 123456789,
      "available_actions": ["Click"],
      "source_app": "TextEdit"
    }
  },
  "counter": 1
}
```
Check Why Ref Failed
Compare the stored RefEntry with current UI state:
- PID changed? → Application restarted
- Bounds changed? → Window resized or element moved
- Name changed? → UI text updated (e.g., “Save” → “Saved”)
- Role changed? → Element replaced with different control type
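The checklist maps each differing field to a likely cause, which can be sketched as a field-by-field comparison. `Observed` and `diagnose` are hypothetical; in practice the live values would come from a fresh snapshot of the same element.

```rust
// Hypothetical, trimmed-down view of a RefEntry: only the fields the checklist compares.
#[derive(Debug, Clone, PartialEq)]
struct Observed {
    pid: i32,
    role: String,
    name: Option<String>,
    bounds: Option<(f64, f64, f64, f64)>, // x, y, width, height
}

// Compare the stored entry with what the UI reports now and map each
// difference to its most likely cause, following the checklist above.
fn diagnose(stored: &Observed, live: &Observed) -> Vec<&'static str> {
    let mut causes = Vec::new();
    if stored.pid != live.pid {
        causes.push("PID changed: application restarted");
    }
    if stored.bounds != live.bounds {
        causes.push("bounds changed: window resized or element moved");
    }
    if stored.name != live.name {
        causes.push("name changed: UI text updated");
    }
    if stored.role != live.role {
        causes.push("role changed: element replaced with a different control type");
    }
    causes
}

fn main() {
    let stored = Observed {
        pid: 1234,
        role: "button".into(),
        name: Some("Save".into()),
        bounds: Some((100.0, 50.0, 80.0, 30.0)),
    };
    // The same button after saving: its label now reads "Saved".
    let live = Observed { name: Some("Saved".into()), ..stored.clone() };
    for cause in diagnose(&stored, &live) {
        println!("{cause}");
    }
}
```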
Find Element Without Ref
Use find to locate elements by attributes:
```bash
agent-desktop find --role button --name "Submit" --app Safari
```
This searches the live accessibility tree without using refs.
Advanced: Bounds Hash
The bounds_hash field enables fast position-based matching:
```rust
// Source: crates/core/src/node.rs
pub fn bounds_hash(&self) -> u64 {
    use rustc_hash::FxHasher;
    use std::hash::{Hash, Hasher};

    let mut h = FxHasher::default();
    let x = (self.x * 100.0) as i64; // 0.01 pixel precision
    let y = (self.y * 100.0) as i64;
    let w = (self.width * 100.0) as i64;
    let hh = (self.height * 100.0) as i64;
    x.hash(&mut h);
    y.hash(&mut h);
    w.hash(&mut h);
    hh.hash(&mut h);
    h.finish()
}
```
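Why truncate to 0.01 pixel? Accessibility APIs sometimes report the same coordinate slightly differently across calls (e.g., 100.0 vs 100.00001). Scaling by 100 and truncating maps such values to the same integer, so the hash stays stable. Values that straddle a 0.01 pixel boundary (e.g., 99.99999 vs 100.0) still hash differently, so the matching absorbs most of this jitter rather than all of it.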
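To see what the truncation does and does not absorb, the sketch below reproduces the quantization from bounds_hash. It swaps rustc_hash's FxHasher for the standard library's DefaultHasher so it has no dependencies; the hash values therefore differ from the real ones, and only equality between calls is meaningful. `quantize` and the free-standing `bounds_hash` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Same quantization as bounds_hash: scale by 100, then `as i64`, which
// truncates toward zero.
fn quantize(v: f64) -> i64 {
    (v * 100.0) as i64
}

// Stand-in for bounds_hash using DefaultHasher instead of FxHasher.
fn bounds_hash(x: f64, y: f64, w: f64, h: f64) -> u64 {
    let mut hasher = DefaultHasher::new();
    for v in [x, y, w, h] {
        quantize(v).hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Sub-0.01 px jitter above a whole value collapses into the same bucket...
    println!("{}", bounds_hash(100.0, 50.0, 80.0, 30.0) == bounds_hash(100.00001, 50.0, 80.0, 30.0));
    // ...but jitter that crosses a 0.01 px boundary does not.
    println!("{}", bounds_hash(100.0, 50.0, 80.0, 30.0) == bounds_hash(99.99999, 50.0, 80.0, 30.0));
}
```

The first comparison is equal and the second is not: 100.00001 × 100 truncates to 10000 like 100.0 does, while 99.99999 × 100 truncates to 9999.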
Ref Constraints
Max Refmap Size
```rust
// Source: crates/core/src/refs.rs
const MAX_REFMAP_BYTES: u64 = 1_048_576; // 1 MB
```
If the refmap exceeds 1 MB (extremely rare), snapshot returns an INTERNAL error.
Typical sizes:
- Simple app (TextEdit): ~5 KB
- Medium app (Finder): ~50 KB
- Complex app (Xcode): ~200 KB
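A minimal sketch of the size guard and of what the cap means in entries. `check_refmap_size` is hypothetical (the page does not show where the real check runs), and the capacity figure is only a rough estimate based on the compact form of the sample entry shown under Debugging Refs.

```rust
// Cap copied from crates/core/src/refs.rs.
const MAX_REFMAP_BYTES: u64 = 1_048_576; // 1 MB

// Hypothetical guard: reject a serialized refmap that would exceed the cap.
// This sketch checks the bytes before they are written to disk.
fn check_refmap_size(serialized: &[u8]) -> Result<(), String> {
    let len = serialized.len() as u64;
    if len > MAX_REFMAP_BYTES {
        Err(format!("INTERNAL: refmap is {len} bytes, limit is {MAX_REFMAP_BYTES}"))
    } else {
        Ok(())
    }
}

fn main() {
    // One entry in the compact JSON form of the sample refmap.
    let entry = r#""@e1":{"pid":1234,"role":"button","name":"Save","value":null,"states":["enabled"],"bounds":{"x":100.0,"y":50.0,"width":80.0,"height":30.0},"bounds_hash":123456789,"available_actions":["Click"],"source_app":"TextEdit"},"#;
    // Rough capacity estimate: how many entries of this size fit under the cap.
    let capacity = MAX_REFMAP_BYTES / entry.len() as u64;
    println!("~{} bytes per entry, about {} entries per 1 MB", entry.len(), capacity);
    println!("{:?}", check_refmap_size(entry.as_bytes()));
}
```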
Ref ID Limits
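Ref IDs are u32 sequential counters:
- Max refs per snapshot: 4,294,967,295 (u32::MAX)
- Practical limit: much lower, because the serialized refmap must stay under MAX_REFMAP_BYTES (1 MB). At roughly 200 bytes per entry, as in the sample refmap above, that is a few thousand refs.

You will never hit the theoretical limit.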
Best Practices
Always use latest refs
Discard old refs after each snapshot. Never cache refs across observations.
Handle STALE_REF gracefully
Treat it as a normal flow condition, not an error. Re-snapshot and continue.
Use --interactive-only
Reduces ref count to only actionable elements, improving performance and clarity.
Validate ref_count
If ref_count: 0, check permissions or verify the app has interactive elements.
Next Steps
- JSON Output: Explore the full response structure and data types
- Error Handling: Learn all error codes and recovery strategies