
Response Envelope

Every command returns a structured JSON envelope with consistent top-level fields:
{
  "version": "1.0",
  "ok": true,
  "command": "snapshot",
  "data": { ... }
}

Fields

version (string, required): Protocol version. Currently always "1.0". Future breaking changes will increment the major version.
ok (boolean, required): true if the command succeeded, false if it failed. Determines whether data or error is present.
command (string, required): The command name that was executed (e.g., "snapshot", "click", "type").
data (object): Success payload. Structure varies by command. Omitted when ok: false.
error (object): Error payload. Only present when ok: false. See Error Structure.
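
The field rules above can be sketched as a small Python helper. This is an illustration, not part of agent-desktop: unwrap_envelope, CommandError, and the "UNKNOWN" fallback code are hypothetical names introduced here.

```python
import json
from typing import Any


class CommandError(Exception):
    """Raised when an envelope reports ok: false (hypothetical helper type)."""

    def __init__(self, command: str, error: dict[str, Any]) -> None:
        self.command = command
        # "UNKNOWN" is a placeholder for this sketch, not a documented error code.
        self.code = error.get("code", "UNKNOWN")
        self.suggestion = error.get("suggestion")  # optional field, may be absent
        super().__init__(f"{command}: {self.code} - {error.get('message', '')}")


def unwrap_envelope(raw: str) -> dict[str, Any]:
    """Parse one envelope and return its data payload, or raise CommandError."""
    envelope = json.loads(raw)
    # version, ok, and command are the three required top-level fields.
    for field in ("version", "ok", "command"):
        if field not in envelope:
            raise ValueError(f"envelope is missing required field {field!r}")
    if envelope["ok"]:
        # data carries the success payload; fall back to {} defensively.
        return envelope.get("data", {})
    raise CommandError(envelope["command"], envelope.get("error", {}))
```

Raising on ok: false keeps the success path free of repeated checks; callers that prefer to branch explicitly can inspect the envelope directly instead.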

Success Responses

Snapshot

{
  "version": "1.0",
  "ok": true,
  "command": "snapshot",
  "data": {
    "app": "Finder",
    "window": {
      "id": "w-4521",
      "title": "Documents"
    },
    "ref_count": 2,
    "tree": {
      "role": "window",
      "name": "Documents",
      "children": [
        {
          "ref_id": "@e1",
          "role": "button",
          "name": "New Folder",
          "states": ["enabled"],
          "bounds": {"x": 20.0, "y": 80.0, "width": 100.0, "height": 32.0}
        },
        {
          "ref_id": "@e2",
          "role": "textfield",
          "name": "Search",
          "value": ""
        }
      ]
    }
  }
}
Source: crates/core/src/commands/snapshot.rs:54
data.app (string): Application name (e.g., "Finder", "Safari").
data.window (object): The window that was captured, with its id and title.
data.ref_count (integer): Number of interactive elements that received refs. Equals the number of nodes in tree with ref_id set.
data.tree (AccessibilityNode): Root node of the accessibility tree. See AccessibilityNode Type.
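
The relationship between ref_count and tree can be checked with a short traversal. The following is a minimal Python sketch; iter_nodes, collect_refs, and check_ref_count are hypothetical helper names, and the input is assumed to be the data object of a successful snapshot response.

```python
from typing import Any, Iterator


def iter_nodes(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a node and all of its descendants, depth-first."""
    yield node
    # children is omitted when a node has none, so default to an empty list.
    for child in node.get("children", []):
        yield from iter_nodes(child)


def collect_refs(tree: dict[str, Any]) -> list[str]:
    """Return every ref_id in the tree, in document order."""
    return [node["ref_id"] for node in iter_nodes(tree) if "ref_id" in node]


def check_ref_count(data: dict[str, Any]) -> bool:
    """True if data.ref_count matches the number of nodes carrying a ref_id."""
    return data["ref_count"] == len(collect_refs(data["tree"]))
```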

Action Commands

All interaction commands (click, type, set-value, toggle, etc.) return an ActionResult:
{
  "version": "1.0",
  "ok": true,
  "command": "click",
  "data": {
    "action": "click",
    "ref_id": "@e3",
    "post_state": {
      "role": "button",
      "states": ["enabled", "focused"]
    }
  }
}
Source: crates/core/src/action.rs:97
data.action (string, required): The action performed (e.g., "click", "type", "toggle").
data.ref_id (string): The ref that was acted upon. Omitted for non-element actions (e.g., press, clipboard-set).
data.post_state (ElementState): Element state after the action completed. Includes updated states, value, and role. Omitted if state unchanged.
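
Because ref_id and post_state are both optional, a consumer has to handle their absence. A minimal Python sketch of that handling follows; describe_action and its output format are hypothetical, chosen only for illustration.

```python
from typing import Any


def describe_action(data: dict[str, Any]) -> str:
    """Summarize an ActionResult payload as one line of text."""
    # ref_id is omitted for non-element actions such as press.
    target = data.get("ref_id", "(no element)")
    post_state = data.get("post_state")
    if post_state is None:
        # post_state is omitted when the element state did not change.
        return f"{data['action']} {target}: no post_state reported"
    states = ",".join(post_state.get("states", []))
    summary = f"{data['action']} {target}: role={post_state['role']} states=[{states}]"
    if post_state.get("value") is not None:
        summary += f" value={post_state['value']!r}"
    return summary
```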

List Commands

list-windows

{
  "version": "1.0",
  "ok": true,
  "command": "list-windows",
  "data": {
    "windows": [
      {
        "id": "w-4521",
        "title": "Documents",
        "app_name": "Finder",
        "pid": 1234,
        "bounds": {"x": 100.0, "y": 100.0, "width": 800.0, "height": 600.0},
        "is_focused": true
      }
    ]
  }
}
Source: crates/core/src/node.rs:66

list-apps

{
  "version": "1.0",
  "ok": true,
  "command": "list-apps",
  "data": {
    "apps": [
      {
        "name": "Finder",
        "pid": 1234,
        "bundle_id": "com.apple.finder"
      }
    ]
  }
}
Source: crates/core/src/node.rs:78
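
The two list payloads can be combined on the consumer side. The sketch below is a hedged Python example: focused_window and app_for_window are hypothetical helpers, and it assumes that pid in a window entry and pid in an app entry identify the same process, as the examples above suggest.

```python
from typing import Any, Optional


def focused_window(windows_data: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first window with is_focused set, or None if there is none."""
    for window in windows_data.get("windows", []):
        if window.get("is_focused", False):
            return window
    return None


def app_for_window(window: dict[str, Any], apps_data: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Find the list-apps entry whose pid matches the window's pid (assumed join key)."""
    for app in apps_data.get("apps", []):
        if app["pid"] == window["pid"]:
            return app
    return None
```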

Clipboard Commands

{
  "version": "1.0",
  "ok": true,
  "command": "clipboard-get",
  "data": {
    "text": "Hello world"
  }
}

Screenshot Command

{
  "version": "1.0",
  "ok": true,
  "command": "screenshot",
  "data": {
    "format": "png",
    "base64": "iVBORw0KGgoAAAANSUhEUgAA...",
    "width": 1920,
    "height": 1080
  }
}
The base64 field contains the PNG image data encoded in Base64.
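
Decoding the screenshot payload might look like the following Python sketch. decode_screenshot is a hypothetical helper; it assumes format is "png" (the only value shown here) and checks the standard 8-byte PNG signature before returning the bytes.

```python
import base64
from typing import Any

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(data: dict[str, Any]) -> bytes:
    """Decode the data payload of a screenshot response into raw PNG bytes."""
    if data.get("format") != "png":
        raise ValueError(f"unsupported screenshot format: {data.get('format')!r}")
    # validate=True rejects characters outside the Base64 alphabet.
    image = base64.b64decode(data["base64"], validate=True)
    if not image.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return image
```

The returned bytes can be written to disk with, for example, pathlib.Path("shot.png").write_bytes(image).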

Error Structure

When ok: false, the response includes an error object:
{
  "version": "1.0",
  "ok": false,
  "command": "click",
  "error": {
    "code": "STALE_REF",
    "message": "@e7 not found in current RefMap",
    "suggestion": "Run 'snapshot' to refresh, then retry with updated ref"
  }
}
Source: crates/core/src/output.rs:34
error.code (string, required): Machine-readable error code. Always SCREAMING_SNAKE_CASE. See Error Codes.
error.message (string, required): Human-readable error description. May include context like ref IDs or element names.
error.suggestion (string): Recommended recovery action. Present for most errors. Examples:
  • "Run 'snapshot' to refresh, then retry with updated ref"
  • "Open System Settings > Privacy & Security > Accessibility and add your terminal"
error.platform_detail (string): OS-specific error details. Only present when the underlying platform API provides additional context (e.g., macOS AXError codes).
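
The STALE_REF recovery described by the suggestion text (refresh the snapshot, then retry with an updated ref) can be sketched as a retry loop. This Python example is an assumption-laden illustration: act_with_refresh is hypothetical, run stands for any caller-supplied function that executes the CLI and returns the parsed envelope, and make_action_args is a caller-supplied function that picks a ref from fresh snapshot data and builds the action's arguments.

```python
from typing import Any, Callable

Envelope = dict[str, Any]
Runner = Callable[[list[str]], Envelope]           # hypothetical: runs CLI args, returns parsed JSON
ArgBuilder = Callable[[dict[str, Any]], list[str]]  # hypothetical: snapshot data -> action args


def act_with_refresh(
    run: Runner,
    snapshot_args: list[str],
    make_action_args: ArgBuilder,
    max_attempts: int = 2,
) -> Envelope:
    """Snapshot, act, and re-snapshot once more if the action reports STALE_REF."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last: Envelope = {}
    for _ in range(max_attempts):
        snapshot = run(snapshot_args)
        if not snapshot["ok"]:
            return snapshot  # cannot recover without a snapshot
        # Refs are re-resolved from the fresh snapshot on every attempt.
        last = run(make_action_args(snapshot["data"]))
        if last["ok"] or last["error"]["code"] != "STALE_REF":
            return last
    return last
```

Only STALE_REF triggers a retry in this sketch; any other error code is returned to the caller unchanged.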

Core Data Types

AccessibilityNode Type

interface AccessibilityNode {
  ref_id?: string;           // Present only for interactive elements
  role: string;              // "button", "textfield", "window", etc.
  name?: string;             // Accessibility label
  value?: string;            // Current value (text fields, sliders, etc.)
  description?: string;      // AXDescription attribute
  hint?: string;             // AXHelp or accessibility hint
  states?: string[];         // ["enabled", "focused", "checked", etc.]
  bounds?: Rect;             // Present when --include-bounds flag used
  children?: AccessibilityNode[];  // Nested elements
}
Source: crates/core/src/node.rs:4

Common Roles

These receive refs:
  • button
  • textfield
  • checkbox
  • radiobutton
  • link
  • menuitem
  • tab
  • slider
  • combobox
  • switch

Common States

  • enabled / disabled
  • focused
  • checked / unchecked
  • expanded / collapsed
  • selected
  • pressed
  • secure (password fields)

Rect Type

{
  "x": 100.0,
  "y": 150.0,
  "width": 200.0,
  "height": 50.0
}
Source: crates/core/src/node.rs:33
Coordinates are in screen pixels, with the origin at the top-left of the primary display.
Bounds are included only when snapshot is run with the --include-bounds flag. They are omitted by default to reduce payload size.
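
With a top-left origin, x and y mark the upper-left corner of the rectangle, so derived geometry is simple arithmetic. A minimal Python sketch, using the hypothetical helpers rect_center and rect_contains:

```python
from typing import Mapping


def rect_center(rect: Mapping[str, float]) -> tuple[float, float]:
    """Return the center point of a Rect in screen pixels."""
    return (rect["x"] + rect["width"] / 2, rect["y"] + rect["height"] / 2)


def rect_contains(rect: Mapping[str, float], px: float, py: float) -> bool:
    """True if (px, py) lies inside the Rect; right and bottom edges are exclusive."""
    return (
        rect["x"] <= px < rect["x"] + rect["width"]
        and rect["y"] <= py < rect["y"] + rect["height"]
    )
```

For the Rect shown above, rect_center returns (200.0, 175.0).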

WindowInfo Type

interface WindowInfo {
  id: string;                // Window identifier (e.g., "w-4521")
  title: string;             // Window title
  app_name: string;          // Application name
  pid: number;               // Process ID
  bounds?: Rect;             // Window frame
  is_focused: boolean;       // True if window has keyboard focus
}
Source: crates/core/src/node.rs:66

AppInfo Type

interface AppInfo {
  name: string;              // Application name (e.g., "Safari")
  pid: number;               // Process ID
  bundle_id?: string;        // macOS bundle ID (e.g., "com.apple.Safari")
}
Source: crates/core/src/node.rs:78

ElementState Type

interface ElementState {
  role: string;              // "button", "textfield", etc.
  states?: string[];         // Current states (omitted if empty)
  value?: string;            // Current value (omitted if null)
}
Source: crates/core/src/action.rs:106
Returned in the post_state field after action commands.

Serialization Rules

Omitted Fields

agent-desktop uses aggressive field omission to minimize JSON payload size:
#[serde(skip_serializing_if = "Option::is_none")]
pub name: Option<String>,

#[serde(skip_serializing_if = "Vec::is_empty", default)]
pub states: Vec<String>,
Source: crates/core/src/node.rs:10
Fields are omitted when:
  • Option<T> fields: Omitted if None
  • Vec<T> fields: Omitted if empty
  • bounds: Omitted unless --include-bounds flag used
  • post_state: Omitted if element state unchanged after action
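
On the consumer side, the omission rules above mean every optional key may be missing. One way to cope is to normalize nodes once, as in this Python sketch; normalize_node is a hypothetical helper, and the choice of None and [] as defaults is an assumption of the example.

```python
from typing import Any


def normalize_node(node: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an AccessibilityNode with every optional key filled in."""
    return {
        "ref_id": node.get("ref_id"),            # None for non-interactive elements
        "role": node["role"],                    # the only required field
        "name": node.get("name"),
        "value": node.get("value"),
        "description": node.get("description"),
        "hint": node.get("hint"),
        "states": list(node.get("states", [])),  # omitted when empty
        "bounds": node.get("bounds"),            # omitted without --include-bounds
        "children": [normalize_node(child) for child in node.get("children", [])],
    }
```

After normalization, downstream code can index fields directly instead of guarding each access.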

Example: Minimal vs. Full

Minimal (no optional fields):
{
  "ref_id": "@e1",
  "role": "button"
}
Full (optional fields populated; value and children are omitted because this button has neither):
{
  "ref_id": "@e1",
  "role": "button",
  "name": "Save Document",
  "description": "Saves the current file",
  "hint": "Cmd+S",
  "states": ["enabled", "focused"],
  "bounds": {"x": 20.0, "y": 80.0, "width": 100.0, "height": 32.0}
}

Floating-Point Handling

Bounds use f64 with fallback to 0.0 for invalid values:
fn f64_or_zero<'de, D: Deserializer<'de>>(deserializer: D) -> Result<f64, D::Error> {
    Option::<f64>::deserialize(deserializer).map(|opt| opt.unwrap_or(0.0))
}
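Source: crates/core/src/node.rs:44
This covers coordinates that arrive as null. JSON has no NaN literal, and serde_json writes non-finite floats as null, so a NaN reported by an accessibility API also ends up as 0.0 when read back.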
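
A client that wants the same tolerance when reading bounds could apply a similar fallback. The Python sketch below is a consumer-side analogue, not a port of the Rust function: coord_or_zero and sanitize_rect are hypothetical names, and the sketch also maps non-numeric and non-finite inputs to 0.0, which goes beyond the null handling shown above.

```python
import math
from typing import Any


def coord_or_zero(value: Any) -> float:
    """Return value as a finite float, or 0.0 if it is null, non-numeric, or non-finite."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    try:
        number = float(value)
    except OverflowError:  # integers too large to fit in a float
        return 0.0
    return number if math.isfinite(number) else 0.0


def sanitize_rect(rect: dict[str, Any]) -> dict[str, float]:
    """Apply coord_or_zero to the four Rect fields, treating missing keys as null."""
    return {key: coord_or_zero(rect.get(key)) for key in ("x", "y", "width", "height")}
```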

Parsing Examples

Python

import json
import subprocess

result = subprocess.run(
    ["agent-desktop", "snapshot", "--app", "Finder", "-i"],
    capture_output=True,
    text=True
)

data = json.loads(result.stdout)

if data["ok"]:
    ref_count = data["data"]["ref_count"]
    tree = data["data"]["tree"]
    print(f"Found {ref_count} interactive elements")
else:
    error = data["error"]
    print(f"Error: {error['code']} - {error['message']}")
    if "suggestion" in error:
        print(f"Suggestion: {error['suggestion']}")

JavaScript/TypeScript

import { execSync } from 'child_process';

const output = execSync(
  'agent-desktop snapshot --app Safari -i',
  { encoding: 'utf-8' }
);

const response = JSON.parse(output);

if (response.ok) {
  const refs = findAllRefs(response.data.tree);
  console.log(`Found refs: ${refs.join(', ')}`);
} else {
  console.error(`${response.error.code}: ${response.error.message}`);
}

function findAllRefs(node: any): string[] {
  const refs: string[] = [];
  if (node.ref_id) refs.push(node.ref_id);
  if (node.children) {
    for (const child of node.children) {
      refs.push(...findAllRefs(child));
    }
  }
  return refs;
}

Rust

use serde_json::Value;
use std::process::Command;

fn main() {
    // Run the CLI and capture stdout, which carries the JSON envelope.
    let output = Command::new("agent-desktop")
        .args(["snapshot", "--app", "TextEdit", "-i"])
        .output()
        .expect("Failed to execute agent-desktop");

    let response: Value = serde_json::from_slice(&output.stdout)
        .expect("Invalid JSON");

    // Treat a missing or non-boolean ok as failure instead of panicking.
    if response["ok"].as_bool().unwrap_or(false) {
        let ref_count = response["data"]["ref_count"].as_u64().unwrap_or(0);
        println!("Ref count: {}", ref_count);
    } else {
        let error = &response["error"];
        // The fallback strings are placeholders, not documented values.
        eprintln!(
            "Error: {} - {}",
            error["code"].as_str().unwrap_or("UNKNOWN"),
            error["message"].as_str().unwrap_or("")
        );
    }
}

Best Practices

1. Always check the ok field first
Never assume success. Check response.ok before accessing data.

2. Handle missing optional fields
Use safe navigation: node.name ?? 'Unnamed' in TypeScript, or node.get('bounds') in Python.

3. Parse errors for recovery
Branch on error.code and read error.suggestion to determine the next action (e.g., re-snapshot on STALE_REF).

4. Use type definitions
Generate types from JSON Schema (Phase 3 feature) or use the examples as reference.
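
Until generated types are available, hand-written definitions based on the interfaces in Core Data Types are one option. The Python sketch below mirrors Rect and AccessibilityNode with TypedDict; the class layout and the display_name helper are assumptions of this example rather than anything shipped with agent-desktop.

```python
from typing import TypedDict


class Rect(TypedDict):
    x: float
    y: float
    width: float
    height: float


class _NodeRequired(TypedDict):
    # role is the only field that is always present on a node.
    role: str


class AccessibilityNode(_NodeRequired, total=False):
    # total=False marks every field below as optional, matching the "?" fields.
    ref_id: str
    name: str
    value: str
    description: str
    hint: str
    states: list[str]
    bounds: Rect
    children: list["AccessibilityNode"]


def display_name(node: AccessibilityNode) -> str:
    """Return the accessibility label, or a placeholder when name is omitted."""
    return node.get("name", "Unnamed")
```

TypedDict values are plain dictionaries at runtime, so parsed JSON can be passed in unchanged while a type checker verifies the field access.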

Next Steps

  • Error Handling: Complete error code reference and recovery patterns
  • Workflow: Learn the snapshot → decide → act loop
