Execution

The execution node runs approved remediation commands on target systems via SSH, captures detailed results, and stores complete episodes in memory for future learning.

Overview

After a remediation plan passes security approval, the execution node opens an SSH connection to the target and runs the plan's commands one at a time, recording each command's exit code, stdout, and stderr. Success or failure is judged from the exit codes alone.

This is the step where a planned remediation becomes an actual change on the target system. Every command and its outcome are logged, and the complete episode is saved to memory so later runs can learn from it.

Execution Workflow

Approval Check

Nothing runs unless the plan has been approved; otherwise the node logs a warning and returns immediately:

if state.get("approval_status") != "APPROVED":
    # "Command not approved. Skipping execution."
    log("warning", "Comando no aprobado. Saltando ejecucion.")
    return {"current_step": "execute"}

Command Parsing

Splits the plan into individual commands, one per non-blank line:

plan = state.get("candidate_plan", "")
commands = [cmd.strip() for cmd in plan.split("\n") if cmd.strip()]
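Because parsing is purely line-based, each non-blank line becomes one command and multi-line shell constructs are split apart. The sketch below wraps the same list comprehension in a hypothetical parse_commands helper (not part of the project) to show both behaviors:

```python
def parse_commands(plan: str) -> list[str]:
    """Split a plan into commands: one per non-blank line, whitespace stripped."""
    return [cmd.strip() for cmd in plan.split("\n") if cmd.strip()]


plan = """
sudo systemctl restart nginx

  sudo systemctl status nginx --no-pager
"""
print(parse_commands(plan))
# ['sudo systemctl restart nginx', 'sudo systemctl status nginx --no-pager']

# Splitting is line-based, so a shell line continuation becomes two commands.
print(parse_commands("apt-get install -y \\\n    nginx"))
# ['apt-get install -y \\', 'nginx']
```

Plans should therefore keep each command on a single line; a backslash continuation or a heredoc would be executed as separate, broken commands.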

SSH Connection

Establishes connection to the target server:

ssh = SSHClient(
    hostname=config.SSH_HOST,
    port=config.SSH_PORT,
    username=config.SSH_USER,
    password=config.SSH_PASS
)
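The node uses SSHClient only through execute_command(command, use_sudo=...), which returns an (exit_code, stdout, stderr) tuple. Assuming that is the whole contract the node depends on, it can be written down as a typing.Protocol, which also makes it easy to substitute a fake client when testing without a real server. CommandRunner, FakeSSHClient, and run_one are hypothetical names used only for illustration:

```python
from typing import Protocol


class CommandRunner(Protocol):
    """Assumed: the only SSHClient capability the execution node relies on."""

    def execute_command(self, command: str, use_sudo: bool = False) -> tuple[int, str, str]:
        """Run one command remotely and return (exit_code, stdout, stderr)."""
        ...


class FakeSSHClient:
    """Hypothetical in-process stand-in for tests: returns scripted results."""

    def __init__(self, results: dict[str, tuple[int, str, str]]) -> None:
        self.results = results
        self.calls: list[tuple[str, bool]] = []

    def execute_command(self, command: str, use_sudo: bool = False) -> tuple[int, str, str]:
        self.calls.append((command, use_sudo))
        # Unscripted commands behave like a shell's "command not found" (exit 127).
        return self.results.get(command, (127, "", f"{command}: command not found"))


def run_one(client: CommandRunner, command: str) -> int:
    """Run a single command through any client that satisfies the protocol."""
    code, _out, _err = client.execute_command(command)
    return code


fake = FakeSSHClient({"systemctl is-active nginx": (0, "active\n", "")})
print(run_one(fake, "systemctl is-active nginx"))  # 0
print(run_one(fake, "nginx -t"))                   # 127
print(fake.calls)
```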

Sequential Execution

Runs each command and captures results:

for i, command in enumerate(commands):
    needs_sudo = command.strip().startswith("sudo")
    clean = command.replace("sudo ", "", 1) if needs_sudo else command
    
    log("execute", f"[{i+1}/{len(commands)}] {command}")
    code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)
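Combined with the exit-code check described under Success Detection, one pass over the plan can be sketched as a self-contained function. It assumes the SSH call can be modeled as a plain function run(command, use_sudo) returning (code, out, err); run_plan and fake_run are hypothetical names, and the result string is reduced to the exit code:

```python
from collections.abc import Callable

# Hypothetical stand-in for ssh.execute_command: (command, use_sudo) -> (code, out, err)
Runner = Callable[[str, bool], tuple[int, str, str]]


def run_plan(commands: list[str], run: Runner) -> tuple[bool, list[str]]:
    """Run every command in order; a failure marks the run failed but does not stop it."""
    overall_success = True
    all_results: list[str] = []
    for i, command in enumerate(commands):
        # Same sudo handling as the node: flag it and drop the first "sudo ".
        needs_sudo = command.strip().startswith("sudo")
        clean = command.replace("sudo ", "", 1) if needs_sudo else command
        print(f"[{i + 1}/{len(commands)}] {command}")
        code, _out, _err = run(clean, needs_sudo)
        if code != 0:
            overall_success = False
        # Simplified result string: exit code only ("codigo" is the node's label).
        all_results.append(f"[{command}] codigo:{code}")
    return overall_success, all_results


def fake_run(command: str, use_sudo: bool) -> tuple[int, str, str]:
    # Pretend the restart fails and everything else succeeds.
    if command == "systemctl restart nginx":
        return 1, "", "Job for nginx.service failed"
    return 0, "", ""


ok, results = run_plan(["sudo systemctl restart nginx", "journalctl -u nginx -n 20"], fake_run)
print(ok)       # False
print(results)  # ['[sudo systemctl restart nginx] codigo:1', '[journalctl -u nginx -n 20] codigo:0']
```

The failing restart does not prevent the journalctl call that follows it, which is the continue-on-failure behavior described under Sequential Execution below.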

Memory Storage

Saves the complete episode for future reference:

memory.save_episode(
    error=error_text,
    diagnosis=diagnosis_text,
    command=plan,
    result=" | ".join(all_results),
    success=overall_success
)

SSH Command Execution

Commands that start with sudo are detected, and the sudo prefix is stripped before the command is sent:

needs_sudo = command.strip().startswith("sudo")
clean = command.replace("sudo ", "", 1) if needs_sudo else command

code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)

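Rather than sending sudo as part of the command text, the node strips it and passes use_sudo=True, leaving privilege escalation to SSHClient.execute_command. Only the first "sudo " is removed and sudo options are not parsed, so a line such as sudo -u postgres psql would be sent as -u postgres psql with use_sudo=True.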
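The prefix check uses startswith("sudo"), which also matches commands that merely begin with those letters, such as sudoedit; for those, replace finds no "sudo " to remove, so the command is sent unchanged but with use_sudo=True. A stricter variant, sketched here as a hypothetical split_sudo helper (not part of the project), requires the trailing space:

```python
def split_sudo(command: str) -> tuple[str, bool]:
    """Return (command_without_sudo, needs_sudo) for a single plan line."""
    command = command.strip()
    # Require "sudo " with a trailing space so e.g. "sudoedit" is not treated as sudo.
    if command.startswith("sudo "):
        return command[len("sudo "):].lstrip(), True
    return command, False


print(split_sudo("sudo systemctl restart nginx"))  # ('systemctl restart nginx', True)
print(split_sudo("systemctl status nginx"))        # ('systemctl status nginx', False)
print(split_sudo("sudoedit /etc/hosts"))           # ('sudoedit /etc/hosts', False)
```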

Result Capture

Each command’s output is captured and formatted:

# Labels are Spanish, as in the source: "codigo" = exit code, "salida" = output
result_str = f"[{command}] codigo:{code}"
if out:
    result_str += f" salida:{out[:200]}"
    log("execute", f"Salida: {out[:100]}...")
if err:
    result_str += f" error:{err[:200]}"
    log("error", f"Error: {err[:100]}...")

all_results.append(result_str)
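Pulled into a hypothetical format_result function, the same formatting shows what one stored result string looks like and how the strings are joined with " | " when the episode is saved. Empty stdout or stderr is left out entirely:

```python
def format_result(command: str, code: int, out: str, err: str, limit: int = 200) -> str:
    """Per-command summary in the node's format ("codigo"/"salida" = "code"/"output")."""
    result_str = f"[{command}] codigo:{code}"
    if out:
        result_str += f" salida:{out[:limit]}"
    if err:
        result_str += f" error:{err[:limit]}"
    return result_str


all_results = [
    format_result("systemctl restart nginx", 0, "", ""),
    format_result("nginx -t", 1, "", "nginx: [emerg] unknown directive"),
]
print(" | ".join(all_results))
# [systemctl restart nginx] codigo:0 | [nginx -t] codigo:1 error:nginx: [emerg] unknown directive
```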

Exit Code: numeric return value (0 = success)
Standard Output: command stdout, truncated to 200 characters
Standard Error: command stderr, truncated to 200 characters

Success Detection

Overall success is determined by exit codes:

overall_success = True

for i, command in enumerate(commands):
    # needs_sudo and clean are derived from command as shown above
    code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)

    if code != 0:
        overall_success = False
        # "Failure at step {i+1}. Exit code: {code}"
        log("error", f"Fallo en paso {i+1}. Exit code: {code}")

A non-zero exit code from any command marks the entire episode as unsuccessful, even if subsequent commands succeed.
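The flag-based loop is equivalent to requiring that every exit code be zero. A hypothetical overall_success helper expresses the same rule with all():

```python
def overall_success(exit_codes: list[int]) -> bool:
    """True only if every command exited with status 0."""
    return all(code == 0 for code in exit_codes)


print(overall_success([0, 0, 0]))  # True
print(overall_success([0, 1, 0]))  # False: a later success does not undo an earlier failure
print(overall_success([]))         # True: all() of an empty sequence is True
```

all() returns True for an empty list, but the node never reaches that case because an empty plan returns early (see Empty Plan Handling).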

Episode Memory

The execution node stores complete episodes in the memory system:

error_text = state.get("current_error", "")
diagnosis_log = state.get("diagnosis_log") or []
diagnosis_text = diagnosis_log[-1] if diagnosis_log else ""  # most recent diagnosis entry

memory.save_episode(
    error=error_text,
    diagnosis=diagnosis_text,
    command=plan,
    result=" | ".join(all_results),
    success=overall_success
)

Each episode contains:

Error Context: the original failure that triggered remediation
Diagnosis: the AI-generated analysis of the problem (the most recent diagnosis_log entry)
Commands: the complete remediation plan that was executed
Results: the per-command result strings (exit code plus truncated stdout and stderr), joined with " | "
Success Flag: boolean indicating whether every command exited with code 0
Timestamp: when the episode occurred (added by the memory system)
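The fields above map naturally onto a small record type. The sketch below assumes a simple in-process store; the real memory system's persistence and similarity search are not shown. Episode and InMemoryEpisodeStore are hypothetical names, the UTC timestamp is an assumption, and only the save_episode keyword arguments come from the node's code:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    error: str
    diagnosis: str
    command: str   # full plan, one command per line
    result: str    # per-command results joined with " | "
    success: bool
    # Assumption: the store stamps each episode in UTC when it is created.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InMemoryEpisodeStore:
    """Hypothetical stand-in exposing the same save_episode keywords as the node uses."""

    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def save_episode(self, error: str, diagnosis: str, command: str, result: str, success: bool) -> Episode:
        episode = Episode(error, diagnosis, command, result, success)  # timestamp added here
        self.episodes.append(episode)
        return episode


memory = InMemoryEpisodeStore()
ep = memory.save_episode(
    error="nginx.service: Failed with result 'exit-code'",
    diagnosis="Syntax error in /etc/nginx/nginx.conf",
    command="sudo nginx -t\nsudo systemctl restart nginx",
    result="[sudo nginx -t] codigo:1 | [sudo systemctl restart nginx] codigo:1",
    success=False,
)
print(len(memory.episodes), ep.success, ep.timestamp.tzinfo)  # 1 False UTC
```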

Learning from Episodes

Stored episodes enable future optimization:

Diagnosis node queries memory to avoid failed commands
Planning node references successful past solutions
Human operators can review historical remediation patterns
System improves over time through accumulated experience

The memory system implements semantic similarity search, allowing the agent to learn from related errors even if they’re not identical.
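The semantic search itself is not documented on this page. To illustrate retrieving related rather than identical errors, the sketch below substitutes plain lexical similarity from the standard library's difflib.SequenceMatcher; this is a swapped-in technique, not embedding-based semantic search. similar_episodes, EPISODES, the sample records, and the 0.6 threshold are all hypothetical:

```python
from difflib import SequenceMatcher

# Illustrative (error, plan, success) records standing in for stored episodes.
EPISODES = [
    ("nginx.service failed: port 80 already in use",
     "sudo fuser -k 80/tcp\nsudo systemctl restart nginx", True),
    ("nginx.service failed: port 80 already in use",
     "sudo systemctl restart nginx", False),
    ("No space left on device while writing /var/log/syslog",
     "sudo journalctl --vacuum-size=200M", True),
]


def similar_episodes(error: str, threshold: float = 0.6) -> list[tuple[float, str, str, bool]]:
    """Lexical (not semantic) lookup: episodes whose error text resembles `error`, best first."""
    scored = [
        (SequenceMatcher(None, error, past_error).ratio(), past_error, plan, success)
        for past_error, plan, success in EPISODES
    ]
    matches = [s for s in scored if s[0] >= threshold]
    # sorted() is stable, so equally similar episodes keep their stored order.
    return sorted(matches, key=lambda s: s[0], reverse=True)


for score, _, plan, success in similar_episodes("nginx.service failed: port 80 is already in use"):
    print(f"{score:.2f} success={success} plan={plan!r}")
```

A planner could then favor plans from matches with success=True and avoid those recorded as failed.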

Error Handling

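If an exception is raised during execution (for example, an SSH error), the node logs it, records a failed episode, and appends the exception message to the diagnosis log:

try:
    ...  # SSH connection, command loop, and episode storage shown above
except Exception as e:
    # "Exception during execution: {e}"
    log("error", f"Excepcion durante ejecucion: {e}")
    memory.save_episode(
        error=state.get("current_error", ""),
        diagnosis="",
        command=plan,
        result=f"Excepcion: {str(e)}",
        success=False
    )
    return {
        "current_step": "execute",
        "diagnosis_log": state.get("diagnosis_log", []) + [f"Excepcion: {str(e)}"]
    }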

Even execution failures are stored in memory, helping the system learn which approaches cause exceptions.
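The shape of that handler can be sketched in isolation. In this hypothetical execute_with_recording function, run stands in for the SSH call and saved for the memory system; both are assumptions for illustration. As in the node's handler, the exception branch stores only the exception text, so output from commands that completed before the failure does not reach the episode:

```python
from typing import Callable


def execute_with_recording(
    commands: list[str],
    run: Callable[[str], tuple[int, str, str]],
    saved: list[dict],
    prior_log: tuple[str, ...] = (),
) -> dict:
    """Hypothetical sketch of the node's try/except shape around command execution."""
    plan = "\n".join(commands)
    try:
        codes = [run(command)[0] for command in commands]
        result = " | ".join(f"[{c}] codigo:{code}" for c, code in zip(commands, codes))
        saved.append({"command": plan, "result": result, "success": all(code == 0 for code in codes)})
        return {"current_step": "execute", "diagnosis_log": list(prior_log) + [result]}
    except Exception as e:  # deliberately broad, like the node's handler
        # As in the node, only the exception text is stored as the result.
        saved.append({"command": plan, "result": f"Excepcion: {e}", "success": False})
        return {"current_step": "execute", "diagnosis_log": list(prior_log) + [f"Excepcion: {e}"]}


def unreachable(command: str) -> tuple[int, str, str]:
    raise ConnectionError("SSH connection refused")


episodes: list[dict] = []
update = execute_with_recording(["sudo systemctl restart nginx"], unreachable, episodes)
print(update)
# {'current_step': 'execute', 'diagnosis_log': ['Excepcion: SSH connection refused']}
print(episodes[0]["success"])  # False
```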

State Updates

The execution node updates the agent state:

return {
    "current_step": "execute",
    "diagnosis_log": state.get("diagnosis_log", []) + [" | ".join(all_results)]
}

Results are appended to the diagnosis log, creating a complete audit trail of the remediation attempt.

Progress Logging

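Each stage writes a log line. The messages are in Spanish; English translations are given in the comments:

log("execute", "Iniciando ejecucion de comandos...")    # "Starting command execution..."
log("execute", f"[{i+1}/{len(commands)}] {command}")
log("execute", f"Salida: {out[:100]}...")                # "Output: ..."
log("error", f"Error: {err[:100]}...")
log("error", f"Fallo en paso {i+1}. Exit code: {code}")  # "Failure at step N. Exit code: ..."

status = "exitoso" if overall_success else "parcial"     # "successful" / "partial"
log("execute", f"Episodio registrado: {status}")         # "Episode recorded: ..."

Any failed command produces the status "parcial", even when every command in the plan failed.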
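The log(category, message) helper itself is not shown on this page. One way such a helper could map the node's categories onto Python's standard logging levels is sketched below; the category-to-level mapping, the logger name, and the function body are assumptions, not the project's implementation:

```python
import logging

logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
logger = logging.getLogger("agent.execute")  # hypothetical logger name
logger.setLevel(logging.INFO)

# Assumed mapping from the node's log categories to standard logging levels.
LEVELS = {"execute": logging.INFO, "warning": logging.WARNING, "error": logging.ERROR}


def log(category: str, message: str) -> None:
    """Hypothetical helper with the same log(category, message) call shape as the node."""
    logger.log(LEVELS.get(category, logging.INFO), "[%s] %s", category, message)


overall_success = False
status = "exitoso" if overall_success else "parcial"  # "successful" / "partial"
log("execute", f"Episodio registrado: {status}")      # "Episode recorded: partial"
```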

Empty Plan Handling

If the plan contains no commands, the node logs a warning and returns without executing anything:

if not commands:
    # "No commands to execute."
    log("warning", "No hay comandos para ejecutar.")
    return {"current_step": "execute"}

Output Truncation

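To keep stored episodes small, stdout and stderr are cut to their first 200 characters in each result string (log lines show only the first 100):

result_str += f" salida:{out[:200]}"
result_str += f" error:{err[:200]}"

This keeps episodes compact at a cost: anything past the first 200 characters is discarded, so an error message that appears late in a long output will not reach memory.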

Sequential Execution

Commands run one at a time in order:

for i, command in enumerate(commands):
    # needs_sudo and clean are derived from command as shown above
    code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)
    # Continue to next command regardless of exit code

All commands execute even if earlier ones fail, allowing subsequent diagnostic commands (like log checks) to run after failed remediation attempts.

Implementation Location

Source: src/agent/nodes/execute.py:19

Next Steps

After execution, the workflow proceeds to verification where the system checks whether the remediation successfully restored the failed service.
