The execution node runs approved remediation commands on target systems via SSH, captures detailed results, and stores complete episodes in memory for future learning.
Overview
After a remediation plan passes security approval, the execution node establishes an SSH connection and runs each command sequentially, monitoring exit codes and output for success/failure indicators.
The execution node is where planned remediation becomes real system changes. All operations are logged and stored in memory for continuous learning.
Execution Workflow
Approval Check
Verifies that the plan has been approved before proceeding:
if state.get("approval_status") != "APPROVED":
    log("warning", "Comando no aprobado. Saltando ejecucion.")  # "Command not approved. Skipping execution."
    return {"current_step": "execute"}
Command Parsing
Extracts individual commands from the plan:
plan = state.get("candidate_plan", "")
commands = [cmd.strip() for cmd in plan.split("\n") if cmd.strip()]
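As a quick illustration of the parsing step, the snippet below applies the same list comprehension to a sample plan. The plan text is invented for the example; in the node it comes from the `candidate_plan` state key.

```python
# Hypothetical plan text with a blank line and stray whitespace.
plan = "sudo systemctl restart nginx\n\n   systemctl status nginx  \n"

# Same expression as in the node: split on newlines, trim each line,
# and drop lines that are empty after trimming.
commands = [cmd.strip() for cmd in plan.split("\n") if cmd.strip()]

print(commands)
```

Blank lines and surrounding whitespace disappear, so only the two real commands reach the execution loop.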
SSH Connection
Establishes a connection to the target server:
ssh = SSHClient(
    hostname=config.SSH_HOST,
    port=config.SSH_PORT,
    username=config.SSH_USER,
    password=config.SSH_PASS
)
Sequential Execution
Runs each command and captures results:
for i, command in enumerate(commands):
    needs_sudo = command.strip().startswith("sudo")
    clean = command.replace("sudo ", "", 1) if needs_sudo else command
    log("execute", f"[{i + 1}/{len(commands)}] {command}")
    code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)
Memory Storage
Saves the complete episode for future reference:
memory.save_episode(
    error=error_text,
    diagnosis=diagnosis_text,
    command=plan,
    result=" | ".join(all_results),
    success=overall_success
)
SSH Command Execution
The node handles sudo execution by parsing the command:
needs_sudo = command.strip().startswith("sudo")
clean = command.replace("sudo ", "", 1) if needs_sudo else command
code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)
The SSHClient’s use_sudo parameter handles privilege escalation transparently, separating the command from the execution context.
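One detail worth noting: `startswith("sudo")` is also true for commands such as `sudoedit`, which would then be passed through unchanged with `use_sudo=True`. The helper below is a stricter variant that treats `sudo` as a prefix only when it is a whole word. It is a sketch for illustration, not the code in `execute.py`, and the name `split_sudo` is hypothetical.

```python
from typing import Tuple


def split_sudo(command: str) -> Tuple[str, bool]:
    """Return (command_without_sudo, needs_sudo) for one command line."""
    stripped = command.strip()
    # Split off the first whitespace-delimited word only.
    parts = stripped.split(None, 1)
    if parts and parts[0] == "sudo":
        # Everything after the "sudo" word is the command to run.
        rest = parts[1] if len(parts) > 1 else ""
        return rest, True
    return stripped, False


print(split_sudo("sudo systemctl restart nginx"))
print(split_sudo("sudoedit /etc/hosts"))
```

The returned pair maps directly onto the two arguments of `ssh.execute_command(clean, use_sudo=needs_sudo)`.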
Result Capture
Each command’s output is captured and formatted:
result_str = f"[{command}] codigo: {code}"  # "codigo" = exit code
if out:
    result_str += f" salida: {out[:200]}"  # "salida" = output
    log("execute", f"Salida: {out[:100]}...")
if err:
    result_str += f" error: {err[:200]}"
    log("error", f"Error: {err[:100]}...")
all_results.append(result_str)
Exit Code: Numeric return value (0 = success)
Standard Output: Command stdout (truncated to 200 chars)
Standard Error: Command stderr (truncated to 200 chars)
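The formatting rules above can be gathered into one small function, which makes the truncation behavior easy to check in isolation. This is a sketch: the function name `format_result` and the `limit` parameter are hypothetical, while the string layout follows the snippet shown earlier in this section.

```python
def format_result(command: str, code: int, out: str, err: str, limit: int = 200) -> str:
    """Build the per-command result string, truncating each stream to `limit` chars."""
    result_str = f"[{command}] codigo: {code}"
    if out:
        result_str += f" salida: {out[:limit]}"
    if err:
        result_str += f" error: {err[:limit]}"
    return result_str


# A long stdout is cut to the first 200 characters; empty streams are omitted.
long_output = "x" * 500
formatted = format_result("journalctl -u nginx", 0, long_output, "")
print(len(formatted))
```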
Success Detection
Overall success is determined by exit codes:
overall_success = True
for i, command in enumerate(commands):
    # needs_sudo and clean are derived from command as shown earlier
    code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)
    if code != 0:
        overall_success = False
        log("error", f"Fallo en paso {i + 1}. Exit code: {code}")  # "Failure at step N."
A non-zero exit code from any command marks the entire episode as unsuccessful, even if subsequent commands succeed.
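To see this rule in action without a real SSH connection, the sketch below runs the same loop against a fake executor. `run_all` and `fake_execute` are hypothetical names introduced for the example; `fake_execute` stands in for `ssh.execute_command`.

```python
from typing import Callable, List, Tuple

ExecResult = Tuple[int, str, str]  # (exit code, stdout, stderr)


def run_all(commands: List[str], execute: Callable[[str], ExecResult]) -> Tuple[bool, List[int]]:
    """Run every command in order; success is False if any exit code is non-zero."""
    overall_success = True
    codes: List[int] = []
    for command in commands:
        code, _out, _err = execute(command)
        codes.append(code)
        if code != 0:
            # The flag is never reset, so one failure marks the whole episode.
            overall_success = False
    return overall_success, codes


def fake_execute(command: str) -> ExecResult:
    # Hypothetical stand-in: only the command "false" fails.
    return (1, "", "failed") if command == "false" else (0, "ok", "")


success, codes = run_all(["true", "false", "true"], fake_execute)
print(success, codes)  # False [0, 1, 0]
```

The third command still runs and exits with 0, yet the episode as a whole is reported as unsuccessful.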
Episode Memory
The execution node stores complete episodes in the memory system:
error_text = state.get("current_error", "")
diagnosis_text = state.get("diagnosis_log", [""])[-1] if state.get("diagnosis_log") else ""
memory.save_episode(
    error=error_text,
    diagnosis=diagnosis_text,
    command=plan,
    result=" | ".join(all_results),
    success=overall_success
)
Each episode contains:
Error Context: The original failure that triggered remediation
Diagnosis: The AI-generated analysis of the problem
Commands: The complete remediation plan that was executed
Results: Exit codes, stdout, and stderr from all commands
Success Flag: Boolean indicating whether remediation succeeded
Timestamp: When the episode occurred (added by the memory system)
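One way to picture an episode is as a small record type. The dataclass below is only a sketch: the first five field names mirror the keyword arguments of `memory.save_episode`, while the class name `Episode`, the `timestamp` field name, and its UTC default are assumptions, since the source does not show how the memory system stores episodes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    error: str       # original failure that triggered remediation
    diagnosis: str   # AI-generated analysis
    command: str     # full remediation plan as executed
    result: str      # joined per-command result strings
    success: bool    # True only if every command exited with 0
    # Assumed: the memory system stamps the episode when it is saved.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


episode = Episode(
    error="nginx.service failed",
    diagnosis="Configuration syntax error",
    command="sudo systemctl restart nginx",
    result="[sudo systemctl restart nginx] codigo: 0",
    success=True,
)
print(episode.success, episode.timestamp.tzinfo)
```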
Learning from Episodes
Stored episodes enable future optimization:
Diagnosis node queries memory to avoid failed commands
Planning node references successful past solutions
Human operators can review historical remediation patterns
System improves over time through accumulated experience
The memory system implements semantic similarity search, allowing the agent to learn from related errors even if they’re not identical.
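The source does not say how similarity is computed, so the following is only a toy illustration of the idea of ranking past errors by similarity to a new one. It uses cosine similarity over word counts as a stand-in for whatever representation the memory system actually uses; all names here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts using lowercase word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def most_similar(query: str, past_errors: List[str]) -> Tuple[str, float]:
    """Return the stored error closest to the query (past_errors must be non-empty)."""
    return max(((e, cosine(query, e)) for e in past_errors), key=lambda pair: pair[1])


past = ["nginx service failed to start", "disk usage above 90 percent on /var"]
best, score = most_similar("nginx failed to start after reload", past)
print(best, round(score, 2))
```

The query is not identical to any stored error, but it shares enough words with the nginx entry to rank it first, which is the behavior the paragraph above describes.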
Error Handling
Any exception raised during execution is caught, logged, and recorded as a failed episode:
except Exception as e:
    log("error", f"Excepcion durante ejecucion: {e}")  # "Exception during execution"
    memory.save_episode(
        error=state.get("current_error", ""),
        diagnosis="",
        command=plan,
        result=f"Excepcion: {str(e)}",
        success=False
    )
    return {
        "current_step": "execute",
        "diagnosis_log": state.get("diagnosis_log", []) + [f"Excepcion: {str(e)}"]
    }
Even execution failures are stored in memory, helping the system learn which approaches cause exceptions.
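The exception path can be exercised end to end with an in-memory stand-in for the memory system. Everything named here (`InMemoryMemory`, `execute_with_recording`, `failing_run`) is hypothetical scaffolding for the example; only the `save_episode` keyword arguments and the returned dictionary shape follow the handler shown above.

```python
from typing import Any, Callable, Dict, List


class InMemoryMemory:
    """Hypothetical stand-in for the memory system; keeps episodes in a list."""

    def __init__(self) -> None:
        self.episodes: List[Dict[str, Any]] = []

    def save_episode(self, **fields: Any) -> None:
        self.episodes.append(fields)


def execute_with_recording(state: Dict[str, Any], memory: InMemoryMemory,
                           run: Callable[[str], None]) -> Dict[str, Any]:
    plan = state.get("candidate_plan", "")
    try:
        run(plan)
    except Exception as e:
        # Record the failure so later runs can see that this plan raised.
        memory.save_episode(error=state.get("current_error", ""), diagnosis="",
                            command=plan, result=f"Excepcion: {e}", success=False)
        return {"current_step": "execute",
                "diagnosis_log": state.get("diagnosis_log", []) + [f"Excepcion: {e}"]}
    return {"current_step": "execute"}


def failing_run(plan: str) -> None:
    raise ConnectionError("SSH connection refused")


memory = InMemoryMemory()
state = {"candidate_plan": "systemctl restart nginx", "current_error": "nginx down"}
update = execute_with_recording(state, memory, failing_run)
print(update["diagnosis_log"])
```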
State Updates
The execution node updates the agent state:
return {
    "current_step": "execute",
    "diagnosis_log": state.get("diagnosis_log", []) + [" | ".join(all_results)]
}
Results are appended to the diagnosis log, creating a complete audit trail of the remediation attempt.
Progress Logging
The node logs each stage of execution, from start-up through per-command output to the final episode status:
log("execute", "Iniciando ejecucion de comandos...")  # "Starting command execution..."
log("execute", f"[{i + 1}/{len(commands)}] {command}")
log("execute", f"Salida: {out[:100]}...")  # "Output"
log("error", f"Error: {err[:100]}...")
log("error", f"Fallo en paso {i + 1}. Exit code: {code}")  # "Failure at step N."
status = "exitoso" if overall_success else "parcial"  # "successful" / "partial"
log("execute", f"Episodio registrado: {status}")  # "Episode recorded"
Empty Plan Handling
If the plan contains no commands, the node logs a warning and returns early:
if not commands:
    log("warning", "No hay comandos para ejecutar.")  # "No commands to execute."
    return {"current_step": "execute"}
Output Truncation
To prevent memory bloat, stored outputs are truncated to the first 200 characters of each stream:
result_str += f" salida: {out[:200]}"
result_str += f" error: {err[:200]}"
This bounds the size of each stored episode while keeping the start of the output, where the first diagnostic lines appear. Anything beyond 200 characters is not stored.
Sequential Execution
Commands run one at a time in order:
for i, command in enumerate(commands):
    code, out, err = ssh.execute_command(clean, use_sudo=needs_sudo)
    # Continue to the next command regardless of exit code
All commands execute even if earlier ones fail, allowing subsequent diagnostic commands (like log checks) to run after failed remediation attempts.
Implementation Location
Source: src/agent/nodes/execute.py:19
Next Steps
After execution, the workflow proceeds to verification where the system checks whether the remediation successfully restored the failed service.