An agent that evaluates whether an execution step’s output meets its original objective.
MonitoringAgent extends BaseAgent and acts as a judge in a plan-execute-monitor loop. It compares the original objective of a step against the actual output produced by an ExecutionAgent and returns a structured verdict.
Override the default system instruction. The built-in prompt reads:
You are a strict monitoring and evaluation agent. Your task is to compare the Original Objective of a step with the Actual Output produced by an execution agent. Determine if the objective was successfully met.
Return ONLY a valid JSON object with two keys:
1. 'success' (boolean: true or false)
2. 'feedback' (string: explanation of why it succeeded or failed, and how to fix if failed).
Example: {"success": true, "feedback": "Objective met completely."}
Builds a structured evaluation prompt from objective and result, calls invoke(), parses the JSON response, and returns a normalised verdict dictionary.
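The structured evaluation prompt might be assembled along these lines. The exact internal template is not documented here, so the wording and the helper name `build_evaluation_prompt` are illustrative assumptions:

```python
def build_evaluation_prompt(objective: str, result: str) -> str:
    # Illustrative layout only; MonitoringAgent's real template may differ.
    # It pairs the step's original objective with the execution output and
    # asks for the JSON verdict described in the system instruction.
    return (
        "Original Objective:\n"
        f"{objective}\n\n"
        "Actual Output:\n"
        f"{result}\n\n"
        "Did the output meet the objective? Return ONLY the JSON verdict."
    )
```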
Human-readable explanation of the verdict. On success this describes what was done correctly; on failure it explains what is missing and how to fix it.
Expected LLM response format:
{"success": true, "feedback": "Objective met completely."}
If the model returns malformed JSON, the failure is logged at WARNING level and the method returns {"success": False, "feedback": "Failed to parse monitoring response: <raw_response>"}. This ensures callers always receive a well-typed dictionary rather than an exception.
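The never-raise contract can be sketched as follows. `parse_verdict` is a hypothetical helper name, and the real method also emits the WARNING log entry mentioned above:

```python
import json

def parse_verdict(raw: str) -> dict:
    """Parse an LLM monitoring response into a well-typed verdict dict.

    Sketch of the documented fallback: malformed or incomplete JSON is
    converted into a failure verdict instead of raising.
    """
    try:
        data = json.loads(raw)
        return {"success": bool(data["success"]),
                "feedback": str(data["feedback"])}
    except (json.JSONDecodeError, KeyError, TypeError):
        # Missing keys, non-object JSON, or invalid syntax all fall here.
        return {"success": False,
                "feedback": f"Failed to parse monitoring response: {raw}"}
```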
from langchain_ollama import ChatOllama
from agents import PlanningAgent, ExecutionAgent, MonitoringAgent

llm = ChatOllama(model="llama3")
planner = PlanningAgent(llm=llm)
executor = ExecutionAgent(llm=llm)
monitor = MonitoringAgent(llm=llm)

# Generate a plan
steps = planner.generate_plan("Write a haiku about the ocean")

# Execute and monitor each step
for step in steps:
    output = executor.execute_step(step_description=step)
    verdict = monitor.evaluate(objective=step, result=output)

    print(f"Step    : {step}")
    print(f"Output  : {output}")
    print(f"Success : {verdict['success']}")
    print(f"Feedback: {verdict['feedback']}")
    print()

    if not verdict["success"]:
        # Optionally retry the step or log for human review
        print("[WARN] Step did not meet objective — consider retrying.")
Use verdict["success"] as a gate: only advance to the next step (or mark the overall task complete) when the monitor returns True.
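One common way to implement that gate is a bounded retry loop that feeds the monitor's feedback back into the next attempt. This is a sketch, not part of the documented API: the function name, the retry budget, and the feedback-in-prompt refinement are all assumptions; `executor` and `monitor` follow the ExecutionAgent / MonitoringAgent interfaces used above.

```python
MAX_RETRIES = 2  # hypothetical retry budget

def run_step_with_monitoring(executor, monitor, step: str) -> dict:
    """Only accept a step once the monitor returns success, retrying
    a bounded number of times with the previous feedback appended."""
    feedback = ""
    for attempt in range(MAX_RETRIES + 1):
        # On retries, surface the monitor's feedback to the executor.
        prompt = step if not feedback else (
            f"{step}\n\nPrevious attempt failed: {feedback}"
        )
        output = executor.execute_step(step_description=prompt)
        verdict = monitor.evaluate(objective=step, result=output)
        if verdict["success"]:
            return {"output": output, "verdict": verdict,
                    "attempts": attempt + 1}
        feedback = verdict["feedback"]
    # Retry budget exhausted; return the last (failed) verdict.
    return {"output": output, "verdict": verdict,
            "attempts": MAX_RETRIES + 1}
```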