AI-powered error analysis with memory and knowledge base integration
The diagnosis node analyzes detected failures using a combination of LLM reasoning, historical memory, and RAG-based knowledge retrieval to determine root causes and recommend solutions.
Queries the memory system for relevant historical data:
failed_commands = memory.get_failed_commands(error)
if failed_commands:
    memory_consulted = True
    # Header text: "COMMANDS THAT ALREADY FAILED (do NOT repeat)"
    memory_context += "COMANDOS QUE YA FALLARON (NO repetir):\n"
    for cmd in failed_commands:
        memory_context += f"- {cmd}\n"
Also searches for similar successful resolutions:
similar = memory.find_similar(error)
if similar and similar["success"]:
    # Label text: "Previous successful solution"
    memory_context += f"\nSolucion exitosa previa: {similar['command']}\n"
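The two lookups above can be sketched end to end as follows. This is a minimal illustration, not the project's memory implementation: `InMemoryIncidentStore`, its `record` method, the `threshold` parameter, and the `build_memory_context` helper are all hypothetical names, and string similarity via `difflib` is an assumption about how "similar" might be decided.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Optional


class InMemoryIncidentStore:
    """Hypothetical stand-in for the memory system (not the real class)."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def record(self, error: str, command: str, success: bool) -> None:
        # Store one attempted fix together with its outcome.
        self._records.append({"error": error, "command": command, "success": success})

    def get_failed_commands(self, error: str) -> list[str]:
        # Exact match on the error text, failed attempts only.
        return [r["command"] for r in self._records
                if r["error"] == error and not r["success"]]

    def find_similar(self, error: str, threshold: float = 0.6) -> Optional[dict]:
        # Return the closest past record at or above the threshold.
        # On equal scores the most recently stored record wins.
        best, best_score = None, threshold
        for r in self._records:
            score = SequenceMatcher(None, error, r["error"]).ratio()
            if score >= best_score:
                best, best_score = r, score
        return best


def build_memory_context(memory, error: str) -> tuple[str, bool]:
    """Assemble the memory section of the prompt, as in the snippets above."""
    memory_context = ""
    memory_consulted = False
    failed_commands = memory.get_failed_commands(error)
    if failed_commands:
        memory_consulted = True
        memory_context += "COMANDOS QUE YA FALLARON (NO repetir):\n"
        for cmd in failed_commands:
            memory_context += f"- {cmd}\n"
    similar = memory.find_similar(error)
    if similar and similar["success"]:
        memory_context += f"\nSolucion exitosa previa: {similar['command']}\n"
    return memory_context, memory_consulted
```

With one failed and one successful attempt recorded for the same error, the returned context lists the failed command under the warning header and names the successful command afterwards.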
3. Knowledge Base Query
Retrieves relevant documentation using RAG:
if kb:
    # Log message: "Querying knowledge base (RAG)..."
    log("diagnose", "Consultando base de conocimiento (RAG)...")
    rag_context = kb.query(f"How to fix: {error}")
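The snippet above only assigns `rag_context` when a knowledge base is configured, so the surrounding code presumably sets a default first. A minimal sketch of that pattern follows. `KeywordKnowledgeBase` is a toy, hypothetical retriever that ranks documents by shared words; a real RAG setup would more likely use embeddings and a vector store. `fetch_rag_context` is likewise an illustrative helper, not part of the original code.

```python
from __future__ import annotations


class KeywordKnowledgeBase:
    """Toy retriever (hypothetical): picks the document sharing the most words."""

    def __init__(self, documents: list[str]) -> None:
        self._documents = documents

    def query(self, question: str) -> str:
        words = set(question.lower().split())
        # Score each document by the number of words it shares with the question.
        scored = [(len(words & set(doc.lower().split())), doc)
                  for doc in self._documents]
        score, doc = max(scored, default=(0, ""))
        return doc if score > 0 else ""


def fetch_rag_context(kb, error: str) -> str:
    # Default to an empty string so prompt formatting works without a knowledge base.
    rag_context = ""
    if kb:
        rag_context = kb.query(f"How to fix: {error}")
    return rag_context
```

Calling `fetch_rag_context(None, error)` returns an empty string, which keeps the later `rag_context[:1000]` slice safe.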
4. LLM Analysis
Constructs a comprehensive prompt and requests diagnosis:
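messages = [
    SystemMessage(content=(
        # "You are Sentinel AI, an autonomous DevOps agent."
        "Eres Sentinel AI, un agente DevOps autonomo.\n"
        # "Analyze the error and give a BRIEF diagnosis (3 lines maximum)."
        "Analiza el error y proporciona un diagnostico BREVE (maximo 3 lineas).\n"
        # "Affected service", followed by the memory context built earlier.
        f"\nServicio afectado: {service}\n"
        f"\n{memory_context}"
        # "Technical documentation": only the first 1000 characters are included.
        f"\nDocumentacion tecnica:\n{rag_context[:1000]}\n"
    )),
    HumanMessage(content=f"Error: {error}"),
]
response = llm.invoke(messages)
diagnosis = response.content.strip()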
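To show how the pieces combine without depending on a particular LLM library, the sketch below builds the same prompt using plain `(role, content)` tuples as stand-ins for the message classes. `build_diagnosis_messages` and `MAX_RAG_CHARS` are hypothetical names introduced for illustration; the prompt text and the 1000-character cap come from the snippet above.

```python
from __future__ import annotations

MAX_RAG_CHARS = 1000  # Mirrors the rag_context[:1000] slice in the original snippet.


def build_diagnosis_messages(
    service: str,
    error: str,
    memory_context: str = "",
    rag_context: str = "",
) -> list[tuple[str, str]]:
    """Return (role, content) pairs; a stand-in for SystemMessage/HumanMessage."""
    system_prompt = (
        "Eres Sentinel AI, un agente DevOps autonomo.\n"
        "Analiza el error y proporciona un diagnostico BREVE (maximo 3 lineas).\n"
        f"\nServicio afectado: {service}\n"
        f"\n{memory_context}"
        # Truncate retrieved documentation so it cannot dominate the prompt.
        f"\nDocumentacion tecnica:\n{rag_context[:MAX_RAG_CHARS]}\n"
    )
    return [("system", system_prompt), ("human", f"Error: {error}")]
```

Capping the documentation at a fixed character count is a blunt but predictable way to bound prompt size; it may cut a passage mid-sentence, which is a trade-off the original snippet also accepts.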
The diagnosis node relies heavily on the memory system to avoid repeating mistakes:
Failed Commands
Retrieves commands that previously failed for the same error and explicitly instructs the LLM not to repeat them.
Successful Solutions
Finds similar past incidents that were successfully resolved and prioritizes those approaches.
The prompt states this restriction explicitly: “REGLAS CRITICAS: NO sugieras comandos que ya fallaron (listados arriba).” In English: "CRITICAL RULES: do NOT suggest commands that have already failed (listed above)."
The diagnosis prompt includes critical constraints:
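SystemMessage(content=(
    "Eres Sentinel AI, un agente DevOps autonomo.\n"
    "Analiza el error y proporciona un diagnostico BREVE (maximo 3 lineas).\n"
    # "State the probable cause and the recommended solution."
    "Indica la causa probable y la solucion recomendada.\n"
    # "Attempt history": the last three log entries, or "First attempt."
    f"\nHistorial de intentos:\n{chr(10).join(prior_logs[-3:]) if prior_logs else 'Primer intento.'}\n"
    # "CRITICAL RULES"
    "\nREGLAS CRITICAS:"
    # 1. "Do NOT suggest commands that have already failed (listed above)."
    "\n1. NO sugieras comandos que ya fallaron (listados arriba)."
    # 2. "If 'service' or 'apt-get' fail, try alternatives such as
    #    'systemctl', 'dmesg', or checking specific log files."
    "\n2. Si 'service' o 'apt-get' fallan, prueba alternativas como 'systemctl', 'dmesg', o verificar ficheros de log especificos."
))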
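The attempt-history expression in the prompt is dense, so here it is pulled out into a small helper. `format_attempt_history` and its `limit` parameter are hypothetical names. `chr(10)` is the newline character; the original likely uses it because f-string expressions could not contain a backslash before Python 3.12, a restriction that does not apply inside an ordinary function.

```python
from __future__ import annotations

from typing import Optional, Sequence


def format_attempt_history(prior_logs: Optional[Sequence[str]], limit: int = 3) -> str:
    """Return the most recent log entries, one per line, or a first-attempt marker."""
    if not prior_logs:
        # Same fallback text as the prompt: "First attempt."
        return "Primer intento."
    # Keep only the last `limit` entries so the prompt stays short.
    return "\n".join(prior_logs[-limit:])
```

Passing four log entries yields only the last three, matching the `prior_logs[-3:]` slice in the prompt.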
The diagnosis is intentionally kept brief (maximum 3 lines) to focus on actionable insights rather than verbose explanations.
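The three-line limit is only requested in the prompt, so a model may occasionally exceed it. One possible safeguard, shown here as an assumption rather than something the source describes, is to trim the response after the fact. `clamp_diagnosis` is a hypothetical helper.

```python
def clamp_diagnosis(text: str, max_lines: int = 3) -> str:
    """Drop blank lines and keep at most `max_lines` lines of the diagnosis."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    return "\n".join(lines[:max_lines])
```

Trimming silently discards any overflow, so logging the full response before clamping may be worthwhile when debugging.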