Sentinel AI maintains a vector knowledge base of technical documentation (PostgreSQL, Docker, Nginx manuals) to power its RAG-based diagnosis system. The knowledge base uses:

- Pinecone as the serverless vector store
- OpenAI text-embedding-3-small embeddings (1536 dimensions)
- LlamaParse for PDF parsing
The system automatically creates a Pinecone index on first run. Implementation (src/core/knowledge.py:27-38):
```python
def ensure_index_exists(self):
    existing_indexes = [i.name for i in self.pc.list_indexes()]
    if self.index_name not in existing_indexes:
        print(f"[RAG] Creating Pinecone index: {self.index_name}")
        self.pc.create_index(
            name=self.index_name,
            dimension=config.EMBEDDING_DIM,  # 1536 for text-embedding-3-small
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    else:
        print(f"[RAG] Index already exists: {self.index_name}")
```
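The index is created with `metric="cosine"`, so retrieval ranks documents by the angle between embedding vectors rather than their magnitude. A minimal, self-contained illustration of the metric (not part of the project code):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 for identical
    direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Pinecone computes this server-side over the stored 1536-dimensional embeddings; the snippet only shows what the score means.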
The knowledge base is initialized at application startup. Implementation (src/core/knowledge.py:242-249):
```python
kb = None

def init_knowledge_base():
    global kb
    try:
        kb = VectorKnowledgeBase()
        log("system", "Knowledge base initialized successfully.")
    except Exception as e:
        log("error", f"Could not initialize the knowledge base: {e}")
        kb = None
If initialization fails (e.g., missing API keys), the agent will run without RAG support. The diagnose node will skip documentation queries and rely solely on episodic memory.
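This degradation can be sketched as a guard at the point where the diagnose node would query documentation. The helper name `retrieve_docs` and the `kb.query` method are assumptions for illustration, not the project's actual API:

```python
def retrieve_docs(kb, question):
    """Return documentation snippets for the diagnose node,
    or an empty list when RAG is unavailable."""
    if kb is None:  # init_knowledge_base() failed, e.g. missing API keys
        return []   # diagnosis falls back to episodic memory only
    return kb.query(question)  # assumed query method on the knowledge base
```

With `kb` set to `None`, the diagnose step simply receives no documentation context instead of raising.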
Pinecone serverless supports millions of vectors. The current setup can handle 10,000+ pages of documentation without performance degradation, and query latency stays roughly constant as the index grows because lookups use approximate nearest neighbor (ANN) search rather than an exhaustive scan.
How often should the index be updated?
Run ingest_manuals() when documentation is updated (e.g., PostgreSQL 17 release). Use ingest_file() for incremental updates. The index persists in Pinecone, so re-ingestion is not needed on application restart.
Can the system handle non-PDF documents?
Yes. ingest_file() supports markdown, text, and JSON files without LlamaParse. PDF parsing is only invoked for .pdf extensions.
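The dispatch rule is driven purely by file extension. A minimal sketch of that check (the helper name is hypothetical; the extension rule is the one described above):

```python
from pathlib import Path

def needs_llamaparse(path: str) -> bool:
    """Only .pdf files are routed through LlamaParse; markdown,
    text, and JSON files are read directly."""
    return Path(path).suffix.lower() == ".pdf"
```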
How is document freshness maintained?
Currently, the system does not track document versions. To update a document, delete the old vectors from Pinecone and re-ingest. Future versions may implement versioned metadata for incremental updates.
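One common way to make re-ingestion overwrite stale vectors instead of duplicating them is to derive deterministic vector IDs from the source path and chunk position, so an upsert of the updated document replaces the old entries. This is a possible design, not something the current system implements:

```python
import hashlib

def chunk_id(source: str, chunk_index: int) -> str:
    """Deterministic vector ID: the same file and chunk position
    always map to the same ID, so upserting a re-parsed document
    overwrites its previous vectors in Pinecone."""
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()[:12]
    return f"{digest}-{chunk_index}"
```

With stable IDs like these, "delete then re-ingest" collapses into a single upsert for chunks that still exist; only chunks removed from the document would still need explicit deletion.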