
Overview

Sentinel AI maintains a vector knowledge base of technical documentation (PostgreSQL, Docker, Nginx manuals) to power its RAG-based diagnosis system. The knowledge base uses:
  • LlamaParse for high-quality PDF parsing
  • Pinecone for serverless vector storage
  • OpenAI embeddings for semantic search
  • Chunking strategies for optimal retrieval
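
Assuming the standard LlamaIndex integration packages (names inferred from the imports used in the code below, not stated in this document), the stack can be installed with:

```shell
# Core framework plus the Pinecone, OpenAI, LlamaParse, and Cohere integrations
pip install llama-index llama-parse \
    llama-index-vector-stores-pinecone \
    llama-index-embeddings-openai \
    llama-index-llms-openai \
    llama-index-postprocessor-cohere-rerank
```

Pin versions as appropriate for your deployment; the integration package names follow LlamaIndex's post-0.10 namespaced layout.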

Architecture

The knowledge base is implemented in src/core/knowledge.py:13-96:
class VectorKnowledgeBase:
    def __init__(self):
        self.pc = Pinecone(api_key=config.PINECONE_API_KEY)
        self.index_name = config.PINECONE_INDEX_NAME
        self.ensure_index_exists()
        self.vector_store = PineconeVectorStore(
            pinecone_index=self.pc.Index(self.index_name)
        )
        self.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL)
        self.llm = OpenAI(model=config.MODEL_NAME, temperature=config.TEMPERATURE)
        self.reranker = CohereRerank(api_key=config.COHERE_API_KEY, top_n=5)
        self.index = None
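
The constructor reads several attributes from src/core/config.py. A minimal sketch of what that module must expose (the values for MODEL_NAME and TEMPERATURE are assumptions, not taken from this document; the index name, embedding model, and dimension match the configuration described below):

```python
import os

# Sketch of the config attributes VectorKnowledgeBase reads.
# API keys come from the environment; see Environment Variables below.
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY", "")
COHERE_API_KEY = os.getenv("COHERE_API_KEY", "")
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY", "")

PINECONE_INDEX_NAME = "sentinel-ai-index"
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536          # must match EMBEDDING_MODEL's output size
MODEL_NAME = "gpt-4o-mini"    # assumed; substitute the project's model
TEMPERATURE = 0.0             # assumed
```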

Index Management

Pinecone Index Creation

The system automatically creates a Pinecone index on first run. Implementation (src/core/knowledge.py:27-38):
def ensure_index_exists(self):
    existing_indexes = [i.name for i in self.pc.list_indexes()]
    if self.index_name not in existing_indexes:
        print(f"[RAG] Creating Pinecone index: {self.index_name}")
        self.pc.create_index(
            name=self.index_name,
            dimension=config.EMBEDDING_DIM,  # 1536 for text-embedding-3-small
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1")
        )
    else:
        print(f"[RAG] Index already exists: {self.index_name}")
Index Configuration:
  • Name: sentinel-ai-index (from config.PINECONE_INDEX_NAME)
  • Dimensions: 1536 (matches OpenAI text-embedding-3-small)
  • Metric: Cosine similarity
  • Infrastructure: AWS Serverless (us-east-1)
Pinecone’s serverless tier auto-scales based on query load, eliminating the need for capacity planning.
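
Cosine similarity, the metric configured above, scores vectors by angle rather than magnitude, which suits text embeddings well; a minimal illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the score ignores vector length, a chunk and a query embedded at different scales still compare correctly.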

Document Ingestion

Bulk Manual Ingestion

The system ingests all PDF manuals from the data/manuals/ directory. Implementation (src/core/knowledge.py:40-64):
def ingest_manuals(self):
    print("[RAG] Starting manual ingestion with LlamaParse...")
    
    # Configure LlamaParse for PDF parsing
    parser = LlamaParse(
        api_key=config.LLAMA_CLOUD_API_KEY,
        result_type="markdown",  # Convert to markdown for better structure
        verbose=True,
        language="en",
    )
    file_extractor = {".pdf": parser}
    
    # Load documents
    reader = SimpleDirectoryReader(
        input_dir=config.MANUALS_DIR,
        file_extractor=file_extractor
    )
    documents = reader.load_data()
    print(f"[RAG] {len(documents)} documents loaded. Indexing into Pinecone...")
    
    # Chunk and index
    from llama_index.core.node_parser import SentenceSplitter
    storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
    self.index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        embed_model=self.embed_model,
        transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=200)]
    )
    print("[RAG] Ingestion complete.")

LlamaParse PDF Extraction

LlamaParse converts PDFs to structured markdown, preserving:
  • Tables (converted to markdown tables)
  • Code blocks (with syntax detection)
  • Headers (for hierarchical structure)
  • Lists (bullets and numbered)

Why LlamaParse?
  • Standard PDF parsers (PyPDF2, pdfplumber) struggle with complex layouts
  • LlamaParse uses vision models to understand page structure
  • Markdown output is LLM-friendly and semantically rich

Chunking Strategy

Sentence-based splitting with overlap:
SentenceSplitter(chunk_size=1024, chunk_overlap=200)

  • Chunk size: 1024 tokens (~3-4 paragraphs)
  • Overlap: 200 tokens (~1 paragraph) to preserve context across boundaries
  • Splitting logic: respects sentence boundaries (no mid-sentence cuts)

The 200-token overlap ensures that queries matching content near chunk boundaries retrieve both adjacent chunks.

Embedding Generation

Each chunk is embedded using OpenAI’s text-embedding-3-small:
self.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL)

Embedding properties:
  • Model: text-embedding-3-small
  • Dimensions: 1536
  • Cost: $0.02 per 1M tokens
  • Latency: ~100ms per chunk (batched)

Pinecone Upload

Embeddings are uploaded to Pinecone with metadata:
storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
self.index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=self.embed_model,
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=200)]
)

Stored metadata per chunk:
  • file_name: Source PDF filename
  • page_label: Page number (if available)
  • chunk_id: Unique identifier

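
The sentence-aware splitting that SentenceSplitter performs can be illustrated with a simplified standalone sketch (character-based rather than token-based, and not the LlamaIndex implementation):

```python
import re

def split_with_overlap(text, chunk_size=200, overlap=50):
    """Simplified, character-based illustration of sentence-aware
    chunking with overlap (the real SentenceSplitter counts tokens).
    Chunks end on sentence boundaries, and each new chunk starts with
    the tail of the previous one so boundary context is preserved."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current)
            # Carry trailing sentences forward until `overlap` chars are covered
            tail = ""
            for prev in reversed(re.split(r"(?<=[.!?])\s+", current)):
                tail = f"{prev} {tail}".strip()
                if len(tail) >= overlap:
                    break
            current = f"{tail} {sent}"
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Adjacent chunks share their boundary sentences, which is exactly the property the 200-token overlap buys at retrieval time.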

Incremental File Ingestion

The system supports adding individual files without reindexing the entire corpus. Implementation (src/core/knowledge.py:66-96):
def ingest_file(self, filepath: str):
    print(f"[RAG] Incremental ingestion: {os.path.basename(filepath)}")
    
    # Parse PDF or text file
    if filepath.endswith(".pdf"):
        parser = LlamaParse(
            api_key=config.LLAMA_CLOUD_API_KEY,
            result_type="markdown",
            verbose=True,
            language="en",
        )
        reader = SimpleDirectoryReader(
            input_files=[filepath],
            file_extractor={".pdf": parser}
        )
    else:
        reader = SimpleDirectoryReader(input_files=[filepath])
    
    documents = reader.load_data()
    print(f"[RAG] {len(documents)} fragments loaded from {os.path.basename(filepath)}.")
    
    # Chunk documents
    from llama_index.core.node_parser import SentenceSplitter
    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
    nodes = splitter.get_nodes_from_documents(documents)
    
    # Insert into existing index
    if not self.index:
        self.index = VectorStoreIndex.from_vector_store(
            self.vector_store,
            embed_model=self.embed_model
        )
    
    self.index.insert_nodes(nodes)
    print(f"[RAG] {len(nodes)} nodes inserted into Pinecone.")

Use ingest_file() to add new documentation without disrupting the existing index. This is ideal for incremental updates or user-uploaded manuals.

Directory Structure

data/
├── manuals/           # PDF documentation directory
│   ├── postgresql-16-manual.pdf
│   ├── docker-reference.pdf
│   └── nginx-admin-guide.pdf
└── memory/            # Episodic memory storage
    └── episodes.json

Configuration (src/core/config.py:30-32):
DATA_DIR = "data"
MANUALS_DIR = os.path.join(DATA_DIR, "manuals")
MEMORY_DIR = os.path.join(DATA_DIR, "memory")
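
A hypothetical startup helper (not shown in the source) to guarantee these directories exist before ingestion runs:

```python
import os

def ensure_data_dirs(base="data"):
    """Create the manuals and memory directories if they are missing."""
    manuals = os.path.join(base, "manuals")
    memory = os.path.join(base, "memory")
    os.makedirs(manuals, exist_ok=True)
    os.makedirs(memory, exist_ok=True)
    return manuals, memory
```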
    

Initialization

The knowledge base is initialized at application startup. Implementation (src/core/knowledge.py:242-249):
kb = None

def init_knowledge_base():
    global kb
    try:
        kb = VectorKnowledgeBase()
        log("system", "Knowledge base initialized successfully.")
    except Exception as e:
        log("error", f"Could not initialize the knowledge base: {e}")
        kb = None

If initialization fails (e.g., missing API keys), the agent will run without RAG support. The diagnose node will skip documentation queries and rely solely on episodic memory.

Retrieval Process

When the agent queries the knowledge base:
# From src/agent/nodes/diagnose.py:40-42
if kb:
    rag_context = kb.query(f"How to fix: {error}")

The system:
1. Rewrites the query into 5 variations (6 queries total, including the original)
2. Retrieves the top-5 chunks per query (up to 30 candidates)
3. Reranks the candidates with Cohere, keeping the top-5 most relevant
4. Synthesizes an answer with the LLM

See RAG System for detailed pipeline documentation.
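
The four steps can be sketched as pure control flow, with the LLM rewriter, Pinecone retriever, and Cohere reranker stubbed out as callables (this mirrors the pipeline shape only, not the project's actual code):

```python
def multi_query_retrieve(question, rewrite, retrieve, rerank,
                         n_variations=5, top_k=5, final_n=5):
    """Multi-query retrieval with reranking.

    rewrite(q, n)   -> n rephrased queries (stands in for the LLM)
    retrieve(q, k)  -> top-k (chunk_id, text) pairs (stands in for Pinecone)
    rerank(q, cands)-> candidates sorted by relevance (stands in for Cohere)
    """
    queries = [question] + rewrite(question, n_variations)   # 6 queries total
    candidates, seen = [], set()
    for q in queries:
        for chunk_id, text in retrieve(q, top_k):            # up to 30 candidates
            if chunk_id not in seen:                         # dedupe across queries
                seen.add(chunk_id)
                candidates.append((chunk_id, text))
    return rerank(question, candidates)[:final_n]            # final top-5
```

Deduplicating by chunk id matters because rephrased queries often hit the same chunks; the reranker then sees each candidate once.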

Performance Characteristics

Ingestion

  • PDF parsing: ~10-15s per document (LlamaParse)
  • Embedding: ~5s per 100 chunks
  • Pinecone upload: ~2s per 100 vectors
  • Total: ~2-3 minutes for a 500-page manual
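
Plugging these per-stage figures into a back-of-envelope estimate (the chunk count per manual is an assumption; a 500-page manual at 1024-token chunks might yield on the order of 1,000-1,500 chunks):

```python
def estimate_ingestion_seconds(n_docs, n_chunks,
                               parse_s_per_doc=12.5,
                               embed_s_per_100=5.0,
                               upload_s_per_100=2.0):
    """Rough ingestion time from the per-stage figures above."""
    return (n_docs * parse_s_per_doc
            + n_chunks / 100 * (embed_s_per_100 + upload_s_per_100))
```

For one manual of ~1,500 chunks this gives roughly two minutes, consistent with the total quoted above.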

Retrieval

  • Query embedding: ~100ms
  • Pinecone search: ~200-300ms (6 queries)
  • Reranking: ~300ms
  • Total: ~600-700ms

Scaling Considerations

Pinecone serverless supports millions of vectors. The current setup can handle 10,000+ pages of documentation without performance degradation. Query latency remains roughly constant thanks to approximate nearest neighbor (ANN) search.

When should ingestion be re-run? Run ingest_manuals() whenever the documentation is updated (e.g., a PostgreSQL 17 release), and ingest_file() for incremental additions. The index persists in Pinecone, so re-ingestion is not needed on application restart.

Can non-PDF files be ingested? Yes: ingest_file() handles markdown, text, and JSON files without LlamaParse; PDF parsing is only invoked for the .pdf extension.

How are document updates handled? The system does not currently track document versions. To update a document, delete its old vectors from Pinecone and re-ingest. Future versions may add versioned metadata for incremental updates.

Environment Variables

Required API keys for the knowledge base:
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
LLAMA_CLOUD_API_KEY=llx-...
COHERE_API_KEY=...

See Configuration for setup instructions.
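
A hypothetical startup check for these keys might look like the following (the source instead logs a warning and degrades gracefully; see Initialization above):

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY",
                 "LLAMA_CLOUD_API_KEY", "COHERE_API_KEY")

def missing_keys(env=os.environ):
    """Return the required knowledge-base keys absent or empty in env."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```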
