
Overview

Sentinel AI maintains a vector knowledge base of technical documentation (PostgreSQL, Docker, Nginx manuals) to power its RAG-based diagnosis system. The knowledge base uses:
  • LlamaParse for high-quality PDF parsing
  • Pinecone for serverless vector storage
  • OpenAI embeddings for semantic search
  • Chunking strategies for optimal retrieval
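
Assuming the standard LlamaIndex integration packages (names inferred from the imports used in the code below, not stated in this document), the stack can be installed with:

```shell
# Core framework plus the Pinecone, OpenAI, LlamaParse, and Cohere integrations
pip install llama-index llama-parse \
    llama-index-vector-stores-pinecone \
    llama-index-embeddings-openai \
    llama-index-llms-openai \
    llama-index-postprocessor-cohere-rerank
```

Pin versions as appropriate for your deployment; the integration package names follow LlamaIndex's post-0.10 namespaced layout.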

Architecture

The knowledge base is implemented in src/core/knowledge.py:13-96:
class VectorKnowledgeBase:
    def __init__(self):
        self.pc = Pinecone(api_key=config.PINECONE_API_KEY)
        self.index_name = config.PINECONE_INDEX_NAME
        self.ensure_index_exists()
        self.vector_store = PineconeVectorStore(
            pinecone_index=self.pc.Index(self.index_name)
        )
        self.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL)
        self.llm = OpenAI(model=config.MODEL_NAME, temperature=config.TEMPERATURE)
        self.reranker = CohereRerank(api_key=config.COHERE_API_KEY, top_n=5)
        self.index = None
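
The constructor reads several attributes from src/core/config.py. A minimal sketch of what that module must expose (the values for MODEL_NAME and TEMPERATURE are assumptions, not taken from this document; the index name, embedding model, and dimension match the configuration described below):

```python
import os

# Sketch of the config attributes VectorKnowledgeBase reads.
# API keys come from the environment; see Environment Variables below.
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY", "")
COHERE_API_KEY = os.getenv("COHERE_API_KEY", "")
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY", "")

PINECONE_INDEX_NAME = "sentinel-ai-index"
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536          # must match EMBEDDING_MODEL's output size
MODEL_NAME = "gpt-4o-mini"    # assumed; substitute the project's model
TEMPERATURE = 0.0             # assumed
```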

Index Management

Pinecone Index Creation

The system automatically creates a Pinecone index on first run. Implementation (src/core/knowledge.py:27-38):
def ensure_index_exists(self):
    existing_indexes = [i.name for i in self.pc.list_indexes()]
    if self.index_name not in existing_indexes:
        print(f"[RAG] Creating Pinecone index: {self.index_name}")
        self.pc.create_index(
            name=self.index_name,
            dimension=config.EMBEDDING_DIM,  # 1536 for text-embedding-3-small
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1")
        )
    else:
        print(f"[RAG] Index already exists: {self.index_name}")
Index Configuration:
  • Name: sentinel-ai-index (from config.PINECONE_INDEX_NAME)
  • Dimensions: 1536 (matches OpenAI text-embedding-3-small)
  • Metric: Cosine similarity
  • Infrastructure: AWS Serverless (us-east-1)
Pinecone’s serverless tier auto-scales based on query load, eliminating the need for capacity planning.
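
Cosine similarity, the metric configured above, scores vectors by angle rather than magnitude, which suits text embeddings well; a minimal illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the score ignores vector length, a chunk and a query embedded at different scales still compare correctly.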

Document Ingestion

Bulk Manual Ingestion

The system ingests all PDF manuals from the data/manuals/ directory. Implementation (src/core/knowledge.py:40-64):
def ingest_manuals(self):
    print("[RAG] Starting manual ingestion with LlamaParse...")
    
    # Configure LlamaParse for PDF parsing
    parser = LlamaParse(
        api_key=config.LLAMA_CLOUD_API_KEY,
        result_type="markdown",  # Convert to markdown for better structure
        verbose=True,
        language="en",
    )
    file_extractor = {".pdf": parser}
    
    # Load documents
    reader = SimpleDirectoryReader(
        input_dir=config.MANUALS_DIR,
        file_extractor=file_extractor
    )
    documents = reader.load_data()
    print(f"[RAG] {len(documents)} documents loaded. Indexing into Pinecone...")
    
    # Chunk and index
    from llama_index.core.node_parser import SentenceSplitter
    storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
    self.index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        embed_model=self.embed_model,
        transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=200)]
    )
    print("[RAG] Ingestion complete.")

LlamaParse PDF Extraction

LlamaParse converts PDFs to structured markdown, preserving:
  • Tables (converted to markdown tables)
  • Code blocks (with syntax detection)
  • Headers (for hierarchical structure)
  • Lists (bullets and numbered)

Why LlamaParse?
  • Standard PDF parsers (PyPDF2, pdfplumber) struggle with complex layouts
  • LlamaParse uses vision models to understand page structure
  • Markdown output is LLM-friendly and semantically rich

Chunking Strategy

Sentence-based splitting with overlap:
SentenceSplitter(chunk_size=1024, chunk_overlap=200)

  • Chunk size: 1024 tokens (~3-4 paragraphs)
  • Overlap: 200 tokens (~1 paragraph) to preserve context across boundaries
  • Splitting logic: respects sentence boundaries (no mid-sentence cuts)

The 200-token overlap ensures that queries matching content near chunk boundaries retrieve both adjacent chunks.

Embedding Generation

Each chunk is embedded using OpenAI’s text-embedding-3-small:
self.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL)

Embedding properties:
  • Model: text-embedding-3-small
  • Dimensions: 1536
  • Cost: $0.02 per 1M tokens
  • Latency: ~100ms per chunk (batched)

Pinecone Upload

Embeddings are uploaded to Pinecone with metadata:
storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
self.index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=self.embed_model,
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=200)]
)

Stored metadata per chunk:
  • file_name: Source PDF filename
  • page_label: Page number (if available)
  • chunk_id: Unique identifier

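
The sentence-aware splitting that SentenceSplitter performs can be illustrated with a simplified standalone sketch (character-based rather than token-based, and not the LlamaIndex implementation):

```python
import re

def split_with_overlap(text, chunk_size=200, overlap=50):
    """Simplified, character-based illustration of sentence-aware
    chunking with overlap (the real SentenceSplitter counts tokens).
    Chunks end on sentence boundaries, and each new chunk starts with
    the tail of the previous one so boundary context is preserved."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current)
            # Carry trailing sentences forward until `overlap` chars are covered
            tail = ""
            for prev in reversed(re.split(r"(?<=[.!?])\s+", current)):
                tail = f"{prev} {tail}".strip()
                if len(tail) >= overlap:
                    break
            current = f"{tail} {sent}"
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Adjacent chunks share their boundary sentences, which is exactly the property the 200-token overlap buys at retrieval time.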

Incremental File Ingestion

The system supports adding individual files without reindexing the entire corpus. Implementation (src/core/knowledge.py:66-96):
def ingest_file(self, filepath: str):
    print(f"[RAG] Incremental ingestion: {os.path.basename(filepath)}")
    
    # Parse PDF or text file
    if filepath.endswith(".pdf"):
        parser = LlamaParse(
            api_key=config.LLAMA_CLOUD_API_KEY,
            result_type="markdown",
            verbose=True,
            language="en",
        )
        reader = SimpleDirectoryReader(
            input_files=[filepath],
            file_extractor={".pdf": parser}
        )
    else:
        reader = SimpleDirectoryReader(input_files=[filepath])
    
    documents = reader.load_data()
    print(f"[RAG] {len(documents)} fragments loaded from {os.path.basename(filepath)}.")
    
    # Chunk documents
    from llama_index.core.node_parser import SentenceSplitter
    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
    nodes = splitter.get_nodes_from_documents(documents)
    
    # Insert into existing index
    if not self.index:
        self.index = VectorStoreIndex.from_vector_store(
            self.vector_store,
            embed_model=self.embed_model
        )
    
    self.index.insert_nodes(nodes)
    print(f"[RAG] {len(nodes)} nodes inserted into Pinecone.")

Use ingest_file() to add new documentation without disrupting the existing index. This is ideal for incremental updates or user-uploaded manuals.

Directory Structure

data/
├── manuals/           # PDF documentation directory
│   ├── postgresql-16-manual.pdf
│   ├── docker-reference.pdf
│   └── nginx-admin-guide.pdf
└── memory/            # Episodic memory storage
    └── episodes.json

Configuration (src/core/config.py:30-32):
DATA_DIR = "data"
MANUALS_DIR = os.path.join(DATA_DIR, "manuals")
MEMORY_DIR = os.path.join(DATA_DIR, "memory")
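
A hypothetical startup helper (not shown in the source) to guarantee these directories exist before ingestion runs:

```python
import os

def ensure_data_dirs(base="data"):
    """Create the manuals and memory directories if they are missing."""
    manuals = os.path.join(base, "manuals")
    memory = os.path.join(base, "memory")
    os.makedirs(manuals, exist_ok=True)
    os.makedirs(memory, exist_ok=True)
    return manuals, memory
```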
    

Initialization

The knowledge base is initialized at application startup. Implementation (src/core/knowledge.py:242-249):
kb = None

def init_knowledge_base():
    global kb
    try:
        kb = VectorKnowledgeBase()
        log("system", "Knowledge base initialized successfully.")
    except Exception as e:
        log("error", f"Could not initialize the knowledge base: {e}")
        kb = None

If initialization fails (e.g., missing API keys), the agent will run without RAG support. The diagnose node will skip documentation queries and rely solely on episodic memory.

Retrieval Process

When the agent queries the knowledge base:
# From src/agent/nodes/diagnose.py:40-42
if kb:
    rag_context = kb.query(f"How to fix: {error}")

The system:
1. Rewrites the query into 5 variations (6 queries total, including the original)
2. Retrieves the top-5 chunks per query (up to 30 candidates)
3. Reranks the candidates with Cohere, keeping the top-5 most relevant
4. Synthesizes an answer with the LLM

See RAG System for detailed pipeline documentation.
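
The four steps can be sketched as pure control flow, with the LLM rewriter, Pinecone retriever, and Cohere reranker stubbed out as callables (this mirrors the pipeline shape only, not the project's actual code):

```python
def multi_query_retrieve(question, rewrite, retrieve, rerank,
                         n_variations=5, top_k=5, final_n=5):
    """Multi-query retrieval with reranking.

    rewrite(q, n)   -> n rephrased queries (stands in for the LLM)
    retrieve(q, k)  -> top-k (chunk_id, text) pairs (stands in for Pinecone)
    rerank(q, cands)-> candidates sorted by relevance (stands in for Cohere)
    """
    queries = [question] + rewrite(question, n_variations)   # 6 queries total
    candidates, seen = [], set()
    for q in queries:
        for chunk_id, text in retrieve(q, top_k):            # up to 30 candidates
            if chunk_id not in seen:                         # dedupe across queries
                seen.add(chunk_id)
                candidates.append((chunk_id, text))
    return rerank(question, candidates)[:final_n]            # final top-5
```

Deduplicating by chunk id matters because rephrased queries often hit the same chunks; the reranker then sees each candidate once.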

Performance Characteristics

Ingestion

  • PDF parsing: ~10-15s per document (LlamaParse)
  • Embedding: ~5s per 100 chunks
  • Pinecone upload: ~2s per 100 vectors
  • Total: ~2-3 minutes for a 500-page manual
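
Plugging these per-stage figures into a back-of-envelope estimate (the chunk count per manual is an assumption; a 500-page manual at 1024-token chunks might yield on the order of 1,000-1,500 chunks):

```python
def estimate_ingestion_seconds(n_docs, n_chunks,
                               parse_s_per_doc=12.5,
                               embed_s_per_100=5.0,
                               upload_s_per_100=2.0):
    """Rough ingestion time from the per-stage figures above."""
    return (n_docs * parse_s_per_doc
            + n_chunks / 100 * (embed_s_per_100 + upload_s_per_100))
```

For one manual of ~1,500 chunks this gives roughly two minutes, consistent with the total quoted above.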

Retrieval

  • Query embedding: ~100ms
  • Pinecone search: ~200-300ms (6 queries)
  • Reranking: ~300ms
  • Total: ~600-700ms

Scaling Considerations

Pinecone serverless supports millions of vectors. The current setup can handle 10,000+ pages of documentation without performance degradation. Query latency remains roughly constant thanks to approximate nearest neighbor (ANN) search.

When should ingestion be re-run? Run ingest_manuals() whenever the documentation is updated (e.g., a PostgreSQL 17 release), and ingest_file() for incremental additions. The index persists in Pinecone, so re-ingestion is not needed on application restart.

Can non-PDF files be ingested? Yes: ingest_file() handles markdown, text, and JSON files without LlamaParse; PDF parsing is only invoked for the .pdf extension.

How are document updates handled? The system does not currently track document versions. To update a document, delete its old vectors from Pinecone and re-ingest. Future versions may add versioned metadata for incremental updates.

Environment Variables

Required API keys for the knowledge base:
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
LLAMA_CLOUD_API_KEY=llx-...
COHERE_API_KEY=...

See Configuration for setup instructions.
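
A hypothetical startup check for these keys might look like the following (the source instead logs a warning and degrades gracefully; see Initialization above):

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY",
                 "LLAMA_CLOUD_API_KEY", "COHERE_API_KEY")

def missing_keys(env=os.environ):
    """Return the required knowledge-base keys absent or empty in env."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```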
