Overview

The VectorKnowledgeBase class provides a Retrieval-Augmented Generation (RAG) system powered by Pinecone, LlamaIndex, and OpenAI embeddings. It enables the agent to query technical manuals and documentation to inform diagnosis and remediation.

Import

from src.core.knowledge import VectorKnowledgeBase, kb
The global singleton kb is initialized at startup:
from src.core.knowledge import kb

if kb:
    response = kb.query("How to restart nginx?")
    print(response)

VectorKnowledgeBase

Constructor

class VectorKnowledgeBase:
    def __init__(self)
Initializes the knowledge base with Pinecone vector store and OpenAI embeddings.
Attributes:

self.pc (Pinecone): Pinecone client instance created with config.PINECONE_API_KEY
self.index_name (str): Pinecone index name from config.PINECONE_INDEX_NAME
self.vector_store (PineconeVectorStore): LlamaIndex wrapper for the Pinecone index
self.embed_model (OpenAIEmbedding): Embedding model (text-embedding-ada-002 by default)
self.llm (OpenAI): GPT-4 instance for query synthesis
self.reranker (CohereRerank): Cohere reranker for improving retrieval quality (top_n=5)
self.index (VectorStoreIndex | None): LlamaIndex vector store index (lazy loaded)

Example

from src.core.knowledge import VectorKnowledgeBase

kb = VectorKnowledgeBase()
print(f"Connected to index: {kb.index_name}")
The constructor automatically creates the Pinecone index if it doesn’t exist using ensure_index_exists().

Methods

ensure_index_exists

def ensure_index_exists(self) -> None
Creates the Pinecone index if it doesn’t already exist. Behavior:
  1. Lists existing Pinecone indexes
  2. If index_name not found, creates a new index with:
    • Dimension: config.EMBEDDING_DIM (1536 for Ada-002)
    • Metric: "cosine"
    • Spec: Serverless on AWS us-east-1
Example:
kb = VectorKnowledgeBase()
kb.ensure_index_exists()
# [RAG] Indice existente: sentinel-docs
Index creation can take 1-2 minutes. The method will block until the index is ready.
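The create-if-missing flow can be sketched in plain Python. The stub client below stands in for the real pinecone SDK (FakePinecone and its methods are illustrative only, not library API):

```python
# Illustrative sketch of the create-if-missing flow.
# FakePinecone is a stand-in for the real pinecone client.
EMBEDDING_DIM = 1536  # matches text-embedding-ada-002

class FakePinecone:
    def __init__(self, existing):
        self.existing = list(existing)
        self.created = []

    def list_indexes(self):
        return self.existing

    def create_index(self, name, dimension, metric):
        self.created.append((name, dimension, metric))
        self.existing.append(name)

def ensure_index_exists(pc, index_name):
    # 1. List existing indexes; 2. create only if the name is missing
    if index_name not in pc.list_indexes():
        pc.create_index(index_name, dimension=EMBEDDING_DIM, metric="cosine")
    # else: index already exists, nothing to do

pc = FakePinecone(existing=["other-index"])
ensure_index_exists(pc, "sentinel-docs")  # creates the index
ensure_index_exists(pc, "sentinel-docs")  # no-op on the second call
```

The real method additionally passes the serverless AWS us-east-1 spec and blocks until the index reports ready.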

ingest_manuals

def ingest_manuals(self) -> None
Ingests all PDF manuals from config.MANUALS_DIR using LlamaParse. Behavior:
  1. Initializes LlamaParse for PDF → Markdown conversion
  2. Reads all PDFs from the manuals directory
  3. Parses documents with LlamaParse (extracts tables, formatting)
  4. Splits documents into chunks (1024 tokens, 200 overlap)
  5. Generates embeddings and stores in Pinecone
Example:
from src.core.knowledge import VectorKnowledgeBase

kb = VectorKnowledgeBase()
kb.ingest_manuals()
# [RAG] Iniciando ingesta de manuales con LlamaParse...
# [RAG] 47 documentos cargados. Indexando en Pinecone...
# [RAG] Ingesta completada.
Configuration:
config.MANUALS_DIR (str, default "./manuals"): Directory containing the PDF documentation files
config.LLAMA_CLOUD_API_KEY (str, required): API key for the LlamaParse service
Run ingest_manuals() once during initial setup or when documentation is updated. The vector embeddings persist in Pinecone.

ingest_file

def ingest_file(self, filepath: str) -> None
Ingests a single file (PDF or text) into the knowledge base.
filepath (str, required): Absolute path to the file to ingest
Behavior:
  1. Detects file type (PDF uses LlamaParse, others use SimpleDirectoryReader)
  2. Loads and parses the document
  3. Splits into chunks using SentenceSplitter
  4. Inserts nodes into the existing vector index
Example:
from src.core.knowledge import kb

kb.ingest_file("/path/to/new_manual.pdf")
# [RAG] Ingesta incremental: new_manual.pdf
# [RAG] 12 fragmentos cargados desde new_manual.pdf.
# [RAG] 12 nodos insertados en Pinecone.
Chunking Parameters:
chunk_size (int, default 1024): Maximum tokens per chunk
chunk_overlap (int, default 200): Overlapping tokens between chunks for context preservation
Use ingest_file() to incrementally add new documentation without re-indexing the entire corpus.
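The chunk_size/chunk_overlap mechanics can be illustrated with a minimal token-window splitter (a simplification of SentenceSplitter, which additionally respects sentence boundaries):

```python
def split_tokens(tokens, chunk_size=1024, chunk_overlap=200):
    """Slide a fixed window over the token list; consecutive chunks
    share `chunk_overlap` tokens so context survives at boundaries."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Small numbers for readability: window of 4 with overlap of 2
print(split_tokens(list(range(10)), chunk_size=4, chunk_overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```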

query

def query(self, query_text: str) -> str
Performs a RAG query and returns a synthesized answer.
query_text (str, required): Natural language question in English or Spanish
Returns (str): Synthesized answer in Spanish with sources appended
Behavior:
  1. Query Rewriting: Generates 5 variations using _rewrite_query()
  2. Retrieval: Fetches top 5 chunks for each query (similarity search)
  3. Deduplication: Removes duplicate nodes by ID
  4. Reranking: Uses Cohere to select the 5 most relevant chunks
  5. Synthesis: Generates answer using GPT-4 with strict rules
  6. Source Attribution: Appends source file names
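Steps 1-3 of the pipeline (fan-out retrieval followed by ID-based deduplication) reduce to a pattern like this (the toy corpus and retrieve callable are illustrative stand-ins for the real retriever):

```python
def dedupe_by_id(nodes):
    """Keep the first occurrence of each node ID, preserving order."""
    seen, unique = set(), []
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            unique.append(node)
    return unique

def fan_out_retrieve(queries, retrieve):
    """Run every query variation, flatten the hits, drop duplicates."""
    candidates = []
    for q in queries:
        candidates.extend(retrieve(q))
    return dedupe_by_id(candidates)

# Toy retriever: the two query variations return overlapping chunks
corpus = {
    "restart nginx": [{"id": 1, "text": "service nginx restart"},
                      {"id": 2, "text": "nginx -s reload"}],
    "reiniciar nginx": [{"id": 2, "text": "nginx -s reload"},
                        {"id": 3, "text": "systemctl restart nginx"}],
}
hits = fan_out_retrieve(["restart nginx", "reiniciar nginx"], corpus.get)
print([h["id"] for h in hits])  # [1, 2, 3]
```

The deduplicated candidates are then handed to the Cohere reranker, which keeps only the top 5.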
Example:
from src.core.knowledge import kb

response = kb.query("How do I fix nginx port conflict?")
print(response)
Output:
Un conflicto de puerto en nginx generalmente ocurre cuando otro proceso está usando el puerto 80 o 443.

Para solucionarlo:
1. Identifica el proceso: `sudo ss -tulpn | grep :80`
2. Detén nginx: `sudo service nginx stop`
3. Mata el proceso conflictivo: `sudo kill -9 <PID>`
4. Reinicia nginx: `sudo service nginx start`

**Fuentes:**
- nginx_manual.pdf
- troubleshooting_guide.pdf

Query Rewriting

The _rewrite_query() method generates diverse search queries:
def _rewrite_query(self, query_text: str, llm) -> list:
    # Generates 5 variations:
    # 1. English with official terminology
    # 2. Spanish with technical terms
    # 3. Specific section/chapter names
    # 4. Keywords/values the answer would contain
    # 5. Alternative interpretation
    
    queries = [
        query_text,  # Original query
        "nginx port binding errors and solutions",
        "errores de enlace de puerto nginx y soluciones",
        "nginx.conf listen directive configuration",
        "port 80 address already in use nginx",
        "how to resolve nginx startup failures"
    ]
    return queries[:6]  # original query + 5 variations

Response Synthesis Rules

The LLM is prompted with strict constraints:
1. Respond ALWAYS in Spanish
2. Use ONLY information from the provided context
3. NEVER invent or assume information not in the context
4. Reproduce tables completely and faithfully
5. Combine information from multiple fragments coherently
6. If the context is insufficient, respond: "No encontré información específica sobre eso en los documentos cargados."
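One way such constraints reach the model is as a system prompt assembled around the retrieved context. The wording below is illustrative, not the exact production prompt:

```python
# Illustrative prompt assembly; the real synthesis prompt may differ in wording.
RULES = [
    "Respond ALWAYS in Spanish.",
    "Use ONLY information from the provided context.",
    "NEVER invent or assume information not in the context.",
    "Reproduce tables completely and faithfully.",
    "Combine information from multiple fragments coherently.",
    ('If the context is insufficient, respond: '
     '"No encontré información específica sobre eso en los documentos cargados."'),
]

def build_synthesis_prompt(context_chunks, question):
    rules = "\n".join(f"{i}. {r}" for i, r in enumerate(RULES, 1))
    context = "\n---\n".join(context_chunks)
    return (f"Rules:\n{rules}\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}")

prompt = build_synthesis_prompt(["chunk A", "chunk B"], "How to restart nginx?")
print(prompt.splitlines()[1])  # "1. Respond ALWAYS in Spanish."
```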

stream_query

def stream_query(self, query_text: str) -> Generator[dict, None, None]
Streaming version of query() for real-time UI updates.
query_text (str, required): Natural language question
Yields (dict): Event dictionaries with "event" and "data" keys
Event Types:
thinking: Progress updates during retrieval
message: Incremental response tokens
done: Signals completion
Example:
from src.core.knowledge import kb

for event in kb.stream_query("How to configure PostgreSQL replication?"):
    if event["event"] == "thinking":
        print(f"[Status] {event['data']}")
    elif event["event"] == "message":
        print(event["data"], end="", flush=True)
    elif event["event"] == "done":
        print("\n[Complete]")
Output:
[Status] Analizando tu pregunta...
[Status] Optimizando búsqueda...
[Status] Consultando base de conocimiento vectorizada...
[Status] Leyendo 5 fragmentos relevantes...
La replicación en PostgreSQL se configura mediante...

**Fuentes:**
- postgresql_manual.pdf
[Complete]
Use stream_query() for chat interfaces to provide instant feedback while processing.
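A producer conforming to this event protocol is simply a generator of {"event", "data"} dicts. The toy generator below mimics the shape of stream_query without any retrieval:

```python
def toy_stream(answer_tokens):
    """Minimal generator matching the stream_query event protocol."""
    yield {"event": "thinking", "data": "Consultando base de conocimiento..."}
    for token in answer_tokens:
        yield {"event": "message", "data": token}
    yield {"event": "done", "data": ""}

events = list(toy_stream(["Hola", " mundo"]))
print([e["event"] for e in events])
# ['thinking', 'message', 'message', 'done']
```

Any consumer written against this protocol (such as the chat loop above) works unchanged whether the events come from the real pipeline or a stub, which makes the UI easy to test.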

Integration Example

Using the knowledge base in the diagnosis node:
from src.core.knowledge import kb
from src.agent.state import AgentState
from langchain_core.messages import SystemMessage, HumanMessage
from typing import Dict, Any

def diagnose_node(state: AgentState) -> Dict[str, Any]:
    error = state.get("current_error", "")
    
    # Query knowledge base
    rag_context = ""
    if kb:
        rag_context = kb.query(f"How to fix: {error}")
    
    # Use context in LLM prompt (llm: chat model initialized elsewhere in the module)
    diagnosis = llm.invoke([
        SystemMessage(content=f"Documentation:\n{rag_context[:1000]}"),
        HumanMessage(content=f"Error: {error}")
    ])
    
    return {
        "current_step": "diagnose",
        "diagnosis_log": [diagnosis.content]
    }

Advanced Configuration

Custom Embedding Model

from llama_index.embeddings.openai import OpenAIEmbedding

kb = VectorKnowledgeBase()
kb.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
Note: the embedding dimension must match the Pinecone index. text-embedding-3-large produces 3072-dimensional vectors by default, so the index must be created with a matching EMBEDDING_DIM (or the model configured to emit 1536 dimensions).

Adjust Retrieval Parameters

# Retrieve more chunks (note: kb.index is lazy and remains None until the first query builds it)
retriever = kb.index.as_retriever(similarity_top_k=10)

# Use different reranking top_n
from llama_index.postprocessor.cohere_rerank import CohereRerank
kb.reranker = CohereRerank(api_key=config.COHERE_API_KEY, top_n=10)

Multi-Language Support

The system automatically handles bilingual queries:
# English query
response_en = kb.query("How to restart nginx?")

# Spanish query
response_es = kb.query("¿Cómo reiniciar nginx?")

# Both return Spanish responses based on English+Spanish docs

Performance Optimization

Lazy Index Loading

The vector index is loaded on first query:
if not self.index:
    self.index = VectorStoreIndex.from_vector_store(
        self.vector_store,
        embed_model=self.embed_model
    )

Caching Strategies

Implement query caching for repeated questions:
from functools import lru_cache

class CachedKnowledgeBase(VectorKnowledgeBase):
    # Note: lru_cache on a method keeps a reference to the instance and
    # requires hashable arguments; fine here since query_text is a str.
    @lru_cache(maxsize=100)
    def query(self, query_text: str) -> str:
        return super().query(query_text)

Error Handling

from src.core.knowledge import kb

try:
    if kb:
        response = kb.query("How to fix database connection?")
    else:
        print("Knowledge base not initialized")
except Exception as e:
    print(f"RAG query failed: {e}")
    # Fallback to LLM without RAG context

Global Singleton

The knowledge base is initialized as a global singleton:
# In src/core/knowledge.py
kb = None

def init_knowledge_base():
    global kb
    try:
        kb = VectorKnowledgeBase()
        log("system", "Base de conocimiento inicializada correctamente.")
    except Exception as e:
        log("error", f"No se pudo inicializar la base de conocimiento: {e}")
        kb = None
Usage:
from src.core.knowledge import init_knowledge_base, kb

# Initialize at startup
init_knowledge_base()

# Use throughout application
if kb:
    kb.query("How to optimize PostgreSQL?")

Required Environment Variables

PINECONE_API_KEY (str, required): Pinecone API key for the vector database
PINECONE_INDEX_NAME (str, default "sentinel-docs"): Name of the Pinecone index
OPENAI_API_KEY (str, required): OpenAI API key for embeddings and the LLM
COHERE_API_KEY (str, required): Cohere API key for reranking
LLAMA_CLOUD_API_KEY (str, required): LlamaCloud API key for PDF parsing
EMBEDDING_MODEL (str, default "text-embedding-ada-002"): OpenAI embedding model name
EMBEDDING_DIM (int, default 1536): Embedding vector dimension
MODEL_NAME (str, default "gpt-4"): LLM model used for synthesis
TEMPERATURE (float, default 0.0): LLM temperature (0.0 for factual responses)

Related Pages

- Agent Nodes: use the knowledge base in diagnose_node
- Configuration: configure RAG parameters
- Memory System: complement RAG with episodic memory
- RAG System: learn about the RAG architecture