Overview

The VectorKnowledgeBase class provides a Retrieval-Augmented Generation (RAG) system powered by Pinecone, LlamaIndex, and OpenAI embeddings. It enables the agent to query technical manuals and documentation to inform diagnosis and remediation.

Import

from src.core.knowledge import VectorKnowledgeBase, kb
The global singleton kb is initialized at startup:
from src.core.knowledge import kb

if kb:
    response = kb.query("How to restart nginx?")
    print(response)

VectorKnowledgeBase

Constructor

class VectorKnowledgeBase:
    def __init__(self)
Initializes the knowledge base with Pinecone vector store and OpenAI embeddings.
Attributes:

self.pc (Pinecone): Pinecone client instance created with config.PINECONE_API_KEY
self.index_name (str): Pinecone index name from config.PINECONE_INDEX_NAME
self.vector_store (PineconeVectorStore): LlamaIndex wrapper for the Pinecone index
self.embed_model (OpenAIEmbedding): Embedding model (text-embedding-ada-002 by default)
self.llm (OpenAI): GPT-4 instance for query synthesis
self.reranker (CohereRerank): Cohere reranker for improving retrieval quality (top_n=5)
self.index (VectorStoreIndex | None): LlamaIndex vector store index (lazy loaded)

Example

from src.core.knowledge import VectorKnowledgeBase

kb = VectorKnowledgeBase()
print(f"Connected to index: {kb.index_name}")
The constructor automatically creates the Pinecone index if it doesn’t exist using ensure_index_exists().

Methods

ensure_index_exists

def ensure_index_exists(self) -> None
Creates the Pinecone index if it doesn’t already exist. Behavior:
  1. Lists existing Pinecone indexes
  2. If index_name not found, creates a new index with:
    • Dimension: config.EMBEDDING_DIM (1536 for Ada-002)
    • Metric: "cosine"
    • Spec: Serverless on AWS us-east-1
Example:
kb = VectorKnowledgeBase()
kb.ensure_index_exists()
# [RAG] Indice existente: sentinel-docs
Index creation can take 1-2 minutes. The method will block until the index is ready.
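The create-if-missing flow can be sketched in plain Python. The stub client below stands in for the real pinecone SDK (FakePinecone and its methods are illustrative only, not library API):

```python
# Illustrative sketch of the create-if-missing flow.
# FakePinecone is a stand-in for the real pinecone client.
EMBEDDING_DIM = 1536  # matches text-embedding-ada-002

class FakePinecone:
    def __init__(self, existing):
        self.existing = list(existing)
        self.created = []

    def list_indexes(self):
        return self.existing

    def create_index(self, name, dimension, metric):
        self.created.append((name, dimension, metric))
        self.existing.append(name)

def ensure_index_exists(pc, index_name):
    # 1. List existing indexes; 2. create only if the name is missing
    if index_name not in pc.list_indexes():
        pc.create_index(index_name, dimension=EMBEDDING_DIM, metric="cosine")
    # else: index already exists, nothing to do

pc = FakePinecone(existing=["other-index"])
ensure_index_exists(pc, "sentinel-docs")  # creates the index
ensure_index_exists(pc, "sentinel-docs")  # no-op on the second call
```

The real method additionally passes the serverless AWS us-east-1 spec and blocks until the index reports ready.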

ingest_manuals

def ingest_manuals(self) -> None
Ingests all PDF manuals from config.MANUALS_DIR using LlamaParse. Behavior:
  1. Initializes LlamaParse for PDF → Markdown conversion
  2. Reads all PDFs from the manuals directory
  3. Parses documents with LlamaParse (extracts tables, formatting)
  4. Splits documents into chunks (1024 tokens, 200 overlap)
  5. Generates embeddings and stores in Pinecone
Example:
from src.core.knowledge import VectorKnowledgeBase

kb = VectorKnowledgeBase()
kb.ingest_manuals()
# [RAG] Iniciando ingesta de manuales con LlamaParse...
# [RAG] 47 documentos cargados. Indexando en Pinecone...
# [RAG] Ingesta completada.
Configuration:
config.MANUALS_DIR (str, default "./manuals"): Directory containing the PDF documentation files
config.LLAMA_CLOUD_API_KEY (str, required): API key for the LlamaParse service
Run ingest_manuals() once during initial setup or when documentation is updated. The vector embeddings persist in Pinecone.

ingest_file

def ingest_file(self, filepath: str) -> None
Ingests a single file (PDF or text) into the knowledge base.
filepath (str, required): Absolute path to the file to ingest
Behavior:
  1. Detects file type (PDF uses LlamaParse, others use SimpleDirectoryReader)
  2. Loads and parses the document
  3. Splits into chunks using SentenceSplitter
  4. Inserts nodes into the existing vector index
Example:
from src.core.knowledge import kb

kb.ingest_file("/path/to/new_manual.pdf")
# [RAG] Ingesta incremental: new_manual.pdf
# [RAG] 12 fragmentos cargados desde new_manual.pdf.
# [RAG] 12 nodos insertados en Pinecone.
Chunking Parameters:
chunk_size (int, default 1024): Maximum tokens per chunk
chunk_overlap (int, default 200): Overlapping tokens between chunks for context preservation
Use ingest_file() to incrementally add new documentation without re-indexing the entire corpus.
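The chunk_size/chunk_overlap mechanics can be illustrated with a minimal token-window splitter (a simplification of SentenceSplitter, which additionally respects sentence boundaries):

```python
def split_tokens(tokens, chunk_size=1024, chunk_overlap=200):
    """Slide a fixed window over the token list; consecutive chunks
    share `chunk_overlap` tokens so context survives at boundaries."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Small numbers for readability: window of 4 with overlap of 2
print(split_tokens(list(range(10)), chunk_size=4, chunk_overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```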

query

def query(self, query_text: str) -> str
Performs a RAG query and returns a synthesized answer.
query_text (str, required): Natural language question in English or Spanish
Returns (str): Synthesized answer in Spanish with sources appended
Behavior:
  1. Query Rewriting: Generates 5 variations using _rewrite_query()
  2. Retrieval: Fetches top 5 chunks for each query (similarity search)
  3. Deduplication: Removes duplicate nodes by ID
  4. Reranking: Uses Cohere to select the 5 most relevant chunks
  5. Synthesis: Generates answer using GPT-4 with strict rules
  6. Source Attribution: Appends source file names
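Steps 1-3 of the pipeline (fan-out retrieval followed by ID-based deduplication) reduce to a pattern like this (the toy corpus and retrieve callable are illustrative stand-ins for the real retriever):

```python
def dedupe_by_id(nodes):
    """Keep the first occurrence of each node ID, preserving order."""
    seen, unique = set(), []
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            unique.append(node)
    return unique

def fan_out_retrieve(queries, retrieve):
    """Run every query variation, flatten the hits, drop duplicates."""
    candidates = []
    for q in queries:
        candidates.extend(retrieve(q))
    return dedupe_by_id(candidates)

# Toy retriever: the two query variations return overlapping chunks
corpus = {
    "restart nginx": [{"id": 1, "text": "service nginx restart"},
                      {"id": 2, "text": "nginx -s reload"}],
    "reiniciar nginx": [{"id": 2, "text": "nginx -s reload"},
                        {"id": 3, "text": "systemctl restart nginx"}],
}
hits = fan_out_retrieve(["restart nginx", "reiniciar nginx"], corpus.get)
print([h["id"] for h in hits])  # [1, 2, 3]
```

The deduplicated candidates are then handed to the Cohere reranker, which keeps only the top 5.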
Example:
from src.core.knowledge import kb

response = kb.query("How do I fix nginx port conflict?")
print(response)
Output:
Un conflicto de puerto en nginx generalmente ocurre cuando otro proceso está usando el puerto 80 o 443.

Para solucionarlo:
1. Identifica el proceso: `sudo ss -tulpn | grep :80`
2. Detén nginx: `sudo service nginx stop`
3. Mata el proceso conflictivo: `sudo kill -9 <PID>`
4. Reinicia nginx: `sudo service nginx start`

**Fuentes:**
- nginx_manual.pdf
- troubleshooting_guide.pdf

Query Rewriting

The _rewrite_query() method generates diverse search queries:
def _rewrite_query(self, query_text: str, llm) -> list:
    # Generates 5 variations:
    # 1. English with official terminology
    # 2. Spanish with technical terms
    # 3. Specific section/chapter names
    # 4. Keywords/values the answer would contain
    # 5. Alternative interpretation
    
    queries = [
        query_text,  # Original query
        "nginx port binding errors and solutions",
        "errores de enlace de puerto nginx y soluciones",
        "nginx.conf listen directive configuration",
        "port 80 address already in use nginx",
        "how to resolve nginx startup failures"
    ]
    return queries[:6]  # original query + 5 variations

Response Synthesis Rules

The LLM is prompted with strict constraints:
1. Respond ALWAYS in Spanish
2. Use ONLY information from the provided context
3. NEVER invent or assume information not in the context
4. Reproduce tables completely and faithfully
5. Combine information from multiple fragments coherently
6. If the context is insufficient, respond: "No encontré información específica sobre eso en los documentos cargados."
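One way such constraints reach the model is as a system prompt assembled around the retrieved context. The wording below is illustrative, not the exact production prompt:

```python
# Illustrative prompt assembly; the real synthesis prompt may differ in wording.
RULES = [
    "Respond ALWAYS in Spanish.",
    "Use ONLY information from the provided context.",
    "NEVER invent or assume information not in the context.",
    "Reproduce tables completely and faithfully.",
    "Combine information from multiple fragments coherently.",
    ('If the context is insufficient, respond: '
     '"No encontré información específica sobre eso en los documentos cargados."'),
]

def build_synthesis_prompt(context_chunks, question):
    rules = "\n".join(f"{i}. {r}" for i, r in enumerate(RULES, 1))
    context = "\n---\n".join(context_chunks)
    return (f"Rules:\n{rules}\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}")

prompt = build_synthesis_prompt(["chunk A", "chunk B"], "How to restart nginx?")
print(prompt.splitlines()[1])  # "1. Respond ALWAYS in Spanish."
```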

stream_query

def stream_query(self, query_text: str) -> Generator[dict, None, None]
Streaming version of query() for real-time UI updates.
query_text (str, required): Natural language question
Yields (dict): Event dictionaries with "event" and "data" keys
Event Types:
thinking: Progress updates during retrieval
message: Incremental response tokens
done: Signals completion
Example:
from src.core.knowledge import kb

for event in kb.stream_query("How to configure PostgreSQL replication?"):
    if event["event"] == "thinking":
        print(f"[Status] {event['data']}")
    elif event["event"] == "message":
        print(event["data"], end="", flush=True)
    elif event["event"] == "done":
        print("\n[Complete]")
Output:
[Status] Analizando tu pregunta...
[Status] Optimizando búsqueda...
[Status] Consultando base de conocimiento vectorizada...
[Status] Leyendo 5 fragmentos relevantes...
La replicación en PostgreSQL se configura mediante...

**Fuentes:**
- postgresql_manual.pdf
[Complete]
Use stream_query() for chat interfaces to provide instant feedback while processing.
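A producer conforming to this event protocol is simply a generator of {"event", "data"} dicts. The toy generator below mimics the shape of stream_query without any retrieval:

```python
def toy_stream(answer_tokens):
    """Minimal generator matching the stream_query event protocol."""
    yield {"event": "thinking", "data": "Consultando base de conocimiento..."}
    for token in answer_tokens:
        yield {"event": "message", "data": token}
    yield {"event": "done", "data": ""}

events = list(toy_stream(["Hola", " mundo"]))
print([e["event"] for e in events])
# ['thinking', 'message', 'message', 'done']
```

Any consumer written against this protocol (such as the chat loop above) works unchanged whether the events come from the real pipeline or a stub, which makes the UI easy to test.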

Integration Example

Using the knowledge base in the diagnosis node:
from src.core.knowledge import kb
from src.agent.state import AgentState
from langchain_core.messages import SystemMessage, HumanMessage
from typing import Dict, Any

def diagnose_node(state: AgentState) -> Dict[str, Any]:
    error = state.get("current_error", "")
    
    # Query knowledge base
    rag_context = ""
    if kb:
        rag_context = kb.query(f"How to fix: {error}")
    
    # Use context in LLM prompt (llm: chat model initialized elsewhere in the module)
    diagnosis = llm.invoke([
        SystemMessage(content=f"Documentation:\n{rag_context[:1000]}"),
        HumanMessage(content=f"Error: {error}")
    ])
    
    return {
        "current_step": "diagnose",
        "diagnosis_log": [diagnosis.content]
    }

Advanced Configuration

Custom Embedding Model

from llama_index.embeddings.openai import OpenAIEmbedding

kb = VectorKnowledgeBase()
kb.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
Note: the embedding dimension must match the Pinecone index. text-embedding-3-large produces 3072-dimensional vectors by default, so the index must be created with a matching EMBEDDING_DIM (or the model configured to emit 1536 dimensions).

Adjust Retrieval Parameters

# Retrieve more chunks (note: kb.index is lazy and remains None until the first query builds it)
retriever = kb.index.as_retriever(similarity_top_k=10)

# Use different reranking top_n
from llama_index.postprocessor.cohere_rerank import CohereRerank
kb.reranker = CohereRerank(api_key=config.COHERE_API_KEY, top_n=10)

Multi-Language Support

The system automatically handles bilingual queries:
# English query
response_en = kb.query("How to restart nginx?")

# Spanish query
response_es = kb.query("¿Cómo reiniciar nginx?")

# Both return Spanish responses based on English+Spanish docs

Performance Optimization

Lazy Index Loading

The vector index is loaded on first query:
if not self.index:
    self.index = VectorStoreIndex.from_vector_store(
        self.vector_store,
        embed_model=self.embed_model
    )

Caching Strategies

Implement query caching for repeated questions:
from functools import lru_cache

class CachedKnowledgeBase(VectorKnowledgeBase):
    # Note: lru_cache on a method keeps a reference to the instance and
    # requires hashable arguments; fine here since query_text is a str.
    @lru_cache(maxsize=100)
    def query(self, query_text: str) -> str:
        return super().query(query_text)

Error Handling

from src.core.knowledge import kb

try:
    if kb:
        response = kb.query("How to fix database connection?")
    else:
        print("Knowledge base not initialized")
except Exception as e:
    print(f"RAG query failed: {e}")
    # Fallback to LLM without RAG context

Global Singleton

The knowledge base is initialized as a global singleton:
# In src/core/knowledge.py
kb = None

def init_knowledge_base():
    global kb
    try:
        kb = VectorKnowledgeBase()
        log("system", "Base de conocimiento inicializada correctamente.")
    except Exception as e:
        log("error", f"No se pudo inicializar la base de conocimiento: {e}")
        kb = None
Usage:
from src.core.knowledge import init_knowledge_base, kb

# Initialize at startup
init_knowledge_base()

# Use throughout application
if kb:
    kb.query("How to optimize PostgreSQL?")

Required Environment Variables

PINECONE_API_KEY (str, required): Pinecone API key for the vector database
PINECONE_INDEX_NAME (str, default "sentinel-docs"): Name of the Pinecone index
OPENAI_API_KEY (str, required): OpenAI API key for embeddings and the LLM
COHERE_API_KEY (str, required): Cohere API key for reranking
LLAMA_CLOUD_API_KEY (str, required): LlamaCloud API key for PDF parsing
EMBEDDING_MODEL (str, default "text-embedding-ada-002"): OpenAI embedding model name
EMBEDDING_DIM (int, default 1536): Embedding vector dimension
MODEL_NAME (str, default "gpt-4"): LLM model used for synthesis
TEMPERATURE (float, default 0.0): LLM temperature (0.0 for factual responses)

Related Pages

- Agent Nodes: use the knowledge base in diagnose_node
- Configuration: configure RAG parameters
- Memory System: complement RAG with episodic memory
- RAG System: learn about the RAG architecture