The RAGRetriever class implements semantic search over a vector store by converting queries into embeddings and retrieving the most similar documents. It supports configurable result counts and similarity score filtering.

Class definition

class RAGRetriever:
    def __init__(self, vector_store, embedding_manager)

Constructor parameters

vector_store : VectorStore (required)
    An initialized VectorStore instance containing the document embeddings to search against.

embedding_manager : EmbeddingManager (required)
    An initialized EmbeddingManager instance used to convert query text into embeddings.

Methods

retrieve()

Retrieves the most relevant documents for a given query.
def retrieve(self, query: str, top_k: int = 5, score_threshold: float = 0.0) -> List[Dict]

query : str (required)
    The search query text to find relevant documents for.

top_k : int (default: 5)
    Maximum number of documents to retrieve. The actual number returned may be fewer if results are filtered by score_threshold.

score_threshold : float (default: 0.0)
    Minimum similarity score (0.0 to 1.0) for a document to be included in the results. Documents with scores below this threshold are filtered out.

Returns : List[Dict]
    A list of dictionaries containing the retrieved documents and their metadata. Returns an empty list if no documents are found or an error occurs.

Return value structure

Each retrieved document is a dictionary with the following fields:

id : str
    Unique identifier of the document in the vector store.

content : str
    The full text content of the retrieved document.

metadata : dict
    Document metadata, including:
      • path: file path in the repository
      • source: GitHub URL of the file
      • doc_index: position in the original batch
      • content_length: character count
      • any other fields from the original document

similarity_score : float
    Similarity score between 0.0 and 1.0, where 1.0 is most similar. Calculated as 1 - distance.

distance : float
    Raw cosine distance from the query (lower is more similar).

rank : int
    Position in the results (1-indexed), indicating relevance ranking.
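
For illustration, a single entry might look like this (all values, paths, and URLs are hypothetical):

result = {
    "id": "doc_42",
    "content": "Authentication is handled by the session middleware ...",
    "metadata": {
        "path": "src/auth/session.py",
        "source": "https://github.com/owner/repo/blob/main/src/auth/session.py",
        "doc_index": 7,
        "content_length": 1843,
    },
    "similarity_score": 0.82,  # 1 - distance
    "distance": 0.18,
    "rank": 1,                 # 1-indexed relevance ranking
}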

Usage example

from src.rag.rag_retriever import RAGRetriever

# Initialize retriever
rag_retriever = RAGRetriever(
    vector_store=vector_store,
    embedding_manager=embedding_manager
)

# Retrieve relevant documents
query = "How does authentication work?"
results = rag_retriever.retrieve(query, top_k=5)

# Process results
for doc in results:
    print(f"File: {doc['metadata']['path']}")
    print(f"Similarity: {doc['similarity_score']:.3f}")
    print(f"Content preview: {doc['content'][:100]}...\n")

Filtering by similarity score

# Only retrieve highly relevant documents
results = rag_retriever.retrieve(
    query="database connection",
    top_k=10,
    score_threshold=0.7  # Only documents with >70% similarity
)

print(f"Found {len(results)} highly relevant documents")

Integration example

From main.py, showing retrieval in the interactive chat loop:
# Initialize RAG retriever
rag_retriever = RAGRetriever(
    vector_store=vector_store,
    embedding_manager=embedding_manager
)

# Initialize LLM
llm = GroqLLM(model_name=model_name)

# Interactive query loop
while True:
    query = input("\nAsk anything ('exit' to quit): ")
    if query.strip().lower() == "exit":
        break
    
    # Use retriever to get context and generate answer
    answer = llm.rag(query=query, retriever=rag_retriever)
    print(answer)

Adjusting retrieval parameters

# Get top 5 documents, no filtering
results = rag_retriever.retrieve(query)
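
Both top_k and score_threshold can be tuned per call, for example:

# Retrieve more candidates, but keep only strong matches
results = rag_retriever.retrieve(query, top_k=20, score_threshold=0.5)

# Narrow the result set to the top 3 matches, unfiltered
results = rag_retriever.retrieve(query, top_k=3)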

Working with results

results = rag_retriever.retrieve("error handling", top_k=5)

if not results:
    print("No relevant documents found")
else:
    # Sort by similarity (highest first); retrieve() already returns
    # results in this order, so this re-sort is defensive
    sorted_results = sorted(
        results,
        key=lambda x: x['similarity_score'],
        reverse=True
    )

    # Group documents by source file
    from collections import defaultdict
    by_file = defaultdict(list)
    for doc in sorted_results:
        file_path = doc['metadata']['path']
        by_file[file_path].append(doc)
    
    print(f"Found relevant content in {len(by_file)} files")

Query embeddings

The retriever generates embeddings for queries using the same model used for document embeddings:
# Inside retrieve() method:
query_embedding = self.embedding_manager.generate_embeddings([query])[0]

This ensures queries and documents are embedded in the same vector space for accurate similarity comparisons.
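
For intuition, here is a minimal sketch of the distance-to-score relationship, assuming plain cosine distance over the raw vectors and using NumPy purely for illustration:

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine distance = 1 - cosine similarity
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The similarity_score reported in results is 1 - distance:
# similarity_score = 1.0 - cosine_distance(query_embedding, doc_embedding)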

Error handling

# retrieve() returns an empty list on internal errors, so this
# try/except only guards against unexpected failures
try:
    results = rag_retriever.retrieve(query)
    if not results:
        print("No documents matched your query")
    else:
        print(f"Retrieved {len(results)} documents")
except Exception as e:
    print(f"Retrieval error: {e}")
    results = []  # Fallback to empty results

Understanding similarity scores

Similarity scores are calculated as 1 - cosine_distance, where:
  • 1.0 = Identical or near-identical content
  • 0.7-0.9 = Highly relevant, likely contains answer
  • 0.5-0.7 = Moderately relevant, related concepts
  • 0.3-0.5 = Loosely related
  • < 0.3 = Likely not relevant
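
These bands can be encoded in a small helper; the function below is a hypothetical sketch whose thresholds mirror the list above (the 0.95 cutoff for "near-identical" is an assumption):

def relevance_label(score: float) -> str:
    # Hypothetical helper mapping a similarity score to the bands above
    if score >= 0.95:  # assumed cutoff for near-identical content
        return "identical or near-identical"
    if score >= 0.7:
        return "highly relevant"
    if score >= 0.5:
        return "moderately relevant"
    if score >= 0.3:
        return "loosely related"
    return "likely not relevant"

for doc in results:
    print(doc['rank'], relevance_label(doc['similarity_score']))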

Performance considerations

# For large result sets, consider pagination
def paginated_retrieve(query, page_size=5, page=1):
    all_results = rag_retriever.retrieve(
        query,
        top_k=page_size * page
    )
    start = (page - 1) * page_size
    end = start + page_size
    return all_results[start:end]
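
Note that each page re-runs the full retrieval with a larger top_k, so for deep pagination it may be cheaper to fetch one larger result set once and slice it locally. Example usage:

# Fetch page 2 of results, 5 documents per page
page_2 = paginated_retrieve("error handling", page_size=5, page=2)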

Implementation notes

  • Query embeddings are generated on-the-fly for each retrieval
  • Uses ChromaDB’s query() method with cosine similarity
  • Results are automatically sorted by similarity (most relevant first)
  • Score threshold filtering happens after retrieval, so fewer than top_k documents may be returned
  • Returns empty list on errors to allow graceful degradation
  • Thread-safe for concurrent queries (reads only from vector store)
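
Since retrieval only reads from the vector store, concurrent queries can be issued safely. A minimal sketch using a thread pool (the example queries are arbitrary):

from concurrent.futures import ThreadPoolExecutor

queries = ["authentication", "database connection", "error handling"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results_per_query = list(pool.map(rag_retriever.retrieve, queries))

for q, docs in zip(queries, results_per_query):
    print(f"{q}: {len(docs)} documents")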
