The RAGRetriever class implements semantic search over a vector store by converting queries into embeddings and retrieving the most similar documents. It supports configurable result counts and similarity score filtering.
Class definition
Constructor parameters
- An initialized VectorStore instance containing the document embeddings to search against.
- An initialized EmbeddingManager instance used to convert query text into embeddings.
Methods
retrieve()
Retrieves the most relevant documents for a given query.
- The search query text to find relevant documents for.
- Maximum number of documents to retrieve. The actual number returned may be less if filtered by score_threshold.
- Minimum similarity score (0.0 to 1.0) for documents to be included in results. Documents with scores below this threshold are filtered out.
List of dictionaries containing retrieved documents and their metadata. Returns an empty list if no documents are found or an error occurs.
Return value structure
Each retrieved document is a dictionary with the following fields:
- Unique identifier of the document in the vector store
- The full text content of the retrieved document
- Document metadata including:
  - path: File path in the repository
  - source: GitHub URL of the file
  - doc_index: Position in the original batch
  - content_length: Character count
  - Other fields from the original document
- Similarity score between 0.0 and 1.0, where 1.0 is most similar. Calculated as 1 - distance.
- Raw cosine distance from the query (lower is more similar)
- Position in the results (1-indexed), indicating relevance ranking
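To make the structure concrete, here is a sketch of a single result entry. The key names (id, content, metadata, similarity_score, distance, rank) are assumptions chosen to match the field descriptions above; this section does not give the actual names, and they may differ. The build_result helper, the path, and the URL are placeholders for illustration only.

```python
from typing import Any, Dict


def build_result(doc_id: str, text: str, metadata: Dict[str, Any],
                 distance: float, rank: int) -> Dict[str, Any]:
    """Assemble one result entry (key names are hypothetical)."""
    return {
        "id": doc_id,                      # identifier in the vector store
        "content": text,                   # full document text
        "metadata": metadata,              # path, source, doc_index, ...
        "similarity_score": 1 - distance,  # as described: 1 - distance
        "distance": distance,              # raw cosine distance
        "rank": rank,                      # 1-indexed position
    }


example = build_result(
    doc_id="doc_0",
    text="def main(): ...",
    metadata={
        "path": "src/main.py",
        "source": "https://github.com/example/repo/blob/main/src/main.py",
        "doc_index": 0,
        "content_length": len("def main(): ..."),
    },
    distance=0.25,
    rank=1,
)
print(example["similarity_score"])  # 0.75
```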
Usage example
Filtering by similarity score
Integration example
From main.py, showing retrieval in the interactive chat loop:
Adjusting retrieval parameters
Working with results
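One common way to use the returned list is to join the documents into a single context string for a prompt. The sketch below works on plain dictionaries, so it does not depend on the retriever itself. The key names (content, metadata, similarity_score, rank) and the format_context helper are assumptions for illustration, not names confirmed by this page.

```python
from typing import Any, Dict, List


def format_context(results: List[Dict[str, Any]]) -> str:
    """Join retrieved documents into one context string, best match first."""
    if not results:
        return ""
    blocks = []
    for doc in sorted(results, key=lambda d: d["rank"]):
        path = doc["metadata"].get("path", "unknown")
        header = f"[{doc['rank']}] {path} (score: {doc['similarity_score']:.2f})"
        blocks.append(f"{header}\n{doc['content']}")
    return "\n\n".join(blocks)


# Hand-written sample entries, deliberately out of order.
sample = [
    {"content": "second", "metadata": {"path": "b.py"},
     "similarity_score": 0.61, "rank": 2},
    {"content": "first", "metadata": {"path": "a.py"},
     "similarity_score": 0.83, "rank": 1},
]
print(format_context(sample))
```

Including the path and score in each header keeps the source of every passage visible when the context is inspected or logged.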
Query embeddings
The retriever generates embeddings for queries using the same model used for document embeddings.
Error handling
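Because retrieve() returns an empty list both when nothing matches and when an error occurs, a caller cannot tell the two cases apart from the return value alone and should treat an empty list as "no context available". A minimal sketch of that caller-side pattern follows. The context_or_fallback helper, the fallback message, and the two stand-in retrievers are hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]


def context_or_fallback(retrieve_fn: Callable[[str], List[Result]],
                        query: str,
                        fallback: str = "No relevant documents found.") -> str:
    """Return retrieved text, or a fallback message when the list is empty."""
    results = retrieve_fn(query)
    if not results:
        # Empty means "no match" or "retrieval failed"; the caller cannot tell which.
        return fallback
    return "\n\n".join(doc["content"] for doc in results)


# Stand-in retrievers used only for demonstration.
def empty_retriever(query: str) -> List[Result]:
    return []


def one_hit_retriever(query: str) -> List[Result]:
    return [{"content": "retrieved text"}]


print(context_or_fallback(empty_retriever, "anything"))
print(context_or_fallback(one_hit_retriever, "anything"))
```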
Understanding similarity scores
Similarity scores are calculated as 1 - cosine_distance, where:
- 1.0 = Identical or near-identical content
- 0.7-0.9 = Highly relevant, likely contains answer
- 0.5-0.7 = Moderately relevant, related concepts
- 0.3-0.5 = Loosely related
- < 0.3 = Likely not relevant
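The bands above can be turned into a small helper for logging or debugging. This is a sketch under two assumptions that the list does not settle: each band includes its lower bound, and scores between 0.9 and 1.0 are grouped with the "identical or near-identical" band. The describe_score name is hypothetical.

```python
def describe_score(score: float) -> str:
    """Map a similarity score to the rough relevance bands listed above."""
    if score >= 0.9:
        # Assumption: the unlisted 0.9-1.0 range counts as near-identical.
        return "identical or near-identical"
    if score >= 0.7:
        return "highly relevant"
    if score >= 0.5:
        return "moderately relevant"
    if score >= 0.3:
        return "loosely related"
    return "likely not relevant"


for s in (1.0, 0.75, 0.5, 0.35, 0.1):
    print(s, "->", describe_score(s))
```

These cut-offs are rules of thumb; useful thresholds depend on the embedding model and the corpus, so it is worth checking them against real queries before relying on them.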
Performance considerations
Implementation notes
- Query embeddings are generated on-the-fly for each retrieval
- Uses ChromaDB’s query() method with cosine similarity
- Results are automatically sorted by similarity (most relevant first)
- Score threshold filtering is applied after retrieval, so the result set may be smaller than the requested count
- Returns empty list on errors to allow graceful degradation
- Thread-safe for concurrent queries (reads only from vector store)
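The notes above can be sketched as a single function. This is not the actual RAGRetriever code: the function signature, the parameter names top_k and score_threshold defaults, the result key names, and the choice to assign ranks after filtering are all assumptions. The embedding model and vector store are replaced by stand-in callables so the example runs without ChromaDB; fake_search mimics the nested-list shape that ChromaDB's query() returns (ids, documents, metadatas, distances).

```python
from typing import Any, Callable, Dict, List, Sequence

Embedding = Sequence[float]
RawResults = Dict[str, List[List[Any]]]


def retrieve(query: str,
             embed: Callable[[str], Embedding],
             search: Callable[[Embedding, int], RawResults],
             top_k: int = 5,
             score_threshold: float = 0.0) -> List[Dict[str, Any]]:
    """Embed the query, search, convert distances to scores, then filter."""
    try:
        raw = search(embed(query), top_k)  # embedding is computed per call
        rows = zip(raw["ids"][0], raw["documents"][0],
                   raw["metadatas"][0], raw["distances"][0])
        results: List[Dict[str, Any]] = []
        for doc_id, text, metadata, distance in rows:
            score = 1 - distance
            if score < score_threshold:
                continue  # filtering happens after the search returns
            results.append({
                "id": doc_id, "content": text, "metadata": metadata,
                "similarity_score": score, "distance": distance,
                "rank": len(results) + 1,  # 1-indexed among kept results
            })
        return results
    except Exception as exc:
        print(f"Retrieval failed: {exc}")
        return []  # empty list lets the caller degrade gracefully


# Stand-ins for the embedding model and vector store (demo only).
def fake_embed(text: str) -> Embedding:
    return [float(len(text))]


def fake_search(embedding: Embedding, n_results: int) -> RawResults:
    # Already sorted by ascending distance, as a vector store would return.
    return {
        "ids": [["a", "b", "c"][:n_results]],
        "documents": [["alpha", "beta", "gamma"][:n_results]],
        "metadatas": [[{"path": "a.py"}, {"path": "b.py"},
                       {"path": "c.py"}][:n_results]],
        "distances": [[0.125, 0.5, 0.875][:n_results]],
    }


hits = retrieve("how does retrieval work?", fake_embed, fake_search,
                top_k=3, score_threshold=0.4)
print([(h["id"], h["similarity_score"], h["rank"]) for h in hits])
# [('a', 0.875, 1), ('b', 0.5, 2)]
```

Passing the embedding and search steps in as callables keeps the sketch testable in isolation; the real class instead holds the VectorStore and EmbeddingManager instances supplied to its constructor.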