The EmbeddingManager class handles the generation of vector embeddings for text using pre-trained Sentence Transformer models. It provides a simple interface for converting text into dense vector representations suitable for semantic search.

Class definition

class EmbeddingManager:
    def __init__(self, model_name="all-MiniLM-L6-v2")

Constructor parameters

model_name
str
default:"all-MiniLM-L6-v2"
Name of the Sentence Transformer model to use for generating embeddings. The default all-MiniLM-L6-v2 is a lightweight, efficient model that produces 384-dimensional embeddings.
The model is automatically downloaded from Hugging Face on first use and cached locally for subsequent runs. The all-MiniLM-L6-v2 model offers a good balance of speed and quality for code embeddings.

Methods

generate_embeddings()

Generates vector embeddings for a list of text strings.
def generate_embeddings(self, texts: List[str]) -> numpy.ndarray
texts
List[str]
required
List of text strings to generate embeddings for. Each string is encoded independently.
returns
numpy.ndarray
NumPy array of embeddings with shape (len(texts), embedding_dimension). For the default model, the embedding dimension is 384.
A progress bar is displayed during embedding generation, which is useful for tracking large batches.
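The shape contract can be illustrated with a toy stand-in for the model. This sketch uses hash-seeded noise instead of a real Sentence Transformer, so it runs without downloading anything; `toy_generate_embeddings` is an illustrative name, not part of the actual API:

```python
import hashlib

import numpy as np

def toy_generate_embeddings(texts, dim=384):
    """Toy stand-in for EmbeddingManager.generate_embeddings.

    Returns one dim-dimensional float32 vector per input string so the
    (len(texts), embedding_dimension) shape contract is visible. The
    vectors are hash-seeded noise, NOT real sentence embeddings.
    """
    rows = []
    for text in texts:
        # Seed from a stable hash so each string maps to the same vector
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        rng = np.random.default_rng(seed)
        rows.append(rng.standard_normal(dim).astype(np.float32))
    return np.stack(rows)

vectors = toy_generate_embeddings(["def f(): pass", "import os"])
print(vectors.shape)  # (2, 384)
```

The real method returns the same shape, with semantically meaningful values.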

_load_model()

Internal method that loads the Sentence Transformer model.
def _load_model(self) -> None
This method is automatically called during initialization. It:
  • Downloads the model if not cached
  • Loads the model into memory
  • Prints the embedding dimension for verification
  • Raises an exception if the model fails to load
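The loading steps above are consistent with a minimal sketch like the following. The real source may differ; the lazy import, attribute names, and error wrapping here are assumptions, though `SentenceTransformer`, `get_sentence_embedding_dimension()`, and `encode(..., show_progress_bar=True)` are real sentence-transformers APIs:

```python
class EmbeddingManager:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model_name = model_name
        self.model = None
        self._load_model()  # loading happens during initialization

    def _load_model(self):
        # Imported lazily so the class can be defined without the
        # sentence-transformers package installed.
        try:
            from sentence_transformers import SentenceTransformer
            # Downloads from Hugging Face on first use, then loads from cache.
            self.model = SentenceTransformer(self.model_name)
            dim = self.model.get_sentence_embedding_dimension()
            print(f"Loaded {self.model_name} (embedding dimension: {dim})")
        except Exception as exc:
            # Surface the failure instead of continuing with a broken manager.
            raise RuntimeError(f"Failed to load model {self.model_name!r}") from exc

    def generate_embeddings(self, texts):
        # show_progress_bar=True produces the progress display noted above.
        return self.model.encode(texts, show_progress_bar=True)
```

Instantiating the class triggers the download-and-cache behavior described in the constructor section.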

Usage example

from src.rag.embedding_manager import EmbeddingManager

# Initialize with default model
embedding_manager = EmbeddingManager()

# Generate embeddings for text chunks
texts = [
    "def hello_world(): print('Hello, World!')",
    "class MyClass: pass",
    "import numpy as np"
]

embeddings = embedding_manager.generate_embeddings(texts)

print(f"Generated embeddings shape: {embeddings.shape}")
# Output: Generated embeddings shape: (3, 384)
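Since the class is intended for semantic search, here is how the resulting vectors are typically compared. Synthetic vectors stand in for real embeddings so this runs without the model, and `top_k_similar` is an illustrative helper, not part of the class:

```python
import numpy as np

def top_k_similar(query_vec, corpus_vecs, k=2):
    """Rank corpus rows by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per corpus row
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

# Synthetic 384-dimensional stand-ins for real embeddings
rng = np.random.default_rng(0)
corpus = rng.standard_normal((3, 384)).astype(np.float32)
# A query that is a lightly perturbed copy of corpus row 1
query = corpus[1] + 0.01 * rng.standard_normal(384).astype(np.float32)

indices, scores = top_k_similar(query, corpus)
print(indices[0])  # 1 -- the corpus row the query was derived from
```

With real embeddings from generate_embeddings(), the same ranking surfaces the chunks most semantically similar to a query.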

Alternative models

Any model name from the sentence-transformers collection on Hugging Face can be passed to the constructor. Some common choices:

# Default: fast and efficient, 384 dimensions
embedding_manager = EmbeddingManager()

# Higher quality at lower speed, 768 dimensions
embedding_manager = EmbeddingManager(model_name="all-mpnet-base-v2")

# Multilingual support, 384 dimensions
embedding_manager = EmbeddingManager(model_name="paraphrase-multilingual-MiniLM-L12-v2")

Integration example

From main.py showing embedding generation in the RAG pipeline:
# Initialize embedding manager
embedding_manager = EmbeddingManager()

# Split documents into chunks
chunks = TextSplitter(docs).split_documents_into_chunks()

# Extract text content from chunks
texts = [doc.page_content for doc in chunks]

# Generate embeddings for all chunks
embeddings = embedding_manager.generate_embeddings(texts)

# Store embeddings in vector database
vector_store.add_documents(chunks, embeddings)

Batch processing

# Process large document sets in batches
all_embeddings = []
batch_size = 100

for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]
    batch_embeddings = embedding_manager.generate_embeddings(batch)
    all_embeddings.append(batch_embeddings)

# Combine all batches
import numpy as np
final_embeddings = np.vstack(all_embeddings)

Model information

Model
all-MiniLM-L6-v2
  • Dimensions: 384
  • Max sequence length: 256 word pieces (~200 words)
  • Performance: ~14,200 sentences/second on a V100 GPU (per the sentence-transformers benchmarks); CPU throughput is substantially lower
  • Size: ~80 MB
  • Training: Trained on 1 billion sentence pairs
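The 384-dimension figure translates directly into storage cost: with float32 output (4 bytes per value), each embedding occupies 384 × 4 = 1536 bytes, so 100,000 embedded chunks need roughly 147 MB. This can be checked with NumPy alone:

```python
import numpy as np

# One float32 embedding of the default model's dimensionality
embedding = np.zeros(384, dtype=np.float32)
print(embedding.nbytes)             # 1536 bytes per vector

# Estimated footprint for 100,000 embedded chunks
corpus_bytes = 100_000 * embedding.nbytes
print(f"{corpus_bytes / 2**20:.1f} MiB")  # 146.5 MiB
```

This estimate covers raw vectors only; a vector database adds index overhead on top.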

Error handling

try:
    embedding_manager = EmbeddingManager(model_name="invalid-model")
except Exception as e:
    print(f"Failed to load model: {e}")
    # Fallback to default model
    embedding_manager = EmbeddingManager()

Implementation notes

  • Uses the sentence-transformers library for embedding generation
  • Models are automatically cached in ~/.cache/torch/sentence_transformers/
  • Embeddings are generated on CPU by default (GPU acceleration available if PyTorch with CUDA is installed)
  • Progress bars are displayed for long-running embedding generation
  • A single loaded model instance can be reused across calls; verify thread safety for your sentence-transformers version before encoding from multiple threads concurrently
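The GPU note above can be made explicit by passing a device to SentenceTransformer, which accepts a device argument. This helper is a hedged sketch (the function name is illustrative, and it assumes PyTorch and sentence-transformers are installed when it is called):

```python
def make_encoder(model_name="all-MiniLM-L6-v2", device=None):
    """Build a SentenceTransformer on an explicit device.

    With device=None, sentence-transformers picks CUDA automatically when
    PyTorch sees a GPU; pass "cpu" or "cuda" to override that choice.
    """
    # Lazy imports so this module loads without torch installed.
    from sentence_transformers import SentenceTransformer
    import torch

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return SentenceTransformer(model_name, device=device)
```

Pinning the device is mainly useful for forcing CPU in memory-constrained environments or selecting a specific GPU.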
