Overview
BaseEmbeddingModel is the abstract base class that defines the interface for all embedding models in Remem. It provides a consistent API for encoding text into vector embeddings and computing query-document similarity scores.
Class Definition
src/remem/embedding_model/base.py:178
Attributes
- Global configuration object containing system-wide settings
- Name of the embedding model (e.g., “nvidia/NV-Embed-v2”, “text-embedding-3-large”)
- Model-specific configuration parameters
- Dimensionality of the embedding vectors (set by the subclass)
Methods
__init__
Parameters:
- Global configuration object. If None, uses a default BaseConfig instance.
batch_encode
Parameters:
- List of text strings to encode
- Additional model-specific parameters:
  - instruction: Optional instruction prefix for the embeddings
  - batch_size: Number of texts to process at once
  - max_length: Maximum sequence length

Returns:
- 2D numpy array of shape (n_texts, embedding_dim)

Raises:
- NotImplementedError: This method must be implemented by subclasses
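The batch_encode contract above can be sketched with a toy subclass. The class and parameter names here follow the descriptions in this page, but the concrete implementation is an illustration only (a real model would tokenize and run a forward pass, not hash characters):

```python
import numpy as np

class BaseEmbeddingModel:
    """Minimal stand-in for the abstract base class (for illustration)."""
    def batch_encode(self, texts, **kwargs):
        raise NotImplementedError("Subclasses must implement batch_encode")

class ToyEmbeddingModel(BaseEmbeddingModel):
    """Hypothetical subclass: buckets character codes into a fixed-size vector."""
    def __init__(self, embedding_dim=8):
        self.embedding_dim = embedding_dim

    def batch_encode(self, texts, instruction=None, batch_size=32,
                     max_length=512, **kwargs):
        out = np.zeros((len(texts), self.embedding_dim), dtype=np.float32)
        for i, text in enumerate(texts):
            # Apply the optional instruction prefix and truncate to max_length
            prefixed = (instruction or "") + text[:max_length]
            for ch in prefixed:
                out[i, ord(ch) % self.embedding_dim] += 1.0
        # L2-normalize rows so dot products behave like cosine similarities
        norms = np.linalg.norm(out, axis=1, keepdims=True)
        return out / np.maximum(norms, 1e-12)

emb = ToyEmbeddingModel().batch_encode(["hello", "world"])
print(emb.shape)  # (2, 8)
```

The returned array always has shape (n_texts, embedding_dim), matching the contract above.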
get_query_doc_scores
Parameters:
- Query embedding vector of shape (embedding_dim,)
- Document embedding matrix of shape (n_docs, embedding_dim)

Returns:
- Array of similarity scores of shape (n_docs,)
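A minimal sketch of such a scoring function, assuming (as is common for embedding models, though not confirmed by this page) that similarity is cosine similarity, i.e. the dot product of L2-normalized vectors:

```python
import numpy as np

def get_query_doc_scores(query_emb, doc_embs):
    # Normalize both sides so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return d @ q  # shape (n_docs,)

query = np.array([1.0, 0.0])
docs = np.array([[1.0, 0.0],   # identical direction
                 [0.0, 1.0],   # orthogonal
                 [1.0, 1.0]])  # 45 degrees apart
print(get_query_doc_scores(query, docs))  # scores: 1.0, 0.0, ~0.7071
```

Each input document row yields one score, so the output shape is (n_docs,) as documented.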
EmbeddingConfig
EmbeddingConfig is a flexible configuration class that stores model-specific parameters.
Defined in: src/remem/embedding_model/base.py:14
Methods
from_dict
Parameters:
- Dictionary containing configuration parameters

Returns:
- New EmbeddingConfig instance
to_dict
Returns:
- Dictionary representation of the configuration
batch_upsert
Parameters:
- Dictionary of parameters to update or add
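The three methods above can be sketched as follows. This is an assumed implementation of a flexible attribute-backed config; Remem's actual class at src/remem/embedding_model/base.py:14 may store parameters differently:

```python
class EmbeddingConfig:
    """Sketch: stores arbitrary model-specific parameters as attributes."""
    def __init__(self, **params):
        self.__dict__.update(params)

    @classmethod
    def from_dict(cls, d):
        # Build a new instance from a plain dictionary
        return cls(**d)

    def to_dict(self):
        # Return a plain-dict snapshot of the current parameters
        return dict(self.__dict__)

    def batch_upsert(self, params):
        # Update existing keys and add new ones in a single call
        self.__dict__.update(params)

cfg = EmbeddingConfig.from_dict({"batch_size": 32})
cfg.batch_upsert({"batch_size": 64, "max_length": 512})
print(cfg.to_dict())  # {'batch_size': 64, 'max_length': 512}
```

batch_upsert follows upsert semantics: batch_size is overwritten while max_length is inserted.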
Caching Utilities
make_cache_embed
src/remem/embedding_model/base.py:103
Parameters:
- The encoding function to wrap
- Path to SQLite cache database file
- Device to place cached embeddings on (e.g., “cuda”, “cpu”)

Returns:
- Wrapped function that uses caching
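A sketch of such a caching wrapper, under the assumption that results are keyed by a hash of the input text and stored as raw float32 bytes in SQLite (the actual schema and key scheme in Remem are not documented here; a torch-based version would also move the result to the requested device):

```python
import hashlib
import sqlite3
import numpy as np

def make_cache_embed(encode_fn, cache_path, device="cpu"):
    """Wrap encode_fn so repeated inputs are served from an SQLite cache."""
    con = sqlite3.connect(cache_path)
    con.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, vec BLOB)")

    def cached_encode(text):
        key = hashlib.sha256(text.encode()).hexdigest()
        row = con.execute("SELECT vec FROM cache WHERE key = ?", (key,)).fetchone()
        if row is not None:
            # Cache hit: deserialize the stored float32 bytes
            vec = np.frombuffer(row[0], dtype=np.float32)
        else:
            # Cache miss: compute, then persist for next time
            vec = np.asarray(encode_fn(text), dtype=np.float32)
            con.execute("INSERT INTO cache VALUES (?, ?)", (key, vec.tobytes()))
            con.commit()
        # A torch implementation would call .to(device) here before returning.
        return vec

    return cached_encode

calls = []
def slow_encode(text):
    calls.append(text)
    return [float(len(text)), 0.0]

encode = make_cache_embed(slow_encode, ":memory:")
encode("hello")
encode("hello")
print(len(calls))  # 1 -- the second call is served from the cache
```

Using ":memory:" gives a throwaway in-process database; passing a file path makes the cache persist across runs.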
See Also
- NVEmbedV2EmbeddingModel - NVIDIA NV-Embed-v2 implementation
- OpenAIEmbeddingModel - OpenAI and compatible API clients
- GritLMEmbeddingModel - GritLM embedding implementation