Overview
GritLMEmbeddingModel provides access to GritLM (Generalized Representation Instruction Tuned Language Model), a unified model that can both generate embeddings and perform text generation tasks.
Class Definition
src/remem/embedding_model/GritLM.py:17
Initialization
__init__
Parameters:

- global_config: Global configuration object containing:
  - embedding_return_as_cpu: Return embeddings on CPU
  - embedding_return_as_numpy: Convert embeddings to numpy arrays
  - embedding_return_as_normalized: Normalize embeddings
  - embedding_batch_size: Batch size for encoding
- Model name/path containing "GritLM" (e.g., "GritLM/GritLM-7B"); if provided, overrides the name from global_config.
Attributes
- The loaded GritLM model instance
- Embedding dimension (depends on model variant)
- Device where the model is loaded (e.g., cuda:0, cpu)
- Configuration containing:
  - embedding_model_name: Model identifier
  - return_cpu: Whether to return CPU tensors
  - return_numpy: Whether to convert to numpy
  - norm: Whether to normalize embeddings
  - model_init_params: Model initialization parameters
  - encode_params: Default encoding parameters
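As a rough illustration of how the attributes above relate to the constructor inputs, the sketch below maps a global configuration object onto the per-instance config keys listed here. All names other than the documented config keys (the helper function, the SimpleNamespace stand-in, and the default parameter values) are illustrative assumptions, not the actual remem implementation.

```python
# Hypothetical sketch: map global settings onto the config keys documented
# under Attributes. Not the actual remem code.
from types import SimpleNamespace


def build_embedding_config(global_config, embedding_model_name=None):
    """Derive the per-instance embedding config from the global config."""
    name = embedding_model_name or global_config.embedding_model_name
    assert "GritLM" in name, "model name must contain 'GritLM'"
    return {
        "embedding_model_name": name,
        "return_cpu": global_config.embedding_return_as_cpu,
        "return_numpy": global_config.embedding_return_as_numpy,
        "norm": global_config.embedding_return_as_normalized,
        # Illustrative defaults, echoing the performance tips in this doc:
        "model_init_params": {"torch_dtype": "auto", "device_map": "auto"},
        "encode_params": {"batch_size": global_config.embedding_batch_size},
    }


cfg = build_embedding_config(
    SimpleNamespace(
        embedding_model_name="GritLM/GritLM-7B",
        embedding_return_as_cpu=True,
        embedding_return_as_numpy=True,
        embedding_return_as_normalized=True,
        embedding_batch_size=16,
    )
)
print(cfg["norm"], cfg["encode_params"]["batch_size"])
```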
Methods
batch_encode
Parameters:

- Text strings to encode; can be a single string or a list of strings.
- Optional task instruction, formatted as "<|user|>\n{instruction}\n<|embed|>\n" if provided, or "<|embed|>\n" if empty.
- Number of texts to process in each batch.

Returns a 2D numpy array of shape (n_texts, embedding_dim), normalized if embedding_return_as_normalized is True.

batch_generate
List of chat messages for generation
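The chunk-and-stack behavior described for batch_encode can be sketched as below. This is an illustrative stand-in, not the actual implementation: the stub encoder replaces the GritLM model, and the function name is hypothetical.

```python
# Sketch of the batching loop behind batch_encode: split the input into
# chunks of batch_size, encode each chunk, then stack and optionally
# L2-normalize. The stub encoder stands in for the GritLM model.
import numpy as np


def batch_encode_sketch(texts, encode_fn, batch_size=2, norm=True):
    if isinstance(texts, str):          # accept a single string
        texts = [texts]
    chunks = [
        encode_fn(texts[i : i + batch_size])
        for i in range(0, len(texts), batch_size)
    ]
    embs = np.vstack(chunks)            # -> shape (n_texts, embedding_dim)
    if norm:                            # row-wise L2 normalization
        embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return embs


def stub_encoder(batch):                # fake model: one 4-dim vector per text
    return np.ones((len(batch), 4), dtype=np.float32)


out = batch_encode_sketch(["a", "b", "c"], stub_encoder, batch_size=2)
print(out.shape)  # (3, 4)
```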
GritLM Instruction Format
GritLM uses special tokens to distinguish between embedding and generation modes:

Embedding Mode
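The embedding-mode template documented for batch_encode can be written as a small helper (the upstream GritLM repository ships a gritlm_instruction helper of the same shape); the example instruction string is illustrative:

```python
# Build the GritLM embedding-mode prompt from an optional task instruction.
def gritlm_instruction(instruction: str = "") -> str:
    if instruction:
        return "<|user|>\n" + instruction + "\n<|embed|>\n"
    return "<|embed|>\n"   # bare embed token when no instruction is given


print(gritlm_instruction("Retrieve relevant passages for the query"))
print(gritlm_instruction())
```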
Example Instructions
Query Encoding:

Configuration Details
The model initializes with the following default configuration:

Model Variants
GritLM comes in different sizes:

- GritLM-7B: Base 7B model, good balance of quality and speed
- GritLM-8x7B: Mixture-of-experts model for higher quality
Unified Embedding and Generation
GritLM’s key feature is its ability to handle both tasks with a single model:

Performance Considerations
GritLM models are larger than specialized embedding models. Ensure adequate GPU memory.

- GritLM-7B: ~14GB VRAM (fp16)
- GritLM-8x7B: ~90GB VRAM (fp16)
- Device Map: Auto-placement across GPUs with device_map="auto"
- Batch Size: Adjust based on available memory
- Dtype: Use torch_dtype="auto" for automatic precision selection
- Instructions: Use consistent instructions for query/doc pairs
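The VRAM figures above can be sanity-checked with back-of-envelope arithmetic: fp16 stores 2 bytes per parameter, counting weights only (activations and KV cache add more on top). The ~46.7B parameter count for the 8x7B mixture-of-experts variant is an assumption, not stated in this doc.

```python
# Weights-only fp16 memory estimate: 2 bytes per parameter.
def fp16_weight_gb(n_params: float) -> float:
    return n_params * 2 / 1e9


print(round(fp16_weight_gb(7e9)))      # GritLM-7B: ~14 GB
print(round(fp16_weight_gb(46.7e9)))   # GritLM-8x7B (assumed ~46.7B params): ~93 GB
```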
Use Cases
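A toy sketch of the retrieval step behind the RAG use case: embed documents and a query, then rank documents by similarity. Orthonormal stand-in vectors replace real GritLM embeddings here; with normalized embeddings, cosine similarity reduces to a dot product.

```python
import numpy as np

docs = ["doc about GritLM", "doc about cooking", "doc about embeddings"]
# Stand-in embeddings: orthonormal rows in place of real GritLM vectors.
doc_embs = np.eye(3, 8)

# Query vector close to the third document, then L2-normalized.
query_emb = doc_embs[2] + 0.1 * doc_embs[0]
query_emb /= np.linalg.norm(query_emb)

scores = doc_embs @ query_emb        # cosine similarity == dot product here
best = int(np.argmax(scores))
print(docs[best])                    # retrieves the closest document
```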
Retrieval-Augmented Generation (RAG):

See Also
- BaseEmbeddingModel - Base interface
- NVIDIA Embeddings - Specialized embedding model
- GritLM Paper - Original research
- GritLM on HuggingFace - Model hub