Overview
NVEmbedV2EmbeddingModel provides access to NVIDIA’s NV-Embed-v2 model, a high-performance embedding model with 4096-dimensional vectors. The model supports multi-GPU deployment and handles long contexts up to 32,768 tokens.
Class Definition
src/remem/embedding_model/NVEmbedV2.py:56
Initialization
`__init__`

Parameters:

- `global_config`: Global configuration object containing:
  - `embedding_return_as_normalized`: Whether to normalize embeddings
  - `embedding_max_seq_len`: Maximum sequence length (default: 32768)
  - `embedding_batch_size`: Batch size for encoding
- Model name/path (optional): If provided, overrides the name from `global_config`. Typically `"nvidia/NV-Embed-v2"`.
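The override rule for the model name can be illustrated with a stand-in config object (a sketch: `SimpleNamespace`, the helper function, and the batch-size value are illustrative; only the field names come from this page):

```python
from types import SimpleNamespace

# Stand-in for the global configuration object (field names from the
# parameter list above; the batch-size value here is an assumption).
global_config = SimpleNamespace(
    embedding_model_name="nvidia/NV-Embed-v2",
    embedding_return_as_normalized=True,
    embedding_max_seq_len=32768,
    embedding_batch_size=8,
)

def resolve_model_name(global_config, model_name=None):
    # An explicitly passed name overrides the one in global_config.
    return model_name or global_config.embedding_model_name

print(resolve_model_name(global_config))                      # nvidia/NV-Embed-v2
print(resolve_model_name(global_config, "local/checkpoint"))  # local/checkpoint
```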
Attributes
- The loaded HuggingFace transformer model
- Embedding dimension (4096 for NV-Embed-v2)
- Configuration containing:
  - `embedding_model_name`: Model identifier
  - `norm`: Whether to normalize embeddings
  - `model_init_params`: Parameters for model loading
  - `encode_params`: Default encoding parameters
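Assembled as a plain mapping, that configuration might look like the following (a hedged sketch: the key names come from the attribute list above, but the concrete nested values are assumptions):

```python
# Illustrative configuration dict; key names are from the attribute list
# above, the values inside the nested dicts are assumptions.
config = {
    "embedding_model_name": "nvidia/NV-Embed-v2",
    "norm": True,                                       # normalize embeddings
    "model_init_params": {"trust_remote_code": True},   # assumed loading params
    "encode_params": {"batch_size": 8, "max_length": 32768},  # assumed defaults
}
print(sorted(config))
```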
Methods
`batch_encode`

Parameters:

- Text strings to encode; a single string or a list of strings.
- Optional task instruction, formatted as `"Instruct: {instruction}\nQuery: "`.
- Maximum sequence length for tokenization.
- Number of texts to process in each batch.
- Number of worker threads for processing.
Returns:

2D numpy array of shape `(n_texts, 4096)`, normalized if `embedding_return_as_normalized` is True.

GPU Memory Management
The model automatically distributes across available GPUs using the `get_max_memory` utility:
src/remem/embedding_model/NVEmbedV2.py:21
Behavior:
- Detects GPUs from the `CUDA_VISIBLE_DEVICES` environment variable
- Skips GPUs with usage above a threshold (default 10%)
- Allocates free memory on available GPUs
- Automatically balances model across multiple GPUs
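The GPU-selection step described above can be sketched as follows (illustrative only, not the actual `get_max_memory` implementation; the function name and the usage-stats format are assumptions):

```python
import os

def pick_available_gpus(usage_by_gpu, threshold=0.10):
    """Sketch of the documented selection logic: read device ids from
    CUDA_VISIBLE_DEVICES and keep GPUs whose utilization fraction is
    at or below the threshold (default 10%)."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    candidates = [int(i) for i in visible.split(",") if i.strip()]
    return [i for i in candidates if usage_by_gpu.get(i, 1.0) <= threshold]

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
print(pick_available_gpus({0: 0.02, 1: 0.55, 2: 0.00}))  # [0, 2]
```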
Configuration Details
The model initializes with defaults drawn from the global configuration (see Attributes above).

Performance Considerations
NV-Embed-v2 is a large model that requires significant GPU memory. For multi-GPU setups, ensure GPUs are relatively idle before loading the model.
- Batch Size: Increase for higher throughput on large GPUs
- Multi-GPU: The model automatically uses available GPUs
- Long Contexts: Supports up to 32,768 tokens
- Normalization: Enabled by default for cosine similarity
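Because normalization is enabled by default, cosine similarity between embeddings reduces to a plain dot product. The sketch below demonstrates this with random stand-in vectors (numpy only; no model required):

```python
import numpy as np

# With embedding_return_as_normalized=True, each row has unit L2 norm,
# so pairwise dot products are cosine similarities.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4096))            # stand-in for model output
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

sims = emb @ emb.T
print(np.allclose(np.diag(sims), 1.0))      # True: self-similarity is 1
```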
See Also
- BaseEmbeddingModel - Base interface
- OpenAI Embeddings - Alternative API-based option
- Model on HuggingFace - Official model card