Overview
Remem provides two OpenAI embedding clients:
OpenAIEmbeddingModel: Basic client for OpenAI and compatible APIs
CacheOpenAIEmbeddingModel: Client with SQLite caching for cost reduction
These clients support OpenAI’s embedding models, Azure OpenAI, and any OpenAI-compatible local servers.
OpenAIEmbeddingModel
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel
Defined in: src/remem/embedding_model/openai_embedding_client.py:90
Initialization
def __init__(
    self,
    global_config: Optional[BaseConfig] = None,
    embedding_model_name: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = "https://api.openai.com/v1/embeddings",
    max_retries: int = 3,
    **kwargs
) -> None
Parameters:
- global_config (Optional[BaseConfig]): Global configuration object
- embedding_model_name (Optional[str]): Model name (e.g., "text-embedding-3-large", "text-embedding-3-small")
- api_key (Optional[str]): API key. Falls back to the OPENAI_API_KEY environment variable. For local servers, defaults to "not-needed-for-local-server"
- base_url (str, default "https://api.openai.com/v1/embeddings"): API endpoint URL. Change for local/custom servers
- max_retries (int, default 3): Number of retry attempts for failed requests
- use_azure (bool): Use Azure OpenAI instead of standard OpenAI. Requires environment variables:
  - AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN
  - OPENAI_API_VERSION
  - AZURE_OPENAI_ENDPOINT
Examples
OpenAI Official API
import os
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel
os.environ["OPENAI_API_KEY"] = "sk-..."
model = OpenAIEmbeddingModel(
embedding_model_name="text-embedding-3-large",
base_url="https://api.openai.com/v1/"
)
embs = model.batch_encode(["Hello, world!"])
print(embs.shape) # (1, 3072)
Azure OpenAI
import os
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel
os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["OPENAI_API_VERSION"] = "2024-02-01"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource.openai.azure.com/"
model = OpenAIEmbeddingModel(
embedding_model_name="text-embedding-3-large",
use_azure=True
)
embs = model.batch_encode(["Hello, Azure!"])
Local OpenAI-Compatible Server
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel
# For local servers like vLLM, FastAPI, etc.
model = OpenAIEmbeddingModel(
embedding_model_name="custom-model",
base_url="http://localhost:8001/v1/",
api_key="not-needed" # Optional for local servers
)
embs = model.batch_encode(["Local embedding"])
Methods
batch_encode
def batch_encode(self, texts: List[str], **kwargs) -> np.ndarray
Encodes text into embeddings with automatic batching and retry logic.
Parameters:
- texts (List[str]): Texts to encode
- instruction (str, optional): Instruction prefix. Each text will be formatted as "{instruction}<|endofprefix|>{text}"
- batch_size (int): Number of texts per API request
Returns:
- np.ndarray: 2D array of shape (n_texts, embedding_dim). Dimensions: 3072 for text-embedding-3-large, 1536 for text-embedding-3-small
Error Handling:
- Content filtering (422 errors): Automatically creates zero-vector fallbacks
- Network errors: Retries with exponential backoff (up to
max_retries)
- Rate limits: Handled by retry logic with jitter
Example:
texts = ["First text", "Second text", "Third text"]
# Basic usage
embs = model.batch_encode(texts)
# With instruction
query_embs = model.batch_encode(
["search query"],
instruction="Represent this query for search"
)
# Large batch with custom batch size
large_batch = [f"Document {i}" for i in range(1000)]
embs = model.batch_encode(large_batch, batch_size=50)
encode
def encode(self, texts: List[str], **kwargs) -> np.ndarray
Low-level encoding method without batching. Used internally by batch_encode.
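The relationship between the two methods can be sketched as follows. This is illustrative only: the real batch_encode also adds retry logic and fallback embeddings, and returns a NumPy array rather than nested lists.

```python
def batch_encode_sketch(encode_fn, texts, batch_size=100):
    """Illustrative sketch: chunk the input, call the low-level encoder
    per chunk, and concatenate the results in order."""
    out = []
    for i in range(0, len(texts), batch_size):
        out.extend(encode_fn(texts[i:i + batch_size]))
    return out
```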
CacheOpenAIEmbeddingModel
from remem.embedding_model.openai_embedding_client import CacheOpenAIEmbeddingModel
Defined in: src/remem/embedding_model/openai_embedding_client.py:374
Extends OpenAIEmbeddingModel with SQLite-based caching to reduce API calls and costs.
Initialization
def __init__(
    self,
    cache_filename: Optional[str] = None,
    global_config: Optional[BaseConfig] = None,
    embedding_model_name: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    max_retries: int = 5,
    **kwargs
) -> None
Parameters:
Same as OpenAIEmbeddingModel, plus:
- cache_filename (Optional[str]): Name of the SQLite cache file. Defaults to "{model_name}_embedding_cache.sqlite". Stored in: outputs/{dataset}/embedding_cache/
Cache Behavior
Cache Key: Based on hash of:
- Text content
- Model name
- Instruction
- Max length parameter
Cache Hit: Returns embedding from SQLite database
Cache Miss: Calls API, stores result in cache
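The exact hashing scheme lives in the source; as an illustrative sketch, a key built from the four documented inputs could look like this (the function name and field separator are hypothetical):

```python
import hashlib

def cache_key(text, model_name, instruction=None, max_length=None):
    """Hypothetical sketch: derive a stable cache key from the four
    documented inputs (text, model name, instruction, max length)."""
    payload = "\x1f".join([text, model_name, str(instruction), str(max_length)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the model name and instruction are part of the key, the same text embedded with a different model or instruction produces a separate cache entry.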
Example
import numpy as np
from remem.utils.config_utils import BaseConfig
from remem.embedding_model.openai_embedding_client import CacheOpenAIEmbeddingModel
config = BaseConfig()
config.dataset = "my_dataset"
model = CacheOpenAIEmbeddingModel(
global_config=config,
embedding_model_name="text-embedding-3-large",
base_url="https://api.openai.com/v1/"
)
# First call: cache miss, embeddings are fetched from the API and stored
texts = ["Machine learning", "Deep learning"]
embs1 = model.batch_encode(texts)
# Second call: cache hit, embeddings are served from SQLite with no API request
embs2 = model.batch_encode(texts)
assert np.allclose(embs1, embs2)  # identical embeddings
Cache Location
outputs/
{dataset}/
embedding_cache/
text-embedding-3-large_embedding_cache.sqlite
text-embedding-3-large_embedding_cache.sqlite.lock
Supported Models
OpenAI Models
- text-embedding-3-large: Most capable embedding model (March 2024)
- text-embedding-3-small: Faster and cheaper than the large version
- text-embedding-ada-002: Legacy model (still supported)
Custom Models
Any OpenAI-compatible server can be used by specifying:
- A custom base_url
- A model-specific embedding_model_name
Retry Logic
The client uses exponential backoff with jitter for retries:
# Retry parameters (in _make_http_request_with_retry)
base_delay = 1    # initial delay: 1 second
factor = 2        # doubles each retry: 1s, 2s, 4s, 8s, 16s
max_delay = 60    # delay capped at 60 seconds
# plus random jitter on each delay to prevent a thundering herd
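The schedule above can be sketched as a runnable helper. The function name is hypothetical; the base_delay, factor, and max_delay values come from the documented parameters, and full jitter (a uniform draw over the backoff window) is one common way to realize the "random jitter" mentioned above:

```python
import random

def backoff_delay(attempt, base_delay=1.0, factor=2.0, max_delay=60.0):
    """Sketch of exponential backoff with full jitter: the backoff window
    doubles each attempt, is capped at max_delay, and the actual sleep is
    drawn uniformly from [0, window] to spread out concurrent retries."""
    window = min(base_delay * (factor ** attempt), max_delay)
    return random.uniform(0, window)
```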
Retryable Errors:
- Network timeouts
- Connection errors
- HTTP 5xx errors
- Rate limit errors
Non-Retryable Errors:
- HTTP 422 (content validation) - Creates fallback embedding instead
- Authentication errors
Factory Function
The module provides a factory function for automatic client creation:
from remem.embedding_model import _get_embedding_client
# Auto-selects OpenAI client for text-embedding models
client = _get_embedding_client(
global_config=config,
embedding_model_name="text-embedding-3-large",
openai_style_server=True
)
Defined in: src/remem/embedding_model/__init__.py:4
See Also