
Overview

Remem provides two OpenAI embedding clients:
  • OpenAIEmbeddingModel: Basic client for OpenAI and compatible APIs
  • CacheOpenAIEmbeddingModel: Client with SQLite caching for cost reduction
These clients support OpenAI's embedding models, Azure OpenAI, and any OpenAI-compatible server, including local ones.

OpenAIEmbeddingModel

from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel
Defined in: src/remem/embedding_model/openai_embedding_client.py:90

Initialization

def __init__(
    self,
    global_config: Optional[BaseConfig] = None,
    embedding_model_name: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = "https://api.openai.com/v1/embeddings",
    max_retries: int = 3,
    **kwargs
) -> None
Parameters:
  • global_config (BaseConfig, default: None): Global configuration object
  • embedding_model_name (str, default: None): Model name (e.g., "text-embedding-3-large", "text-embedding-3-small")
  • api_key (str, default: None): API key. Falls back to the OPENAI_API_KEY environment variable. For local servers, defaults to "not-needed-for-local-server"
  • base_url (str, default: "https://api.openai.com/v1/embeddings"): API endpoint URL. Change this for local or custom servers
  • max_retries (int, default: 3): Number of retry attempts for failed requests
  • use_azure (bool, default: False): Use Azure OpenAI instead of standard OpenAI. Requires the environment variables:
      • AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN
      • OPENAI_API_VERSION
      • AZURE_OPENAI_ENDPOINT

Examples

OpenAI Official API

import os
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel

os.environ["OPENAI_API_KEY"] = "sk-..."

model = OpenAIEmbeddingModel(
    embedding_model_name="text-embedding-3-large",
    base_url="https://api.openai.com/v1/"
)

embs = model.batch_encode(["Hello, world!"])
print(embs.shape)  # (1, 3072)

Azure OpenAI

import os
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["OPENAI_API_VERSION"] = "2024-02-01"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource.openai.azure.com/"

model = OpenAIEmbeddingModel(
    embedding_model_name="text-embedding-3-large",
    use_azure=True
)

embs = model.batch_encode(["Hello, Azure!"])

Local OpenAI-Compatible Server

from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel

# For local servers like vLLM, FastAPI, etc.
model = OpenAIEmbeddingModel(
    embedding_model_name="custom-model",
    base_url="http://localhost:8001/v1/",
    api_key="not-needed"  # Optional for local servers
)

embs = model.batch_encode(["Local embedding"])

Methods

batch_encode

def batch_encode(self, texts: List[str], **kwargs) -> np.ndarray
Encodes texts into embeddings with automatic batching and retry logic.

Parameters:
  • texts (List[str] | str, required): Text strings to encode
  • instruction (str, default: ""): Optional instruction prefix, formatted as "{instruction}<|endofprefix|>{text}"
  • batch_size (int, default: 16): Number of texts per API request

Returns:
  • embeddings (np.ndarray): 2D array of shape (n_texts, embedding_dim). Dimensions are 3072 for text-embedding-3-large and 1536 for text-embedding-3-small
Error Handling:
  • Content filtering (422 errors): Automatically creates zero-vector fallbacks
  • Network errors: Retries with exponential backoff (up to max_retries)
  • Rate limits: Handled by retry logic with jitter
Example:
texts = ["First text", "Second text", "Third text"]

# Basic usage
embs = model.batch_encode(texts)

# With instruction
query_embs = model.batch_encode(
    ["search query"],
    instruction="Represent this query for search"
)

# Large batch with custom batch size
large_batch = [f"Document {i}" for i in range(1000)]
embs = model.batch_encode(large_batch, batch_size=50)

encode

def encode(self, texts: List[str], **kwargs) -> np.ndarray
Low-level encoding method without batching. Used internally by batch_encode.
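The relationship between the two methods can be sketched roughly as follows. This is an illustrative assumption about the chunking logic, not Remem's exact implementation; the fake encoder stands in for the real single-request `encode` call:

```python
import numpy as np

def batch_encode_sketch(encode, texts, batch_size=16):
    """Illustrative: split texts into chunks and stack the per-chunk embeddings."""
    chunks = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return np.vstack([encode(chunk) for chunk in chunks])

# Fake encoder returning 4-dim zero vectors, one per input text
fake_encode = lambda chunk: np.zeros((len(chunk), 4))

embs = batch_encode_sketch(fake_encode, [f"doc {i}" for i in range(35)], batch_size=16)
print(embs.shape)  # (35, 4) -- three API calls of sizes 16, 16, 3
```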

CacheOpenAIEmbeddingModel

from remem.embedding_model.openai_embedding_client import CacheOpenAIEmbeddingModel
Defined in: src/remem/embedding_model/openai_embedding_client.py:374
Extends OpenAIEmbeddingModel with SQLite-based caching to reduce API calls and costs.

Initialization

def __init__(
    self,
    cache_filename: Optional[str] = None,
    global_config: Optional[BaseConfig] = None,
    embedding_model_name: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    max_retries: int = 5,
    **kwargs
) -> None
Parameters: Same as OpenAIEmbeddingModel, plus:
  • cache_filename (str, default: None): Name of the SQLite cache file. Defaults to "{model_name}_embedding_cache.sqlite", stored in outputs/{dataset}/embedding_cache/

Cache Behavior

Cache Key: Based on hash of:
  • Text content
  • Model name
  • Instruction
  • Max length parameter
Cache Hit: Returns the embedding from the SQLite database without an API call
Cache Miss: Calls the API and stores the result in the cache
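A minimal sketch of how such a SQLite-backed cache can work. The table schema, key format, and dtype here are illustrative assumptions, not Remem's actual schema:

```python
import hashlib
import sqlite3
import numpy as np

def make_cache_key(text, model_name, instruction="", max_length=None):
    # Hash every component that determines the embedding
    raw = f"{model_name}|{instruction}|{max_length}|{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

conn = sqlite3.connect(":memory:")  # real cache would be a file on disk
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, emb BLOB)")

def cached_embed(text, embed_fn, model_name="text-embedding-3-large"):
    key = make_cache_key(text, model_name)
    row = conn.execute("SELECT emb FROM cache WHERE key = ?", (key,)).fetchone()
    if row is not None:  # cache hit: no API call
        return np.frombuffer(row[0], dtype=np.float32)
    emb = embed_fn(text).astype(np.float32)  # cache miss: call the API
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, emb.tobytes()))
    return emb

fake_api = lambda text: np.ones(4, dtype=np.float32)
e1 = cached_embed("hello", fake_api)  # miss: computed and stored
e2 = cached_embed("hello", fake_api)  # hit: read back from SQLite
assert np.allclose(e1, e2)
```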

Example

import numpy as np

from remem.utils.config_utils import BaseConfig
from remem.embedding_model.openai_embedding_client import CacheOpenAIEmbeddingModel

config = BaseConfig()
config.dataset = "my_dataset"

model = CacheOpenAIEmbeddingModel(
    global_config=config,
    embedding_model_name="text-embedding-3-large",
    base_url="https://api.openai.com/v1/"
)

# First call: API requests (cache misses)
texts = ["Machine learning", "Deep learning"]
embs1 = model.batch_encode(texts)  # cache stats: 0 hits, 2 misses

# Second call: no API requests (cache hits)
embs2 = model.batch_encode(texts)  # cache stats: 2 hits, 0 misses

assert np.allclose(embs1, embs2)  # Same embeddings

Cache Location

outputs/
  {dataset}/
    embedding_cache/
      text-embedding-3-large_embedding_cache.sqlite
      text-embedding-3-large_embedding_cache.sqlite.lock

Supported Models

OpenAI Models

  • text-embedding-3-large: 3072 dims. Most capable embedding model (March 2024)
  • text-embedding-3-small: 1536 dims. Faster and cheaper than the large model
  • text-embedding-ada-002: 1536 dims. Legacy model (still supported)

Custom Models

Any OpenAI-compatible server can be used by specifying:
  • Custom base_url
  • Model-specific embedding_model_name
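For custom models the embedding dimension is not known ahead of time, so one option is to keep a lookup table for the hosted models and probe the server once otherwise. This is a sketch; the helper name and table are illustrative, only the dimensions come from the list above:

```python
# Known dimensions for OpenAI's hosted embedding models
OPENAI_EMBEDDING_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}

def embedding_dim(model, model_name):
    """Return the known dimension, or probe the server with one short text."""
    if model_name in OPENAI_EMBEDDING_DIMS:
        return OPENAI_EMBEDDING_DIMS[model_name]
    # One tiny request against the custom server reveals the dimension
    return model.batch_encode(["probe"]).shape[1]

print(embedding_dim(None, "text-embedding-3-small"))  # 1536
```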

Retry Logic

The client uses exponential backoff with jitter for retries:
# Retry parameters (in _make_http_request_with_retry)
base_delay = 1      # Initial delay: 1 second
factor = 2          # Delay doubles each retry: 1s, 2s, 4s, 8s, 16s
max_delay = 60      # Delay capped at 60 seconds
# Random jitter is added to each delay to prevent a thundering herd
Retryable Errors:
  • Network timeouts
  • Connection errors
  • HTTP 5xx errors
  • Rate limit errors
Non-Retryable Errors:
  • HTTP 422 (content validation) - Creates fallback embedding instead
  • Authentication errors
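The backoff schedule described above can be sketched as a generic retry loop. This is not Remem's exact code, just an illustration of exponential backoff with jitter and the parameters listed earlier:

```python
import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1, factor=2, max_delay=60):
    """Call fn(), retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: propagate the error
            delay = min(base_delay * factor ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay))  # add jitter

# Example: a call that fails twice, then succeeds on the third attempt
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.001))  # ok
```

A real client would catch only retryable errors (timeouts, 5xx, rate limits) and re-raise the rest immediately, as the error lists above describe.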

Factory Function

The module provides a factory function for automatic client creation:
from remem.embedding_model import _get_embedding_client

# Auto-selects OpenAI client for text-embedding models
client = _get_embedding_client(
    global_config=config,
    embedding_model_name="text-embedding-3-large",
    openai_style_server=True
)
Defined in: src/remem/embedding_model/__init__.py:4
