
Overview

Remem provides two OpenAI embedding clients:
  • OpenAIEmbeddingModel: Basic client for OpenAI and compatible APIs
  • CacheOpenAIEmbeddingModel: Client with SQLite caching for cost reduction
These clients support OpenAI's embedding models, Azure OpenAI, and any OpenAI-compatible server, including local ones.

OpenAIEmbeddingModel

from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel
Defined in: src/remem/embedding_model/openai_embedding_client.py:90

Initialization

def __init__(
    self,
    global_config: Optional[BaseConfig] = None,
    embedding_model_name: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = "https://api.openai.com/v1/embeddings",
    max_retries: int = 3,
    **kwargs
) -> None
Parameters:
  • global_config (BaseConfig, default: None): Global configuration object
  • embedding_model_name (str, default: None): Model name (e.g., "text-embedding-3-large", "text-embedding-3-small")
  • api_key (str, default: None): API key. Falls back to the OPENAI_API_KEY environment variable. For local servers, defaults to "not-needed-for-local-server"
  • base_url (str, default: "https://api.openai.com/v1/embeddings"): API endpoint URL. Change this for local or custom servers
  • max_retries (int, default: 3): Number of retry attempts for failed requests
  • use_azure (bool, default: False): Use Azure OpenAI instead of standard OpenAI. Requires the environment variables:
      • AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN
      • OPENAI_API_VERSION
      • AZURE_OPENAI_ENDPOINT

Examples

OpenAI Official API

import os
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel

os.environ["OPENAI_API_KEY"] = "sk-..."

model = OpenAIEmbeddingModel(
    embedding_model_name="text-embedding-3-large",
    base_url="https://api.openai.com/v1/"
)

embs = model.batch_encode(["Hello, world!"])
print(embs.shape)  # (1, 3072)

Azure OpenAI

import os
from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["OPENAI_API_VERSION"] = "2024-02-01"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource.openai.azure.com/"

model = OpenAIEmbeddingModel(
    embedding_model_name="text-embedding-3-large",
    use_azure=True
)

embs = model.batch_encode(["Hello, Azure!"])

Local OpenAI-Compatible Server

from remem.embedding_model.openai_embedding_client import OpenAIEmbeddingModel

# For local servers like vLLM, FastAPI, etc.
model = OpenAIEmbeddingModel(
    embedding_model_name="custom-model",
    base_url="http://localhost:8001/v1/",
    api_key="not-needed"  # Optional for local servers
)

embs = model.batch_encode(["Local embedding"])

Methods

batch_encode

def batch_encode(self, texts: List[str], **kwargs) -> np.ndarray
Encodes texts into embeddings with automatic batching and retry logic.

Parameters:
  • texts (List[str] | str, required): Text strings to encode
  • instruction (str, default: ""): Optional instruction prefix, formatted as "{instruction}<|endofprefix|>{text}"
  • batch_size (int, default: 16): Number of texts per API request

Returns:
  • embeddings (np.ndarray): 2D array of shape (n_texts, embedding_dim). Dimensions are 3072 for text-embedding-3-large and 1536 for text-embedding-3-small
Error Handling:
  • Content filtering (422 errors): Automatically creates zero-vector fallbacks
  • Network errors: Retries with exponential backoff (up to max_retries)
  • Rate limits: Handled by retry logic with jitter
Example:
texts = ["First text", "Second text", "Third text"]

# Basic usage
embs = model.batch_encode(texts)

# With instruction
query_embs = model.batch_encode(
    ["search query"],
    instruction="Represent this query for search"
)

# Large batch with custom batch size
large_batch = [f"Document {i}" for i in range(1000)]
embs = model.batch_encode(large_batch, batch_size=50)

encode

def encode(self, texts: List[str], **kwargs) -> np.ndarray
Low-level encoding method without batching. Used internally by batch_encode.
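The relationship between the two methods can be sketched roughly as follows. This is an illustrative assumption about the chunking logic, not Remem's exact implementation; the fake encoder stands in for the real single-request `encode` call:

```python
import numpy as np

def batch_encode_sketch(encode, texts, batch_size=16):
    """Illustrative: split texts into chunks and stack the per-chunk embeddings."""
    chunks = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return np.vstack([encode(chunk) for chunk in chunks])

# Fake encoder returning 4-dim zero vectors, one per input text
fake_encode = lambda chunk: np.zeros((len(chunk), 4))

embs = batch_encode_sketch(fake_encode, [f"doc {i}" for i in range(35)], batch_size=16)
print(embs.shape)  # (35, 4) -- three API calls of sizes 16, 16, 3
```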

CacheOpenAIEmbeddingModel

from remem.embedding_model.openai_embedding_client import CacheOpenAIEmbeddingModel
Defined in: src/remem/embedding_model/openai_embedding_client.py:374
Extends OpenAIEmbeddingModel with SQLite-based caching to reduce API calls and costs.

Initialization

def __init__(
    self,
    cache_filename: Optional[str] = None,
    global_config: Optional[BaseConfig] = None,
    embedding_model_name: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    max_retries: int = 5,
    **kwargs
) -> None
Parameters: Same as OpenAIEmbeddingModel, plus:
  • cache_filename (str, default: None): Name of the SQLite cache file. Defaults to "{model_name}_embedding_cache.sqlite", stored in outputs/{dataset}/embedding_cache/

Cache Behavior

Cache Key: Based on hash of:
  • Text content
  • Model name
  • Instruction
  • Max length parameter
Cache Hit: Returns the embedding from the SQLite database without an API call
Cache Miss: Calls the API and stores the result in the cache
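A minimal sketch of how such a SQLite-backed cache can work. The table schema, key format, and dtype here are illustrative assumptions, not Remem's actual schema:

```python
import hashlib
import sqlite3
import numpy as np

def make_cache_key(text, model_name, instruction="", max_length=None):
    # Hash every component that determines the embedding
    raw = f"{model_name}|{instruction}|{max_length}|{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

conn = sqlite3.connect(":memory:")  # real cache would be a file on disk
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, emb BLOB)")

def cached_embed(text, embed_fn, model_name="text-embedding-3-large"):
    key = make_cache_key(text, model_name)
    row = conn.execute("SELECT emb FROM cache WHERE key = ?", (key,)).fetchone()
    if row is not None:  # cache hit: no API call
        return np.frombuffer(row[0], dtype=np.float32)
    emb = embed_fn(text).astype(np.float32)  # cache miss: call the API
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, emb.tobytes()))
    return emb

fake_api = lambda text: np.ones(4, dtype=np.float32)
e1 = cached_embed("hello", fake_api)  # miss: computed and stored
e2 = cached_embed("hello", fake_api)  # hit: read back from SQLite
assert np.allclose(e1, e2)
```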

Example

import numpy as np

from remem.utils.config_utils import BaseConfig
from remem.embedding_model.openai_embedding_client import CacheOpenAIEmbeddingModel

config = BaseConfig()
config.dataset = "my_dataset"

model = CacheOpenAIEmbeddingModel(
    global_config=config,
    embedding_model_name="text-embedding-3-large",
    base_url="https://api.openai.com/v1/"
)

# First call: API requests (cache misses)
texts = ["Machine learning", "Deep learning"]
embs1 = model.batch_encode(texts)  # cache stats: 0 hits, 2 misses

# Second call: no API requests (cache hits)
embs2 = model.batch_encode(texts)  # cache stats: 2 hits, 0 misses

assert np.allclose(embs1, embs2)  # Same embeddings

Cache Location

outputs/
  {dataset}/
    embedding_cache/
      text-embedding-3-large_embedding_cache.sqlite
      text-embedding-3-large_embedding_cache.sqlite.lock

Supported Models

OpenAI Models

  • text-embedding-3-large: 3072 dims. Most capable embedding model (March 2024)
  • text-embedding-3-small: 1536 dims. Faster and cheaper than the large model
  • text-embedding-ada-002: 1536 dims. Legacy model (still supported)

Custom Models

Any OpenAI-compatible server can be used by specifying:
  • Custom base_url
  • Model-specific embedding_model_name
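For custom models the embedding dimension is not known ahead of time, so one option is to keep a lookup table for the hosted models and probe the server once otherwise. This is a sketch; the helper name and table are illustrative, only the dimensions come from the list above:

```python
# Known dimensions for OpenAI's hosted embedding models
OPENAI_EMBEDDING_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}

def embedding_dim(model, model_name):
    """Return the known dimension, or probe the server with one short text."""
    if model_name in OPENAI_EMBEDDING_DIMS:
        return OPENAI_EMBEDDING_DIMS[model_name]
    # One tiny request against the custom server reveals the dimension
    return model.batch_encode(["probe"]).shape[1]

print(embedding_dim(None, "text-embedding-3-small"))  # 1536
```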

Retry Logic

The client uses exponential backoff with jitter for retries:
# Retry parameters (in _make_http_request_with_retry)
base_delay = 1      # Initial delay: 1 second
factor = 2          # Delay doubles each retry: 1s, 2s, 4s, 8s, 16s
max_delay = 60      # Delay capped at 60 seconds
# Random jitter is added to each delay to prevent a thundering herd
Retryable Errors:
  • Network timeouts
  • Connection errors
  • HTTP 5xx errors
  • Rate limit errors
Non-Retryable Errors:
  • HTTP 422 (content validation) - Creates fallback embedding instead
  • Authentication errors
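The backoff schedule described above can be sketched as a generic retry loop. This is not Remem's exact code, just an illustration of exponential backoff with jitter and the parameters listed earlier:

```python
import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1, factor=2, max_delay=60):
    """Call fn(), retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: propagate the error
            delay = min(base_delay * factor ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay))  # add jitter

# Example: a call that fails twice, then succeeds on the third attempt
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.001))  # ok
```

A real client would catch only retryable errors (timeouts, 5xx, rate limits) and re-raise the rest immediately, as the error lists above describe.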

Factory Function

The module provides a factory function for automatic client creation:
from remem.embedding_model import _get_embedding_client

# Auto-selects OpenAI client for text-embedding models
client = _get_embedding_client(
    global_config=config,
    embedding_model_name="text-embedding-3-large",
    openai_style_server=True
)
Defined in: src/remem/embedding_model/__init__.py:4
