Overview
The OpenAIDenseEmbedding class provides text-to-vector embedding capabilities using OpenAI’s embedding models. It supports various models with different dimensions and includes automatic result caching for improved performance.
Installation
Authentication
Set your OpenAI API key as an environment variable:
```bash
export OPENAI_API_KEY="sk-..."
```
Alternatively, pass the API key directly to the constructor:
```python
emb_func = OpenAIDenseEmbedding(api_key="sk-...")
```
Obtain your API key from OpenAI Platform.
Basic Usage
```python
from zvec.extension import OpenAIDenseEmbedding
import os

# Set API key
os.environ["OPENAI_API_KEY"] = "sk-..."

# Initialize with default model (text-embedding-3-small)
emb_func = OpenAIDenseEmbedding()

vector = emb_func.embed("Hello, world!")
print(f"Dimension: {len(vector)}")
# Output: Dimension: 1536
```
Model Selection
OpenAI offers several embedding models:
| Model | Dimensions | Description |
|---|---|---|
| `text-embedding-3-small` | 1536 | Cost-efficient, good performance (default) |
| `text-embedding-3-large` | 3072 | Highest quality |
| `text-embedding-ada-002` | 1536 | Legacy model |
```python
# Using text-embedding-3-large
emb_func = OpenAIDenseEmbedding(
    model="text-embedding-3-large",
    dimension=1024,  # Optional: custom dimension
    api_key="sk-..."
)

vector = emb_func.embed("Machine learning is fascinating")
print(f"Dimension: {len(vector)}")
# Output: Dimension: 1024
```
Custom Dimensions
For text-embedding-3 models, you can specify custom dimensions to reduce vector size:
```python
emb_func = OpenAIDenseEmbedding(
    model="text-embedding-3-small",
    dimension=512  # Reduce from default 1536 to 512
)

vector = emb_func.embed("Natural language processing")
print(f"Dimension: {len(vector)}")
# Output: Dimension: 512
```
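Per OpenAI's documentation, a shortened embedding from a text-embedding-3 model is equivalent to taking the first N components of the full vector and L2-normalizing them. The post-processing can be sketched in pure Python; `full_vector` below is a synthetic stand-in, not a real model output:

```python
import math

def shorten_embedding(vector, dim):
    """Truncate an embedding to `dim` components and L2-normalize,
    mirroring how the API's dimension parameter shortens vectors."""
    truncated = vector[:dim]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

# Synthetic stand-in for a full 1536-dim embedding (not real model output)
full_vector = [(-1) ** i / (i + 1) for i in range(1536)]
short = shorten_embedding(full_vector, 512)
print(len(short))                           # 512
print(round(sum(x * x for x in short), 6))  # 1.0 (unit length)
```

Because the shortened vector is renormalized, cosine similarity remains meaningful at the reduced size, at some cost in quality.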
Azure OpenAI
Use a custom base URL for Azure OpenAI or compatible services:
```python
emb_func = OpenAIDenseEmbedding(
    model="text-embedding-ada-002",
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/"
)

vector = emb_func.embed("Azure OpenAI integration")
```
Using with Zvec Collections
```python
from zvec import Collection, DataType
from zvec.extension import OpenAIDenseEmbedding

# Initialize embedding function
emb_func = OpenAIDenseEmbedding(
    model="text-embedding-3-small",
    dimension=1536
)

# Create collection with OpenAI embeddings
collection = Collection(name="documents")
collection.create_field("id", DataType.INT64, is_primary=True)
collection.create_field("text", DataType.VARCHAR, max_length=512)
collection.create_field(
    name="vector",
    dtype=DataType.VECTOR_FP32,
    dimension=1536,
    embedding_function=emb_func
)
collection.create()

# Insert data - embeddings are generated automatically
collection.insert([
    {"id": 1, "text": "Introduction to machine learning"},
    {"id": 2, "text": "Deep learning with neural networks"},
    {"id": 3, "text": "Natural language processing basics"}
])

# Query with automatic embedding
results = collection.query(
    data={"vector": ["machine learning algorithms"]},
    output_fields=["id", "text"],
    topk=2
)

for result in results:
    print(f"ID: {result['id']}, Text: {result['text']}")
```
Batch Processing
The embedding function includes automatic caching for repeated inputs:
```python
emb_func = OpenAIDenseEmbedding()

texts = [
    "First document",
    "Second document",
    "First document"  # This will use cached result
]

vectors = [emb_func.embed(text) for text in texts]
# Third call returns cached result for "First document"
```
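The caching behavior can be illustrated with `functools.lru_cache`. This toy `embed` is a pure-Python stand-in (not the library's actual implementation) that counts underlying "API" calls, showing that the repeated input is served from cache:

```python
from functools import lru_cache

api_calls = 0

@lru_cache(maxsize=10)  # the Notes section mentions an LRU cache of size 10
def embed(text: str) -> tuple:
    """Toy stand-in for the real API call; returns a fake 'vector'."""
    global api_calls
    api_calls += 1
    return (float(len(text)),)  # tuples are hashable, so results are cacheable

for text in ["First document", "Second document", "First document"]:
    embed(text)

print(api_calls)  # 2 -- the third call hit the cache
```

With a cache of size 10, the eleventh distinct input evicts the least recently used entry, so very long streams of unique texts gain nothing from the cache.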
Error Handling
```python
try:
    emb_func.embed("")  # Empty string
except ValueError as e:
    print(f"Error: {e}")
# Output: Error: Input text cannot be empty or whitespace only

try:
    emb_func.embed(123)  # Non-string input
except TypeError as e:
    print(f"Error: {e}")
# Output: Error: Expected 'input' to be str, got int
```
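The checks above amount to straightforward input validation. A sketch of what such a guard might look like (illustrative only, not the library's actual code):

```python
def validate_input(text):
    """Guard mirroring the errors shown above (illustrative only)."""
    if not isinstance(text, str):
        raise TypeError(f"Expected 'input' to be str, got {type(text).__name__}")
    if not text.strip():
        raise ValueError("Input text cannot be empty or whitespace only")
    return text

print(validate_input("hello"))  # hello
```

Validating before calling the API fails fast and avoids spending a billable request on input the service would reject anyway.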
Configuration Options
- `model` (string, default: `"text-embedding-3-small"`): OpenAI embedding model identifier.
- `dimension` (int, default: `None`): Desired output embedding dimension. If `None`, uses the model's default dimension.
- `api_key` (string, default: `None`): OpenAI API authentication key. If `None`, reads from the `OPENAI_API_KEY` environment variable.
- `base_url` (string, default: `None`): Custom API base URL for OpenAI-compatible services.
Notes
- Results are cached (LRU cache, maxsize=10) to reduce API calls
- API usage incurs costs based on your OpenAI subscription plan
- Rate limits apply based on your OpenAI account tier
- Network connectivity to OpenAI API endpoints is required
- Maximum input length is 8191 tokens for most models
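To stay under the token limit you can estimate input length before embedding. The helper below uses a very rough heuristic (roughly 4 characters per token for English text); both function names are hypothetical, and a real tokenizer such as the `tiktoken` package gives exact counts:

```python
def rough_token_estimate(text: str) -> int:
    """Crude estimate: ~4 characters per token for English text.
    Use a real tokenizer (e.g. tiktoken) for exact counts."""
    return max(1, len(text) // 4)

def fits_embedding_limit(text: str, limit: int = 8191) -> bool:
    """Check the estimate against the 8191-token limit noted above."""
    return rough_token_estimate(text) <= limit

print(fits_embedding_limit("Hello, world!"))  # True
```

Inputs that exceed the limit must be truncated or split into chunks before embedding, since the API rejects over-length requests.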
See Also