
Endpoint

POST /v1/embeddings

Prerequisites

h2oGPT must be started with LangChain enabled (any --langchain_mode other than Disabled) and the embedding model pre-loaded:
python generate.py \
  --langchain_mode=UserData \
  --pre_load_embedding_model=True
To use a specific HuggingFace embedding model instead of the default, add:
  --hf_embedding_model=sentence-transformers/all-MiniLM-L6-v2 \
  --use_openai_embedding=False
The model field in the request is accepted for API compatibility but is currently ignored. h2oGPT always uses the single embedding model it was started with.

Request parameters

input
string | string[]
required
Text string or array of text strings to embed. Each string is embedded independently. Arrays return one embedding object per input element.
model
string
Accepted for compatibility but unused. The server uses whichever embedding model was loaded at startup.
encoding_format
string
default:"float"
Output encoding: "float" returns a list of floating-point numbers; "base64" returns a base64-encoded string.
user
string
Optional user identifier.
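When requesting encoding_format="base64", clients following the OpenAI convention decode the payload as the raw vector of little-endian 32-bit floats. A minimal decoding sketch under that assumption (verify against your h2oGPT deployment, since the exact encoding is not documented here):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the OpenAI convention: the payload is the raw vector
    packed as little-endian 32-bit floats.
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```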

Response

object
string
Always "list".
data
object[]
Array of embedding objects, one per input string.
model
string
The model identifier used.
usage
object
Token usage with prompt_tokens and total_tokens.
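Putting the fields above together, an illustrative response for a single-input request looks like the following. The values are invented for illustration, and a real embedding vector has hundreds of dimensions, not four:

```python
# Hypothetical example of the response shape described above.
sample_response = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [0.0123, -0.0456, 0.0789, -0.0012],
        }
    ],
    "model": "hkunlp/instructor-large",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}
```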

Examples

Single string

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="EMPTY",
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-3-small",  # value is ignored; kept for compatibility
)

print(response.data[0].embedding)

Batch of strings

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="EMPTY",
)

response = client.embeddings.create(
    input=[
        "Your text string goes here",
        "Another text string goes here",
    ],
    model="text-embedding-3-small",
)

print(response.data[0].embedding)
print(response.data[1].embedding)
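Batched embeddings are typically compared with cosine similarity. A minimal pure-Python sketch (in production you would normally use numpy for this):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

For example, cosine_similarity(response.data[0].embedding, response.data[1].embedding) scores how semantically close the two input strings are, with values near 1.0 indicating high similarity.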

Supported embedding models

The embedding model is configured at server startup. Common choices include:
Model                                       Notes
hkunlp/instructor-large                     Default HuggingFace embedding model
sentence-transformers/all-MiniLM-L6-v2      Fast, lightweight
sentence-transformers/all-mpnet-base-v2     Higher quality
OpenAI embeddings                           Set --use_openai_embedding=True and provide OPENAI_API_KEY
Pass the desired model at startup with --hf_embedding_model=<model-name>.
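Because each of these models produces a different vector size, a sanity check on the embedding length can catch a mismatch between client assumptions and the model the server actually loaded. The dimensions below are commonly reported values for these models, not guarantees; verify them against your own deployment:

```python
# Commonly reported output dimensions (assumed values; confirm against
# len(response.data[0].embedding) from your own server).
EXPECTED_DIMS = {
    "hkunlp/instructor-large": 768,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-mpnet-base-v2": 768,
}

def check_dim(model_name: str, embedding: list[float]) -> bool:
    """Return True if the embedding length matches the expected dimension."""
    expected = EXPECTED_DIMS.get(model_name)
    return expected is not None and len(embedding) == expected
```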
