
Endpoint

POST /v1/embeddings

Prerequisites

h2oGPT must be started with LangChain enabled (any --langchain_mode other than Disabled) and the embedding model pre-loaded:
python generate.py \
  --langchain_mode=UserData \
  --pre_load_embedding_model=True
To use a specific HuggingFace embedding model instead of the default, add:
  --hf_embedding_model=sentence-transformers/all-MiniLM-L6-v2 \
  --use_openai_embedding=False
The model field in the request is accepted for API compatibility but is currently ignored. h2oGPT always uses the single embedding model it was started with.

Request parameters

input
string | string[]
required
Text string or array of text strings to embed. Each string is embedded independently. Arrays return one embedding object per input element.
model
string
Accepted for compatibility but unused. The server uses whichever embedding model was loaded at startup.
encoding_format
string
default:"float"
Output encoding: "float" returns a list of floating-point numbers; "base64" returns a base64-encoded string.
user
string
Optional user identifier.
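When requesting encoding_format="base64", clients following the OpenAI convention decode the payload as the raw vector of little-endian 32-bit floats. A minimal decoding sketch under that assumption (verify against your h2oGPT deployment, since the exact encoding is not documented here):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the OpenAI convention: the payload is the raw vector
    packed as little-endian 32-bit floats.
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```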

Response

object
string
Always "list".
data
object[]
Array of embedding objects, one per input string.
model
string
The model identifier used.
usage
object
Token usage with prompt_tokens and total_tokens.
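Putting the fields above together, an illustrative response for a single-input request looks like the following. The values are invented for illustration, and a real embedding vector has hundreds of dimensions, not four:

```python
# Hypothetical example of the response shape described above.
sample_response = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [0.0123, -0.0456, 0.0789, -0.0012],
        }
    ],
    "model": "hkunlp/instructor-large",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}
```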

Examples

Single string

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="EMPTY",
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-3-small",  # value is ignored; kept for compatibility
)

print(response.data[0].embedding)

Batch of strings

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="EMPTY",
)

response = client.embeddings.create(
    input=[
        "Your text string goes here",
        "Another text string goes here",
    ],
    model="text-embedding-3-small",
)

print(response.data[0].embedding)
print(response.data[1].embedding)
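Batched embeddings are typically compared with cosine similarity. A minimal pure-Python sketch (in production you would normally use numpy for this):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

For example, cosine_similarity(response.data[0].embedding, response.data[1].embedding) scores how semantically close the two input strings are, with values near 1.0 indicating high similarity.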

Supported embedding models

The embedding model is configured at server startup. Common choices include:
Model                                       Notes
hkunlp/instructor-large                     Default HuggingFace embedding model
sentence-transformers/all-MiniLM-L6-v2      Fast, lightweight
sentence-transformers/all-mpnet-base-v2     Higher quality
OpenAI embeddings                           Set --use_openai_embedding=True and provide OPENAI_API_KEY
Pass the desired model at startup with --hf_embedding_model=<model-name>.
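Because each of these models produces a different vector size, a sanity check on the embedding length can catch a mismatch between client assumptions and the model the server actually loaded. The dimensions below are commonly reported values for these models, not guarantees; verify them against your own deployment:

```python
# Commonly reported output dimensions (assumed values; confirm against
# len(response.data[0].embedding) from your own server).
EXPECTED_DIMS = {
    "hkunlp/instructor-large": 768,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-mpnet-base-v2": 768,
}

def check_dim(model_name: str, embedding: list[float]) -> bool:
    """Return True if the embedding length matches the expected dimension."""
    expected = EXPECTED_DIMS.get(model_name)
    return expected is not None and len(embedding) == expected
```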
