Overview

The proxy embeddings endpoint provides OpenAI-compatible embedding generation with authentication, rate limiting, and cost tracking.

Endpoint

POST {PROXY_BASE_URL}/v1/embeddings
Alternate routes:
  • POST /embeddings
  • POST /engines/{model}/embeddings
  • POST /openai/deployments/{model}/embeddings

Authentication

Authorization
string
required
Bearer token for authentication.
Authorization: Bearer sk-litellm-xxx...

Request Headers

Content-Type
string
default:"application/json"
Content type of the request body.
x-litellm-user-id
string
End-user ID for tracking.
x-litellm-metadata
string
JSON stringified metadata.
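For example, both tracking headers can be attached per-request via the OpenAI SDK's `extra_headers` parameter. The metadata payload below is illustrative; the proxy expects the `x-litellm-metadata` value to be a JSON string:

```python
import json

# Illustrative metadata; any JSON-serializable dict works.
metadata = {"project": "search-index", "env": "staging"}

# Headers as the proxy expects them: the metadata value is a JSON string.
extra_headers = {
    "x-litellm-user-id": "user-123",
    "x-litellm-metadata": json.dumps(metadata),
}

# Passed to the OpenAI SDK like:
# client.embeddings.create(model=..., input=..., extra_headers=extra_headers)
print(extra_headers["x-litellm-metadata"])
```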

Request Body

model
string
required
Embedding model to use. Examples: text-embedding-3-small, text-embedding-ada-002
input
string | array
required
Text to embed. Can be a single string or array of strings.
{
  "input": "The quick brown fox"
}
Or:
{
  "input": ["First text", "Second text", "Third text"]
}
encoding_format
string
default:"float"
Format of the embeddings. Options: "float", "base64"
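With "base64", each embedding arrives as a base64 string rather than a float array. A minimal decoder, assuming the upstream provider packs the vector as little-endian float32 (as the OpenAI API does):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes little-endian float32 packing, which is what the
    OpenAI API returns for encoding_format="base64".
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo with a known vector (values exactly representable in float32):
vector = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vector)).decode()
print(decode_base64_embedding(encoded))  # [0.25, -0.5, 1.0]
```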
dimensions
integer
Number of dimensions for the embedding (text-embedding-3 models only).
user
string
Unique identifier for end-user.

Response

Success Response (200)

object
string
Object type, always "list".
data
array
Array of embedding objects.
model
string
Model used for embeddings.
usage
object
Token usage information.
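Put together, a successful response body has the shape below (values are illustrative), and the fields listed above can be read off directly:

```python
# Illustrative response body following the documented fields.
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

assert response["object"] == "list"
vector = response["data"][0]["embedding"]
print(f"{response['model']}: {len(vector)} dims, "
      f"{response['usage']['total_tokens']} tokens")
```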

Examples

Basic Request

curl -X POST http://localhost:4000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-xxx" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Python Request

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox"
)

print(response.data[0].embedding)

Batch Embeddings

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

texts = [
    "First document",
    "Second document",
    "Third document"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

for i, embedding in enumerate(response.data):
    print(f"Document {i}: {len(embedding.embedding)} dimensions")

With User Tracking

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text",
    extra_headers={
        "x-litellm-user-id": "user-123"
    }
)

Custom Dimensions

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text",
    dimensions=512  # Reduce from default 1536
)

print(len(response.data[0].embedding))  # 512

Error Responses

401 Unauthorized

{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded for key",
    "type": "rate_limit_error"
  }
}

400 Bad Request

{
  "error": {
    "message": "Invalid input format",
    "type": "invalid_request_error"
  }
}
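All three error bodies share the same `{"error": {...}}` shape, so a small helper (hypothetical, not part of the proxy or SDK) can summarize any of them:

```python
def describe_error(body: dict) -> str:
    """Summarize a proxy error body of the form {"error": {...}}."""
    err = body.get("error", {})
    parts = [err.get("type", "unknown_error"), err.get("message", "")]
    if "code" in err:  # only some errors carry a code
        parts.append(f"(code: {err['code']})")
    return " ".join(p for p in parts if p)

body = {"error": {"message": "Rate limit exceeded for key",
                  "type": "rate_limit_error"}}
print(describe_error(body))  # rate_limit_error Rate limit exceeded for key
```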

Proxy Features

Cost Tracking

The proxy automatically tracks embedding costs:
  • Per-key spending
  • Per-team spending
  • Per-user spending

Rate Limiting

Keys can have TPM limits for embeddings:
  • Requests throttled automatically
  • 429 error when limit exceeded
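When a key hits its limit, a common client-side pattern is exponential backoff on the 429. A generic sketch (the retryable exception type and delays are illustrative; with the OpenAI SDK you would pass its `RateLimitError`):

```python
import time

def with_backoff(call, *, retries=3, base_delay=1.0, retryable=(Exception,)):
    """Retry `call` with exponential backoff on retryable errors,
    e.g. the OpenAI SDK's RateLimitError for HTTP 429."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retryable:
            if attempt == retries:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a fake call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429: rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01, retryable=(RuntimeError,)))  # ok
```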

Model Routing

Proxy can route to different embedding providers:
  • OpenAI
  • Azure OpenAI
  • Cohere
  • Bedrock
  • Vertex AI
  • And more!
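Routing is configured in the proxy's `config.yaml` via `model_list`. A sketch with placeholder model names and environment-variable keys (see the LiteLLM docs for provider-specific params):

```yaml
model_list:
  # Requests for "text-embedding-3-small" go to OpenAI
  - model_name: text-embedding-3-small
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
  # Requests for "embed-english" go to Cohere
  - model_name: embed-english
    litellm_params:
      model: cohere/embed-english-v3.0
      api_key: os.environ/COHERE_API_KEY
```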