Overview

The proxy embeddings endpoint provides OpenAI-compatible embedding generation with authentication, rate limiting, and cost tracking.

Endpoint

POST {PROXY_BASE_URL}/v1/embeddings
Alternate routes:
  • POST /embeddings
  • POST /engines/{model}/embeddings
  • POST /openai/deployments/{model}/embeddings

Authentication

Authorization
string
required
Bearer token for authentication.
Authorization: Bearer sk-litellm-xxx...

Request Headers

Content-Type
string
default:"application/json"
Content type of the request body.
x-litellm-user-id
string
End-user ID for tracking.
x-litellm-metadata
string
JSON stringified metadata.
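For example, both tracking headers can be attached per-request via the OpenAI SDK's `extra_headers` parameter. The metadata payload below is illustrative; the proxy expects the `x-litellm-metadata` value to be a JSON string:

```python
import json

# Illustrative metadata; any JSON-serializable dict works.
metadata = {"project": "search-index", "env": "staging"}

# Headers as the proxy expects them: the metadata value is a JSON string.
extra_headers = {
    "x-litellm-user-id": "user-123",
    "x-litellm-metadata": json.dumps(metadata),
}

# Passed to the OpenAI SDK like:
# client.embeddings.create(model=..., input=..., extra_headers=extra_headers)
print(extra_headers["x-litellm-metadata"])
```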

Request Body

model
string
required
Embedding model to use. Examples: text-embedding-3-small, text-embedding-ada-002
input
string | array
required
Text to embed. Can be a single string or array of strings.
{
  "input": "The quick brown fox"
}
Or:
{
  "input": ["First text", "Second text", "Third text"]
}
encoding_format
string
default:"float"
Format of the embeddings. Options: "float", "base64"
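With "base64", each embedding arrives as a base64 string rather than a float array. A minimal decoder, assuming the upstream provider packs the vector as little-endian float32 (as the OpenAI API does):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes little-endian float32 packing, which is what the
    OpenAI API returns for encoding_format="base64".
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo with a known vector (values exactly representable in float32):
vector = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vector)).decode()
print(decode_base64_embedding(encoded))  # [0.25, -0.5, 1.0]
```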
dimensions
integer
Number of dimensions for the embedding (text-embedding-3 models only).
user
string
Unique identifier for end-user.

Response

Success Response (200)

object
string
Object type, always "list".
data
array
Array of embedding objects.
model
string
Model used for embeddings.
usage
object
Token usage information.
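Put together, a successful response body has the shape below (values are illustrative), and the fields listed above can be read off directly:

```python
# Illustrative response body following the documented fields.
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

assert response["object"] == "list"
vector = response["data"][0]["embedding"]
print(f"{response['model']}: {len(vector)} dims, "
      f"{response['usage']['total_tokens']} tokens")
```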

Examples

Basic Request

curl -X POST http://localhost:4000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-xxx" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Python Request

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox"
)

print(response.data[0].embedding)

Batch Embeddings

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

texts = [
    "First document",
    "Second document",
    "Third document"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

for i, embedding in enumerate(response.data):
    print(f"Document {i}: {len(embedding.embedding)} dimensions")

With User Tracking

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text",
    extra_headers={
        "x-litellm-user-id": "user-123"
    }
)

Custom Dimensions

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text",
    dimensions=512  # Reduce from default 1536
)

print(len(response.data[0].embedding))  # 512

Error Responses

401 Unauthorized

{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded for key",
    "type": "rate_limit_error"
  }
}

400 Bad Request

{
  "error": {
    "message": "Invalid input format",
    "type": "invalid_request_error"
  }
}
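All three error bodies share the same `{"error": {...}}` shape, so a small helper (hypothetical, not part of the proxy or SDK) can summarize any of them:

```python
def describe_error(body: dict) -> str:
    """Summarize a proxy error body of the form {"error": {...}}."""
    err = body.get("error", {})
    parts = [err.get("type", "unknown_error"), err.get("message", "")]
    if "code" in err:  # only some errors carry a code
        parts.append(f"(code: {err['code']})")
    return " ".join(p for p in parts if p)

body = {"error": {"message": "Rate limit exceeded for key",
                  "type": "rate_limit_error"}}
print(describe_error(body))  # rate_limit_error Rate limit exceeded for key
```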

Proxy Features

Cost Tracking

The proxy automatically tracks embedding costs:
  • Per-key spending
  • Per-team spending
  • Per-user spending

Rate Limiting

Keys can have TPM limits for embeddings:
  • Requests throttled automatically
  • 429 error when limit exceeded
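When a key hits its limit, a common client-side pattern is exponential backoff on the 429. A generic sketch (the retryable exception type and delays are illustrative; with the OpenAI SDK you would pass its `RateLimitError`):

```python
import time

def with_backoff(call, *, retries=3, base_delay=1.0, retryable=(Exception,)):
    """Retry `call` with exponential backoff on retryable errors,
    e.g. the OpenAI SDK's RateLimitError for HTTP 429."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retryable:
            if attempt == retries:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a fake call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429: rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01, retryable=(RuntimeError,)))  # ok
```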

Model Routing

Proxy can route to different embedding providers:
  • OpenAI
  • Azure OpenAI
  • Cohere
  • Bedrock
  • Vertex AI
  • And more!
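Routing is configured in the proxy's `config.yaml` via `model_list`. A sketch with placeholder model names and environment-variable keys (see the LiteLLM docs for provider-specific params):

```yaml
model_list:
  # Requests for "text-embedding-3-small" go to OpenAI
  - model_name: text-embedding-3-small
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
  # Requests for "embed-english" go to Cohere
  - model_name: embed-english
    litellm_params:
      model: cohere/embed-english-v3.0
      api_key: os.environ/COHERE_API_KEY
```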