Overview
The proxy embeddings endpoint provides OpenAI-compatible embedding generation with authentication, rate limiting, and cost tracking.
Endpoint
POST {PROXY_BASE_URL}/v1/embeddings
Alternate routes:
POST /embeddings
POST /engines/{model}/embeddings
POST /openai/deployments/{model}/embeddings
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer sk-litellm-xxx...
Headers
Content-Type (string, default: "application/json")
Content type of the request body.
x-litellm-user-id (string, optional)
End-user ID for tracking.
metadata (string, optional)
JSON-stringified metadata.
Request Body
model (string, required)
Embedding model to use. Examples: text-embedding-3-small, text-embedding-ada-002
input (string or array, required)
Text to embed. Can be a single string or an array of strings.
{
  "input": "The quick brown fox"
}
Or:
{
  "input": ["First text", "Second text", "Third text"]
}
encoding_format (string, optional)
Format of the returned embeddings. Options: "float" (default), "base64".
dimensions (integer, optional)
Number of dimensions for the output embedding (text-embedding-3 models only).
user (string, optional)
Unique identifier for the end-user.
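When "base64" is requested as the encoding format, the embedding arrives as a base64 string of packed float32 values rather than a JSON array. The sketch below shows one way to decode it; the little-endian float32 layout matches the OpenAI API's base64 encoding, and the round-trip at the bottom is a purely local illustration (no proxy call is made):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes little-endian float32 packing, as used by the OpenAI API
    when encoding_format="base64" is requested.
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32 value
    return list(struct.unpack(f"<{count}f", raw))

# Local round-trip to illustrate the format (values chosen to be
# exactly representable in float32):
vector = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vector)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)  # [0.25, -0.5, 1.0]
```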
Response
Success Response (200)
object (string)
Object type, always "list".
data (array)
Array of embedding objects. Each object contains:
object (string): Object type, always "embedding".
embedding (array of floats): The embedding vector.
index (integer): Index of the embedding in the input list.
model (string)
Model used to generate the embeddings.
usage (object)
Token usage information. prompt_tokens gives the number of tokens in the input.
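A representative success response body (the vector is truncated and the values are illustrative, not real embeddings):

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023, -0.0091, 0.0154],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
```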
Examples
Basic Request
curl -X POST http://localhost:4000/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-litellm-xxx" \
-d '{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog"
}'
Python Request
import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox"
)

print(response.data[0].embedding)
Batch Embeddings
import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

texts = [
    "First document",
    "Second document",
    "Third document"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

for i, embedding in enumerate(response.data):
    print(f"Document {i}: {len(embedding.embedding)} dimensions")
With User Tracking
import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text",
    extra_headers={
        "x-litellm-user-id": "user-123"
    }
)
Custom Dimensions
import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text",
    dimensions=512  # reduce from the model's default of 1536
)

print(len(response.data[0].embedding))  # 512
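Reduced-dimension embeddings are typically compared with cosine similarity, just like full-size ones. A minimal, proxy-independent helper (the vectors below are toy values, not real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal directions score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```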
Error Responses
401 Unauthorized
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
429 Too Many Requests
{
  "error": {
    "message": "Rate limit exceeded for key",
    "type": "rate_limit_error"
  }
}
400 Bad Request
{
  "error": {
    "message": "Invalid input format",
    "type": "invalid_request_error"
  }
}
Proxy Features
Cost Tracking
The proxy automatically tracks embedding costs:
Per-key spending
Per-team spending
Per-user spending
Rate Limiting
Keys can be assigned TPM (tokens-per-minute) limits for embeddings:
Requests are throttled automatically
A 429 error is returned when the limit is exceeded
Model Routing
The proxy can route requests to different embedding providers:
OpenAI
Azure OpenAI
Cohere
Bedrock
Vertex AI
And more!
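As a rough sketch, routing like this might be expressed in a LiteLLM-style config.yaml; the model names, deployment name, and environment-variable references below are illustrative assumptions, so consult the proxy's own configuration docs for the exact schema:

```yaml
model_list:
  # One public model_name backed by multiple deployments;
  # the proxy picks a backend per request.
  - model_name: text-embedding-3-small
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
  - model_name: text-embedding-3-small
    litellm_params:
      model: azure/my-embedding-deployment   # hypothetical deployment name
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
```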