Create Embeddings
Creates an embedding vector representing the input text.

Request Body
ID of the embedding model to use. Must be an embedding model available in Jan. To use an embedding model, ensure it has "embedding": true in its settings. Examples: nomic-embed-text, sentence-transformers

Input text to embed. Can be a single string or an array of strings. When providing an array, each string is embedded separately and the results are returned in the same order.
The format to return the embeddings in. Currently only "float" is supported, which returns embeddings as arrays of floating-point numbers.

Response
- object (string): Always "list"
- model (string): The model used for generating embeddings
- data (array): Array of embedding objects, one for each input string. Each object contains:
  - object (string): Always "embedding"
  - embedding (array): The embedding vector as an array of floats
  - index (number): The index of this embedding in the input array
- usage (object): Token usage information.
  - prompt_tokens (number): Number of tokens in the input
  - total_tokens (number): Total tokens processed (same as prompt_tokens for embeddings)
Example Response (Single Input)
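A representative response, assuming the OpenAI-compatible shape described above (the embedding array is truncated for readability; a real nomic-embed-text vector has 768 floats):

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789],
      "index": 0
    }
  ],
  "model": "nomic-embed-text",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```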
Example Response (Multiple Inputs)
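A representative multi-input response (illustrative, truncated vectors): one embedding object per input string, with index matching input order:

```json
{
  "object": "list",
  "data": [
    { "object": "embedding", "embedding": [0.0123, -0.0456, 0.0789], "index": 0 },
    { "object": "embedding", "embedding": [-0.0231, 0.0678, 0.0112], "index": 1 }
  ],
  "model": "nomic-embed-text",
  "usage": {
    "prompt_tokens": 16,
    "total_tokens": 16
  }
}
```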
Batch Processing
Jan automatically batches large embedding requests for optimal performance.

Request with Multiple Inputs
cURL
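A sketch of a multi-input request, assuming Jan's local server is running at its default address (adjust the host, port, and model name for your setup):

```bash
curl http://localhost:1337/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["First document", "Second document", "Third document"]
  }'
```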
Batch Size
Jan processes embeddings in batches for efficiency. The default batch size is 512 tokens (configurable via ubatch_size in model settings).
Large requests are automatically split into batches and processed sequentially.
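The splitting idea can be illustrated client-side (a simplified sketch: Jan batches internally by token count, approximated here by item count):

```python
def chunk_inputs(texts, batch_size):
    """Split a list of input strings into batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

# Ten documents split into batches of four; each batch would be
# processed as one unit, in order.
batches = chunk_inputs([f"doc {i}" for i in range(10)], 4)
```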
Use Cases
Semantic Search
Generate embeddings for documents and queries to find semantically similar content.
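A minimal sketch of the ranking step, using cosine similarity over pre-computed vectors. In practice the vectors would come from the embeddings endpoint; the tiny 3-dimensional vectors here are illustrative stand-ins:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_emb, doc_embs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = sorted(
        enumerate(doc_embs),
        key=lambda pair: cosine_similarity(query_emb, pair[1]),
        reverse=True,
    )
    return [(i, cosine_similarity(query_emb, emb)) for i, emb in scored[:top_k]]

# Toy vectors standing in for real document and query embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
query = [0.95, 0.05, 0.0]
results = semantic_search(query, docs, top_k=2)
```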
Clustering
Group similar texts together using embedding vectors.
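One simple approach is k-means over the embedding vectors. A self-contained sketch, using fixed initial centroids and toy 2-D vectors in place of real embeddings:

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=10):
    """Naive k-means; initializes centroids from the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy 2-D vectors standing in for real embeddings; two clear groups.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.95]]
labels = kmeans(embeddings, k=2)
```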
Text Classification
Use embeddings as features for classification tasks.
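A nearest-centroid classifier is one lightweight way to use embeddings as features (the labels and toy vectors below are hypothetical; real inputs would be embedding vectors returned by the API):

```python
import math

def centroid(vectors):
    """Mean vector of a list of equal-length vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def train(examples):
    """examples: list of (embedding, label). Returns one centroid per label."""
    by_label = {}
    for emb, label in examples:
        by_label.setdefault(label, []).append(emb)
    return {label: centroid(vs) for label, vs in by_label.items()}

def classify(emb, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(emb, centroids[label]))

# Hypothetical labeled examples with toy 2-D "embeddings".
training = [
    ([0.9, 0.1], "sports"), ([0.8, 0.2], "sports"),
    ([0.1, 0.9], "politics"), ([0.2, 0.8], "politics"),
]
centroids = train(training)
prediction = classify([0.85, 0.15], centroids)
```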
Embedding Models
Jan supports various embedding models. To use a model for embeddings:

- The model must have "embedding": true in its settings
- The model architecture must be compatible (e.g., BERT, Nomic-BERT)
Popular Embedding Models
- nomic-embed-text: High-quality text embeddings with 768 dimensions
- sentence-transformers: General-purpose sentence embeddings
- all-MiniLM-L6-v2: Lightweight and fast, 384 dimensions
Model Auto-Loading
If an embedding model is not loaded when you make a request, Jan will:

- Automatically load the model in embedding mode
- Process your request
- Keep the model loaded for subsequent requests

If a loaded model's embedding endpoint returns a 501 status (not available), Jan will reload the model with embedding support enabled.
Embedding Dimensions
Embedding dimensions vary by model:

- nomic-embed-text: 768 dimensions
- all-MiniLM-L6-v2: 384 dimensions
- sentence-transformers: Varies by variant (typically 384-1024)
Error Handling
Model Not Available
If you request an embedding from a non-embedding model, the server returns 400 Bad Request.
Embedding Endpoint Not Available
If the model doesn't have embedding support enabled, the server returns 501 Not Implemented.
Jan will automatically reload the model with embedding support and retry.
Input Too Long
If input exceeds the model's maximum token limit, the server returns 400 Bad Request.
Performance Tips
Batch Requests
Process multiple texts in a single request for better performance.

Keep Model Loaded

Embedding models stay loaded in memory for subsequent requests. Avoid unloading between requests to maintain performance.

GPU Acceleration

Enable GPU acceleration by setting ngl (number of GPU layers) in model settings for faster embedding generation.