Embeddings turn text into numeric vectors you can store in a vector database, search with cosine similarity, or use in RAG pipelines. The vector length depends on the model (typically 384–1024 dimensions).
Recommended models
Popular embedding models include embeddinggemma and all-minilm; any embedding model available in the Ollama library can be used.
Generate embeddings
CLI

Generate embeddings directly from the command line:

ollama run embeddinggemma "Hello world"

You can also pipe text to generate embeddings:

echo "Hello world" | ollama run embeddinggemma

Output is a JSON array.

cURL

curl -X POST http://localhost:11434/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

Python

import ollama

single = ollama.embed(
    model='embeddinggemma',
    input='The quick brown fox jumps over the lazy dog.'
)
print(len(single['embeddings'][0]))  # vector length

JavaScript

import ollama from 'ollama'

const single = await ollama.embed({
  model: 'embeddinggemma',
  input: 'The quick brown fox jumps over the lazy dog.',
})
console.log(single.embeddings[0].length)  // vector length
The /api/embed endpoint returns L2-normalized (unit-length) vectors.
Generate a batch of embeddings
Pass an array of strings to input to generate multiple embeddings in a single request.
cURL

curl -X POST http://localhost:11434/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma",
    "input": [
      "First sentence",
      "Second sentence",
      "Third sentence"
    ]
  }'

Python

import ollama

batch = ollama.embed(
    model='embeddinggemma',
    input=[
        'The quick brown fox jumps over the lazy dog.',
        'The five boxing wizards jump quickly.',
        'Jackdaws love my big sphinx of quartz.',
    ]
)
print(len(batch['embeddings']))  # number of vectors

JavaScript

import ollama from 'ollama'

const batch = await ollama.embed({
  model: 'embeddinggemma',
  input: [
    'The quick brown fox jumps over the lazy dog.',
    'The five boxing wizards jump quickly.',
    'Jackdaws love my big sphinx of quartz.',
  ],
})
console.log(batch.embeddings.length)  // number of vectors
API parameters

- model: the embedding model name (e.g., embeddinggemma, all-minilm)
- input: the text or array of texts to embed
- truncate: truncate input to fit the model's max sequence length
- dimensions: truncate the output embedding to the specified dimension (matryoshka embeddings)
- keep_alive: how long to keep the model loaded in memory
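As an illustration, a single request body can combine several of these parameters; the values below are arbitrary examples, not defaults:

```json
{
  "model": "embeddinggemma",
  "input": ["First sentence", "Second sentence"],
  "truncate": true,
  "dimensions": 256,
  "keep_alive": "5m"
}
```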
Response structure
{
"model": "embeddinggemma",
"embeddings": [
[0.123, -0.456, 0.789, ...],
[0.321, -0.654, 0.987, ...]
],
"total_duration": 124563708,
"load_duration": 6338219,
"prompt_eval_count": 12
}
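The duration fields are reported in nanoseconds. A small sketch of pulling the useful fields out of a response with this shape (the dict literal below is copied from the example above, not a live API call):

```python
# Example response, copied from the documentation (not a live API call)
response = {
    "model": "embeddinggemma",
    "embeddings": [
        [0.123, -0.456, 0.789],
        [0.321, -0.654, 0.987],
    ],
    "total_duration": 124563708,
    "load_duration": 6338219,
    "prompt_eval_count": 12,
}

vectors = response["embeddings"]
print(f"{len(vectors)} vectors of dimension {len(vectors[0])}")
print(f"total time: {response['total_duration'] / 1e6:.1f} ms")  # ns -> ms
```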
Using embeddings for semantic search
Generate embeddings for your documents
import ollama

documents = [
    "Ollama is a tool for running LLMs locally.",
    "Python is a popular programming language.",
    "Machine learning models can be deployed on edge devices."
]

doc_embeddings = ollama.embed(
    model='embeddinggemma',
    input=documents
)['embeddings']

Generate an embedding for the query

query = "How do I run AI models on my computer?"

query_embedding = ollama.embed(
    model='embeddinggemma',
    input=query
)['embeddings'][0]

Calculate cosine similarity

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

similarities = [
    cosine_similarity(query_embedding, doc_emb)
    for doc_emb in doc_embeddings
]

# Find the most similar document
best_match_idx = np.argmax(similarities)
print(f"Best match: {documents[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")
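Because /api/embed returns unit-length vectors, the cosine similarity above reduces to a plain dot product, which is cheaper at scale. A quick check with synthetic normalized vectors (not real model output):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
a = rng.normal(size=768)
b = rng.normal(size=768)
a /= np.linalg.norm(a)  # normalize, as the API does server-side
b /= np.linalg.norm(b)

# For unit vectors, cosine similarity and the dot product coincide
assert np.isclose(cosine_similarity(a, b), np.dot(a, b))
```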
Matryoshka embeddings
Some embedding models support matryoshka representations, allowing you to truncate embeddings to smaller dimensions while maintaining good performance.
import ollama

# Generate full embedding
full = ollama.embed(
    model='embeddinggemma',
    input='Sample text'
)

# Generate truncated embedding (faster search, less storage)
truncated = ollama.embed(
    model='embeddinggemma',
    input='Sample text',
    dimensions=256
)

print(f"Full dimension: {len(full['embeddings'][0])}")        # e.g., 768
print(f"Truncated dimension: {len(truncated['embeddings'][0])}")  # 256
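The same truncation can be sketched client-side: matryoshka embeddings are cut to their first k dimensions and renormalized to unit length. This example uses a synthetic vector in place of a real model output:

```python
import numpy as np

def truncate_embedding(vec, k):
    """Keep the first k dimensions and renormalize to unit length."""
    short = np.asarray(vec[:k], dtype=float)
    return short / np.linalg.norm(short)

full = np.random.default_rng(1).normal(size=768)
full /= np.linalg.norm(full)  # stand-in for a full 768-dim embedding

small = truncate_embedding(full, 256)
print(len(small))  # 256 dimensions, still unit length
```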
Tips
- Use cosine similarity for most semantic search use cases
- Use the same embedding model for both indexing and querying
- Normalize embeddings if your vector database doesn’t do it automatically (Ollama returns normalized vectors)
- Batch embed documents for better performance
- Store embeddings in a vector database like Chroma, Pinecone, or Qdrant for production use
- Consider using the dimensions parameter for faster search with minimal quality loss
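The batching tip can be sketched as a small helper that splits a document list into fixed-size chunks before each embed call; the batch size of 64 here is an arbitrary choice, not an Ollama limit:

```python
def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

documents = [f"doc {i}" for i in range(150)]

chunk_sizes = []
for chunk in batched(documents, 64):
    # One request per chunk, e.g.:
    # embeddings += ollama.embed(model='embeddinggemma', input=chunk)['embeddings']
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [64, 64, 22]
```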