Overview

Embeddings convert text into numerical vectors that capture semantic meaning. LlamaIndex.TS uses embeddings for:
  • Semantic search: Finding similar documents based on meaning, not just keywords
  • RAG (Retrieval-Augmented Generation): Retrieving relevant context for LLM queries
  • Clustering and classification: Grouping similar texts together
  • Similarity comparison: Measuring how related two pieces of text are
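
For example, a minimal end-to-end flow (using the OpenAI provider covered later on this page) embeds two texts and compares them:
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();

// Embed two texts in one batched call, then compare them
const [a, b] = await embedModel.getTextEmbeddings(["cats are pets", "dogs are pets"]);
console.log(embedModel.similarity(a, b)); // Higher values mean more similar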

BaseEmbedding Interface

All embedding models in LlamaIndex.TS extend the BaseEmbedding class from @llamaindex/core/embeddings:
import { BaseEmbedding } from "@llamaindex/core/embeddings";

abstract class BaseEmbedding {
  abstract getTextEmbedding(text: string): Promise<number[]>;
  
  getTextEmbeddings(texts: string[]): Promise<Array<number[]>>;
  getTextEmbeddingsBatch(texts: string[], options?): Promise<Array<number[]>>;
  getQueryEmbedding(query: MessageContentDetail): Promise<number[] | null>;
  
  similarity(embedding1: number[], embedding2: number[], mode?): number;
  
  embedBatchSize: number;
  embedInfo?: EmbeddingInfo;
}
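
Because getTextEmbedding is the only abstract member, a custom provider can be added by extending BaseEmbedding and implementing that single method; batching, query embedding, and similarity are inherited. A minimal sketch, assuming a hypothetical HTTP embedding service:
class HttpEmbedding extends BaseEmbedding {
  async getTextEmbedding(text: string): Promise<number[]> {
    // Hypothetical endpoint and response shape; replace with your provider's API
    const res = await fetch("https://example.com/embed", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: text }),
    });
    const { embedding } = (await res.json()) as { embedding: number[] };
    return embedding;
  }
}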

Embedding Info

Embedding models expose metadata about their capabilities:
type EmbeddingInfo = {
  dimensions?: number;      // Vector dimensions (e.g., 1536, 3072)
  maxTokens?: number;       // Maximum input tokens
  tokenizer?: Tokenizers;   // Tokenizer used
};
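
Not every provider populates this metadata, so treat embedInfo as optional when reading it:
const info = embedModel.embedInfo;
if (info?.dimensions) {
  console.log(`Vectors have ${info.dimensions} dimensions`);
}
if (info?.maxTokens) {
  console.log(`Inputs are limited to ${info.maxTokens} tokens`);
}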

Generating Embeddings

Single Text Embedding

Embed a single piece of text:
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

const embedding = await embedModel.getTextEmbedding(
  "LlamaIndex is a data framework for LLM applications"
);

console.log(embedding); // [0.123, -0.456, 0.789, ...]
console.log(embedding.length); // 1536

Multiple Text Embeddings

Embed multiple texts efficiently:
const texts = [
  "What is artificial intelligence?",
  "Machine learning is a subset of AI",
  "Deep learning uses neural networks",
];

const embeddings = await embedModel.getTextEmbeddings(texts);
console.log(embeddings.length); // 3
console.log(embeddings[0].length); // 1536

Query Embeddings

Embed queries for semantic search:
const queryEmbedding = await embedModel.getQueryEmbedding({
  type: "text",
  text: "How does RAG work?",
});
getQueryEmbedding accepts MessageContentDetail types, making it compatible with multi-modal queries.
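
Since the return type is Promise<number[] | null> (see the interface above), guard against null before using the result:
if (queryEmbedding === null) {
  throw new Error("Query content could not be embedded");
}
console.log(queryEmbedding.length); // e.g. 1536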

Batch Processing

For large datasets, use batch processing with automatic chunking:
const manyTexts = [...]; // Array of 1000 texts

const embeddings = await embedModel.getTextEmbeddingsBatch(manyTexts, {
  logProgress: true,
  progressCallback: (current, total) => {
    console.log(`Progress: ${current}/${total}`);
  },
});
Batch options:
  • logProgress: Log progress to console
  • progressCallback: Custom progress handler
  • logger: Custom logger instance
Automatic batching: input texts are automatically chunked into groups of embedBatchSize (default: 10) per API call:
const embedModel = new OpenAIEmbedding({
  embedBatchSize: 100, // Process 100 texts per API call
});

Similarity Calculation

Compare embeddings to measure semantic similarity:
import { SimilarityType } from "@llamaindex/core/embeddings";

const embedding1 = await embedModel.getTextEmbedding("cats are pets");
const embedding2 = await embedModel.getTextEmbedding("dogs are pets");
const embedding3 = await embedModel.getTextEmbedding("quantum physics");

// Cosine similarity (default)
const similarity1 = embedModel.similarity(embedding1, embedding2);
console.log(similarity1); // ~0.85 (high similarity)

const similarity2 = embedModel.similarity(embedding1, embedding3);
console.log(similarity2); // ~0.25 (low similarity)

// Other similarity types
const euclidean = embedModel.similarity(
  embedding1,
  embedding2,
  SimilarityType.EUCLIDEAN
);

const dotProduct = embedModel.similarity(
  embedding1,
  embedding2,
  SimilarityType.DOT_PRODUCT
);
Similarity types:
  • SimilarityType.DEFAULT - Cosine similarity (recommended)
  • SimilarityType.EUCLIDEAN - Euclidean distance
  • SimilarityType.DOT_PRODUCT - Dot product
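
For intuition, here is what cosine similarity computes; a hand-rolled sketch equivalent to the default mode (assuming equal-length, non-zero vectors):
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}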

Supported Embedding Models

LlamaIndex.TS supports embeddings from multiple providers:

OpenAI

import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small", // 1536 dimensions
  // model: "text-embedding-3-large", // 3072 dimensions
  // model: "text-embedding-ada-002", // Legacy, 1536 dimensions
  
  dimensions: 512, // Optional: reduce dimensions for 3-small/3-large
});
Available models:
  • text-embedding-3-small: 1536 dims, best performance/cost ratio
  • text-embedding-3-large: 3072 dims, highest quality
  • text-embedding-ada-002: Legacy model, 1536 dims

Google Gemini

import { GeminiEmbedding, GEMINI_EMBEDDING_MODEL } from "@llamaindex/google";

const embedModel = new GeminiEmbedding({
  model: GEMINI_EMBEDDING_MODEL.TEXT_EMBEDDING_004,
});

Voyage AI

import { VoyageAIEmbedding } from "@llamaindex/voyage-ai";

const embedModel = new VoyageAIEmbedding({
  model: "voyage-2",
  apiKey: process.env.VOYAGE_API_KEY,
});

HuggingFace

import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

// Runs the model locally via transformers.js (no API key required)
const embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
});

Ollama (Local)

import { OllamaEmbedding } from "@llamaindex/ollama";

const embedModel = new OllamaEmbedding({
  model: "nomic-embed-text",
  config: {
    host: "http://localhost:11434",
  },
});

Cohere

import { CohereEmbedding } from "@llamaindex/cohere";

const embedModel = new CohereEmbedding({
  model: "embed-english-v3.0",
  apiKey: process.env.COHERE_API_KEY,
});

Jina AI

import { JinaAIEmbedding } from "@llamaindex/jinaai";

const embedModel = new JinaAIEmbedding({
  model: "jina-embeddings-v2-base-en",
  apiKey: process.env.JINAAI_API_KEY,
});

Mixedbread

import { MixedbreadEmbedding } from "@llamaindex/mixedbread";

const embedModel = new MixedbreadEmbedding({
  model: "mixedbread-ai/mxbai-embed-large-v1",
  apiKey: process.env.MIXEDBREAD_API_KEY,
});

Using Embeddings with Vector Stores

Embeddings are typically used with vector stores for retrieval:
import { VectorStoreIndex, Document } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();
const llm = new OpenAI();

// Create documents
const documents = [
  new Document({ text: "LlamaIndex is a data framework for LLMs" }),
  new Document({ text: "RAG combines retrieval with generation" }),
];

// Create index with embedding model
const index = await VectorStoreIndex.fromDocuments(documents, {
  embedModel,
});

// Query using embeddings
const queryEngine = index.asQueryEngine({ llm });
const response = await queryEngine.query({ query: "What is RAG?" });
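
You can also set the embedding model globally through Settings, so every index picks it up without passing it explicitly:
import { Settings } from "llamaindex";
import { OpenAIEmbedding } from "@llamaindex/openai";

// Applies to all indexes created after this point
Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});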

Embedding Nodes

Embed document nodes for indexing:
import { Document } from "llamaindex";
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();

// Documents get converted to nodes
const documents = [
  new Document({ text: "Content to embed" }),
];

// Embed nodes
const nodes = await embedModel.transform(documents);
console.log(nodes[0].embedding); // Embedding vector attached to node

Examples

import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();

// Index documents
const documents = [
  "Python is a programming language",
  "JavaScript is used for web development",
  "Cats are popular pets",
  "Dogs are loyal companions",
];

const docEmbeddings = await embedModel.getTextEmbeddings(documents);

// Search query
const queryEmbedding = await embedModel.getTextEmbedding(
  "Tell me about pet animals"
);

// Find most similar
const similarities = docEmbeddings.map((docEmb) =>
  embedModel.similarity(queryEmbedding, docEmb)
);

const topIndex = similarities.indexOf(Math.max(...similarities));
console.log("Most relevant:", documents[topIndex]);
// Output: "Cats are popular pets" or "Dogs are loyal companions"

Progress Tracking

const largeDataset = [...]; // 10,000 texts

const embeddings = await embedModel.getTextEmbeddingsBatch(largeDataset, {
  logProgress: true,
  progressCallback: (current, total) => {
    const percent = ((current / total) * 100).toFixed(1);
    console.log(`Embedding progress: ${percent}%`);
  },
});

Custom Dimensions (OpenAI)

// Reduce dimensions for faster search and lower storage
const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-large",
  dimensions: 256, // Instead of default 3072
});

const embedding = await embedModel.getTextEmbedding("test");
console.log(embedding.length); // 256

Best Practices

  • OpenAI text-embedding-3-small: Best balance of quality and cost
  • OpenAI text-embedding-3-large: Highest quality, more expensive
  • Voyage AI: Excellent for domain-specific tasks
  • Local (Ollama): Privacy-focused, no API costs
Always use batch methods for multiple texts:
// Good: requests are batched (one call per embedBatchSize chunk)
const embeddings = await embedModel.getTextEmbeddings(texts);

// Bad: one API call per text
const embeddings = await Promise.all(
  texts.map(t => embedModel.getTextEmbedding(t))
);
Embeddings for identical inputs are stable across calls, so cache them to avoid re-computing:
const cache = new Map();

async function getEmbeddingCached(text: string) {
  if (cache.has(text)) return cache.get(text);
  const embedding = await embedModel.getTextEmbedding(text);
  cache.set(text, embedding);
  return embedding;
}
Use batch size and retries for large datasets:
const embedModel = new OpenAIEmbedding({
  embedBatchSize: 100,
  maxRetries: 3,
  timeout: 60000,
});
Clean and normalize text before embedding:
function normalizeText(text: string) {
  return text
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .trim();
}

const embedding = await embedModel.getTextEmbedding(
  normalizeText(userInput)
);

Multi-Modal Embeddings

Some providers support multi-modal embeddings:

CLIP (Images + Text)

import { ClipEmbedding } from "@llamaindex/clip";

const embedModel = new ClipEmbedding();

// Embed images and text in the same vector space
const imageEmbedding = await embedModel.getImageEmbedding(imageBuffer); // imageBuffer: your image data
const textEmbedding = await embedModel.getTextEmbedding("a photo of a cat");

// Compare image-text similarity
const similarity = embedModel.similarity(imageEmbedding, textEmbedding);

Next Steps

LLMs

Learn about language models and chat interfaces

Providers

Explore all available embedding providers
