Overview

Embeddings convert text into numerical vectors that capture semantic meaning. LlamaIndex.TS uses embeddings for:
  • Semantic search: Finding similar documents based on meaning, not just keywords
  • RAG (Retrieval-Augmented Generation): Retrieving relevant context for LLM queries
  • Clustering and classification: Grouping similar texts together
  • Similarity comparison: Measuring how related two pieces of text are
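
For example, a minimal end-to-end flow (using the OpenAI provider covered later on this page) embeds two texts and compares them:
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();

// Embed two texts in one batched call, then compare them
const [a, b] = await embedModel.getTextEmbeddings(["cats are pets", "dogs are pets"]);
console.log(embedModel.similarity(a, b)); // Higher values mean more similar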

BaseEmbedding Interface

All embedding models in LlamaIndex.TS extend the BaseEmbedding class from @llamaindex/core/embeddings:
import { BaseEmbedding } from "@llamaindex/core/embeddings";

abstract class BaseEmbedding {
  abstract getTextEmbedding(text: string): Promise<number[]>;
  
  getTextEmbeddings(texts: string[]): Promise<Array<number[]>>;
  getTextEmbeddingsBatch(texts: string[], options?): Promise<Array<number[]>>;
  getQueryEmbedding(query: MessageContentDetail): Promise<number[] | null>;
  
  similarity(embedding1: number[], embedding2: number[], mode?): number;
  
  embedBatchSize: number;
  embedInfo?: EmbeddingInfo;
}
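
Because getTextEmbedding is the only abstract member, a custom provider can be added by extending BaseEmbedding and implementing that single method; batching, query embedding, and similarity are inherited. A minimal sketch, assuming a hypothetical HTTP embedding service:
class HttpEmbedding extends BaseEmbedding {
  async getTextEmbedding(text: string): Promise<number[]> {
    // Hypothetical endpoint and response shape; replace with your provider's API
    const res = await fetch("https://example.com/embed", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: text }),
    });
    const { embedding } = (await res.json()) as { embedding: number[] };
    return embedding;
  }
}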

Embedding Info

Embedding models expose metadata about their capabilities:
type EmbeddingInfo = {
  dimensions?: number;      // Vector dimensions (e.g., 1536, 3072)
  maxTokens?: number;       // Maximum input tokens
  tokenizer?: Tokenizers;   // Tokenizer used
};
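
Not every provider populates this metadata, so treat embedInfo as optional when reading it:
const info = embedModel.embedInfo;
if (info?.dimensions) {
  console.log(`Vectors have ${info.dimensions} dimensions`);
}
if (info?.maxTokens) {
  console.log(`Inputs are limited to ${info.maxTokens} tokens`);
}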

Generating Embeddings

Single Text Embedding

Embed a single piece of text:
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

const embedding = await embedModel.getTextEmbedding(
  "LlamaIndex is a data framework for LLM applications"
);

console.log(embedding); // [0.123, -0.456, 0.789, ...]
console.log(embedding.length); // 1536

Multiple Text Embeddings

Embed multiple texts efficiently:
const texts = [
  "What is artificial intelligence?",
  "Machine learning is a subset of AI",
  "Deep learning uses neural networks",
];

const embeddings = await embedModel.getTextEmbeddings(texts);
console.log(embeddings.length); // 3
console.log(embeddings[0].length); // 1536

Query Embeddings

Embed queries for semantic search:
const queryEmbedding = await embedModel.getQueryEmbedding({
  type: "text",
  text: "How does RAG work?",
});
getQueryEmbedding accepts MessageContentDetail types, making it compatible with multi-modal queries.
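
Since the return type is Promise<number[] | null> (see the interface above), guard against null before using the result:
if (queryEmbedding === null) {
  throw new Error("Query content could not be embedded");
}
console.log(queryEmbedding.length); // e.g. 1536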

Batch Processing

For large datasets, use batch processing with automatic chunking:
const manyTexts = [...]; // Array of 1000 texts

const embeddings = await embedModel.getTextEmbeddingsBatch(manyTexts, {
  logProgress: true,
  progressCallback: (current, total) => {
    console.log(`Progress: ${current}/${total}`);
  },
});
Batch options:
  • logProgress: Log progress to console
  • progressCallback: Custom progress handler
  • logger: Custom logger instance
Automatic batching: input texts are automatically chunked into groups of embedBatchSize (default: 10) per API call:
const embedModel = new OpenAIEmbedding({
  embedBatchSize: 100, // Process 100 texts per API call
});

Similarity Calculation

Compare embeddings to measure semantic similarity:
import { SimilarityType } from "@llamaindex/core/embeddings";

const embedding1 = await embedModel.getTextEmbedding("cats are pets");
const embedding2 = await embedModel.getTextEmbedding("dogs are pets");
const embedding3 = await embedModel.getTextEmbedding("quantum physics");

// Cosine similarity (default)
const similarity1 = embedModel.similarity(embedding1, embedding2);
console.log(similarity1); // ~0.85 (high similarity)

const similarity2 = embedModel.similarity(embedding1, embedding3);
console.log(similarity2); // ~0.25 (low similarity)

// Other similarity types
const euclidean = embedModel.similarity(
  embedding1,
  embedding2,
  SimilarityType.EUCLIDEAN
);

const dotProduct = embedModel.similarity(
  embedding1,
  embedding2,
  SimilarityType.DOT_PRODUCT
);
Similarity types:
  • SimilarityType.DEFAULT - Cosine similarity (recommended)
  • SimilarityType.EUCLIDEAN - Euclidean distance
  • SimilarityType.DOT_PRODUCT - Dot product
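
For intuition, here is what cosine similarity computes; a hand-rolled sketch equivalent to the default mode (assuming equal-length, non-zero vectors):
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}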

Supported Embedding Models

LlamaIndex.TS supports embeddings from multiple providers:

OpenAI

import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small", // 1536 dimensions
  // model: "text-embedding-3-large", // 3072 dimensions
  // model: "text-embedding-ada-002", // Legacy, 1536 dimensions
  
  dimensions: 512, // Optional: reduce dimensions for 3-small/3-large
});
Available models:
  • text-embedding-3-small: 1536 dims, best performance/cost ratio
  • text-embedding-3-large: 3072 dims, highest quality
  • text-embedding-ada-002: Legacy model, 1536 dims

Google Gemini

import { GeminiEmbedding, GEMINI_EMBEDDING_MODEL } from "@llamaindex/google";

const embedModel = new GeminiEmbedding({
  model: GEMINI_EMBEDDING_MODEL.TEXT_EMBEDDING_004,
});

Voyage AI

import { VoyageAIEmbedding } from "@llamaindex/voyage-ai";

const embedModel = new VoyageAIEmbedding({
  model: "voyage-2",
  apiKey: process.env.VOYAGE_API_KEY,
});

HuggingFace

import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

// Runs the model locally via transformers.js (no API key required)
const embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
});

Ollama (Local)

import { OllamaEmbedding } from "@llamaindex/ollama";

const embedModel = new OllamaEmbedding({
  model: "nomic-embed-text",
  config: {
    host: "http://localhost:11434",
  },
});

Cohere

import { CohereEmbedding } from "@llamaindex/cohere";

const embedModel = new CohereEmbedding({
  model: "embed-english-v3.0",
  apiKey: process.env.COHERE_API_KEY,
});

Jina AI

import { JinaAIEmbedding } from "@llamaindex/jinaai";

const embedModel = new JinaAIEmbedding({
  model: "jina-embeddings-v2-base-en",
  apiKey: process.env.JINAAI_API_KEY,
});

Mixedbread

import { MixedbreadEmbedding } from "@llamaindex/mixedbread";

const embedModel = new MixedbreadEmbedding({
  model: "mixedbread-ai/mxbai-embed-large-v1",
  apiKey: process.env.MIXEDBREAD_API_KEY,
});

Using Embeddings with Vector Stores

Embeddings are typically used with vector stores for retrieval:
import { VectorStoreIndex, Document } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();
const llm = new OpenAI();

// Create documents
const documents = [
  new Document({ text: "LlamaIndex is a data framework for LLMs" }),
  new Document({ text: "RAG combines retrieval with generation" }),
];

// Create index with embedding model
const index = await VectorStoreIndex.fromDocuments(documents, {
  embedModel,
});

// Query using embeddings
const queryEngine = index.asQueryEngine({ llm });
const response = await queryEngine.query({ query: "What is RAG?" });
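
You can also set the embedding model globally through Settings, so every index picks it up without passing it explicitly:
import { Settings } from "llamaindex";
import { OpenAIEmbedding } from "@llamaindex/openai";

// Applies to all indexes created after this point
Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});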

Embedding Nodes

Embed document nodes for indexing:
import { Document } from "llamaindex";
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();

// Documents get converted to nodes
const documents = [
  new Document({ text: "Content to embed" }),
];

// Embed nodes
const nodes = await embedModel.transform(documents);
console.log(nodes[0].embedding); // Embedding vector attached to node

Examples

import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();

// Index documents
const documents = [
  "Python is a programming language",
  "JavaScript is used for web development",
  "Cats are popular pets",
  "Dogs are loyal companions",
];

const docEmbeddings = await embedModel.getTextEmbeddings(documents);

// Search query
const queryEmbedding = await embedModel.getTextEmbedding(
  "Tell me about pet animals"
);

// Find most similar
const similarities = docEmbeddings.map((docEmb) =>
  embedModel.similarity(queryEmbedding, docEmb)
);

const topIndex = similarities.indexOf(Math.max(...similarities));
console.log("Most relevant:", documents[topIndex]);
// Output: "Cats are popular pets" or "Dogs are loyal companions"

Progress Tracking

const largeDataset = [...]; // 10,000 texts

const embeddings = await embedModel.getTextEmbeddingsBatch(largeDataset, {
  logProgress: true,
  progressCallback: (current, total) => {
    const percent = ((current / total) * 100).toFixed(1);
    console.log(`Embedding progress: ${percent}%`);
  },
});

Custom Dimensions (OpenAI)

// Reduce dimensions for faster search and lower storage
const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-large",
  dimensions: 256, // Instead of default 3072
});

const embedding = await embedModel.getTextEmbedding("test");
console.log(embedding.length); // 256

Best Practices

  • OpenAI text-embedding-3-small: Best balance of quality and cost
  • OpenAI text-embedding-3-large: Highest quality, more expensive
  • Voyage AI: Excellent for domain-specific tasks
  • Local (Ollama): Privacy-focused, no API costs
Always use batch methods for multiple texts:
// Good: requests are batched (one call per embedBatchSize chunk)
const embeddings = await embedModel.getTextEmbeddings(texts);

// Bad: one API call per text
const embeddings = await Promise.all(
  texts.map(t => embedModel.getTextEmbedding(t))
);
Embeddings for identical inputs are stable across calls, so cache them to avoid re-computing:
const cache = new Map();

async function getEmbeddingCached(text: string) {
  if (cache.has(text)) return cache.get(text);
  const embedding = await embedModel.getTextEmbedding(text);
  cache.set(text, embedding);
  return embedding;
}
Use batch size and retries for large datasets:
const embedModel = new OpenAIEmbedding({
  embedBatchSize: 100,
  maxRetries: 3,
  timeout: 60000,
});
Clean and normalize text before embedding:
function normalizeText(text: string) {
  return text
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .trim();
}

const embedding = await embedModel.getTextEmbedding(
  normalizeText(userInput)
);

Multi-Modal Embeddings

Some providers support multi-modal embeddings:

CLIP (Images + Text)

import { ClipEmbedding } from "@llamaindex/clip";

const embedModel = new ClipEmbedding();

// Embed images and text in the same vector space
const imageEmbedding = await embedModel.getImageEmbedding(imageBuffer); // imageBuffer: your image data
const textEmbedding = await embedModel.getTextEmbedding("a photo of a cat");

// Compare image-text similarity
const similarity = embedModel.similarity(imageEmbedding, textEmbedding);

Next Steps

LLMs

Learn about language models and chat interfaces

Providers

Explore all available embedding providers
