Text embeddings convert text into dense vector representations that capture semantic meaning. You can use Voyage AI’s specialized text embedding models to power semantic search, recommendations, clustering, and other AI applications.

Available models

Voyage AI offers multiple text embedding models optimized for different use cases:
  • voyage-3.5 - Latest flagship model with enhanced performance
  • voyage-3.5-lite - Efficient version with faster inference
  • voyage-3-large - Large model with 2048 default dimensions
  • voyage-3 - General purpose model
  • voyage-3-lite - Lightweight variant for speed
  • voyage-code-3 - Specialized for code and technical content
  • voyage-finance-2 - Optimized for financial documents
  • voyage-multilingual-2 - Supports multiple languages
  • voyage-law-2 - Specialized for legal content

Basic usage

Generate embeddings for a single text input using the embed function:
import { createVoyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embedding } = await embed({
  model: voyage.textEmbeddingModel('voyage-3-lite'),
  value: 'The quick brown fox jumps over the lazy dog',
});

console.log(embedding);

Batch processing

Generate embeddings for multiple texts efficiently using embedMany:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embeddings } = await embedMany({
  model: voyage.textEmbeddingModel('voyage-3-lite'),
  values: [
    'The quick brown fox jumps over the lazy dog',
    'Artificial intelligence is transforming the world',
    'Machine learning enables computers to learn without being explicitly programmed',
  ],
});

for (const [index, embedding] of embeddings.entries()) {
  console.log(`Text ${index + 1}: ${embedding.length} dimensions`);
}
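Embeddings are compared by measuring the angle between vectors, and the AI SDK exports a cosineSimilarity helper for exactly this. As an illustrative sketch (not part of the provider API), the underlying computation on plain number arrays looks like this:

```typescript
// Cosine similarity between two embedding vectors:
// 1 means the vectors point the same way (semantically similar),
// 0 means they are orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// With real embeddings: cosineSimilarity(embeddings[0], embeddings[1])
console.log(cosineSimilarity([1, 0, 1], [1, 0, 1])); // 1
```

In practice, prefer importing cosineSimilarity from the ai package rather than hand-rolling it.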

Input types

Voyage AI uses different prompts for queries and documents to optimize retrieval performance.
Query embeddings

Use inputType: 'query' when embedding search queries:
import { createVoyage } from 'voyage-ai-provider';
import { embed } from 'ai';
import type { VoyageEmbeddingOptions } from 'voyage-ai-provider';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embedding } = await embed({
  model: voyage.textEmbeddingModel('voyage-3-lite'),
  value: 'How do I reset my password?',
  providerOptions: {
    voyage: {
      inputType: 'query',
    } satisfies VoyageEmbeddingOptions,
  },
});
The model prepends “Represent the query for retrieving supporting documents: ” to query inputs.

Document embeddings

Use inputType: 'document' when embedding documents for retrieval:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';
import type { VoyageEmbeddingOptions } from 'voyage-ai-provider';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embeddings } = await embedMany({
  model: voyage.textEmbeddingModel('voyage-3-lite'),
  values: [
    'To reset your password, click the forgot password link on the login page.',
    'Our support team is available 24/7 to help with your questions.',
    'Premium users get access to priority support and advanced features.',
  ],
  providerOptions: {
    voyage: {
      inputType: 'document',
    } satisfies VoyageEmbeddingOptions,
  },
});
The model prepends “Represent the document for retrieval: ” to document inputs.
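With a query embedding and document embeddings in hand, retrieval reduces to ranking documents by similarity. The rankDocuments helper below is an illustrative sketch, not part of the provider API:

```typescript
// Rank document embeddings by cosine similarity to a query embedding.
// Returns indices into the documents array, best match first.
function rankDocuments(query: number[], documents: number[][]): number[] {
  const cosine = (a: number[], b: number[]) => {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  };
  return documents
    .map((doc, index) => ({ index, score: cosine(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .map(({ index }) => index);
}

// With the snippets above, rankDocuments(embedding, embeddings)[0]
// gives the index of the document closest to the query.
```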

Grouped text embeddings

You can combine multiple related text segments into a single embedding. This is useful for representing complex documents with titles, descriptions, and metadata:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

// Use multimodal model for grouped text
const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    // E-commerce product: title + description + features
    [
      'Premium Wireless Bluetooth Headphones',
      'Experience superior sound quality with active noise cancellation',
      'Battery life: 30 hours, Quick charge: 15 min = 3 hours playback',
      'Compatible with iOS, Android, and all Bluetooth devices',
    ],
    // Blog post: title + summary + tags
    [
      'The Future of Artificial Intelligence in Healthcare',
      'Exploring how AI is revolutionizing medical diagnosis and treatment',
      'Tags: AI, healthcare, machine learning, medical technology, innovation',
    ],
    // Job listing: title + company + description
    [
      'Senior Software Engineer - Full Stack',
      'TechCorp Inc. - Leading technology company',
      'Build scalable web applications using React, Node.js, and cloud technologies',
      'Requirements: 5+ years experience, strong problem-solving skills',
    ],
  ],
});
Grouping related text segments creates richer semantic representations than embedding them separately.

Specialized models

Code embeddings

Use voyage-code-3 for code snippets and technical documentation:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embeddings } = await embedMany({
  model: voyage.textEmbeddingModel('voyage-code-3'),
  values: [
    'function calculateTotal(items) { return items.reduce((sum, item) => sum + item.price, 0); }',
    'const express = require("express"); const app = express(); app.listen(3000);',
    'class UserAuthentication { constructor(database) { this.db = database; } }',
  ],
});

Multilingual embeddings

Use voyage-multilingual-2 for content in multiple languages:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embeddings } = await embedMany({
  model: voyage.textEmbeddingModel('voyage-multilingual-2'),
  values: [
    'Hello, how are you?',
    'Bonjour, comment allez-vous?',
    'Hola, ¿cómo estás?',
    '你好,你好吗?',
  ],
});

Domain-specific models

Use voyage-finance-2 for financial documents:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embeddings } = await embedMany({
  model: voyage.textEmbeddingModel('voyage-finance-2'),
  values: [
    'Q3 earnings exceeded analyst expectations with revenue growth of 15%',
    'The Federal Reserve announced a 25 basis point interest rate increase',
    'Portfolio diversification reduces risk through asset allocation',
  ],
});

Configuration options

Customize embedding behavior with provider options:
import { createVoyage } from 'voyage-ai-provider';
import { embed } from 'ai';
import type { VoyageEmbeddingOptions } from 'voyage-ai-provider';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const { embedding } = await embed({
  model: voyage.textEmbeddingModel('voyage-code-3'),
  value: 'Sample text for embedding',
  providerOptions: {
    voyage: {
      inputType: 'query',
      outputDimension: 512,
      outputDtype: 'float',
      truncation: true,
    } satisfies VoyageEmbeddingOptions,
  },
});

Available options

inputType
The input type for embeddings. Defaults to "query".
  • query - Prepends “Represent the query for retrieving supporting documents: ”
  • document - Prepends “Represent the document for retrieval: ”

outputDimension
The number of dimensions for output embeddings. If not specified, uses the model’s default dimension.
  • voyage-code-3 supports: 2048, 1024 (default), 512, and 256
  • voyage-3-large supports: 2048, 1024 (default), 512, and 256
Refer to the model documentation for supported values.

outputDtype
The data type for output embeddings. Defaults to "float".
  • float - 32-bit single-precision floating-point numbers (supported by all models)
  • int8 - 8-bit integers from -128 to 127 (supported by voyage-code-3)
  • uint8 - 8-bit integers from 0 to 255 (supported by voyage-code-3)
  • binary - Bit-packed quantized values using int8 (supported by voyage-code-3)
  • ubinary - Bit-packed quantized values using uint8 (supported by voyage-code-3)
See the quantization FAQ for details.

truncation
Whether to truncate input texts to fit within the context length. Defaults to false. Set to true to automatically truncate long texts instead of raising an error.
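The outputDtype and outputDimension choices trade accuracy for storage, and the savings follow directly from the type widths above: 4 bytes per dimension for float, 1 byte for int8/uint8, and one bit per dimension for the bit-packed types. A quick sketch of that arithmetic (the helper is illustrative, not part of the provider API):

```typescript
// Bytes needed to store one embedding for each output dtype.
// float: 4 bytes per dimension; int8/uint8: 1 byte per dimension;
// binary/ubinary: bit-packed, 8 dimensions per byte.
type OutputDtype = 'float' | 'int8' | 'uint8' | 'binary' | 'ubinary';

function bytesPerEmbedding(dimensions: number, dtype: OutputDtype): number {
  switch (dtype) {
    case 'float':
      return dimensions * 4;
    case 'int8':
    case 'uint8':
      return dimensions;
    case 'binary':
    case 'ubinary':
      return dimensions / 8;
  }
}

console.log(bytesPerEmbedding(1024, 'float')); // 4096
console.log(bytesPerEmbedding(1024, 'binary')); // 128
```

At 1024 dimensions, switching from float to binary shrinks storage by 32x, which is why quantized dtypes matter for large corpora.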

Working with usage data

The embedding response includes token usage information:
import { createVoyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

const result = await embedMany({
  model: voyage.textEmbeddingModel('voyage-3-lite'),
  values: [
    'First text to embed',
    'Second text to embed',
    'Third text to embed',
  ],
});

console.log(`Generated ${result.embeddings.length} embeddings`);
console.log(`Tokens used: ${result.usage?.tokens}`);

Error handling

Handle errors gracefully when generating embeddings:
import { createVoyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const voyage = createVoyage({
  apiKey: process.env.VOYAGE_API_KEY,
});

try {
  const { embedding } = await embed({
    model: voyage.textEmbeddingModel('voyage-3-lite'),
    value: 'Text to embed',
  });
  
  console.log('Embedding generated successfully');
} catch (error) {
  console.error('Failed to generate embedding:', error);
}
The maximum batch size is 128 embeddings per call. Split larger batches into multiple requests.
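To stay under that 128-item limit, split large inputs into chunks and call embedMany once per chunk. The chunk helper below is an illustrative sketch; the commented usage assumes the voyage instance from the snippets above:

```typescript
// Split an array into chunks of at most `size` items so each
// embedMany call stays within the 128-embedding batch limit.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Sketch of use with the provider:
// const all: number[][] = [];
// for (const batch of chunk(texts, 128)) {
//   const { embeddings } = await embedMany({
//     model: voyage.textEmbeddingModel('voyage-3-lite'),
//     values: batch,
//   });
//   all.push(...embeddings);
// }
```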

Best practices

Choose the right model

Select models based on your use case:
  • General text: voyage-3.5 or voyage-3-lite
  • Code: voyage-code-3
  • Multilingual: voyage-multilingual-2
  • Domain-specific: voyage-finance-2 or voyage-law-2

Use appropriate input types

Set inputType correctly for optimal retrieval:
  • Use query for search queries
  • Use document for content being searched

Batch when possible

Use embedMany instead of multiple embed calls to reduce API overhead and improve performance.

Consider dimension reduction

For models that support it, use lower outputDimension values to reduce storage and computation costs while maintaining acceptable accuracy.

Next steps

Image embeddings

Learn how to generate embeddings from images

Multimodal embeddings

Combine text and images in a single embedding

Reranking

Improve search results with reranking

Configuration

Customize provider settings
