Creates a multimodal embedding model instance that generates embeddings for text and image inputs using Voyage AI’s multimodal models. This model supports text-only, image-only, or combined text and image inputs.
import { voyage } from 'voyage-ai-provider';

const model = voyage.multimodalEmbeddingModel('voyage-multimodal-3');

Parameters

modelId
VoyageMultimodalEmbeddingModelId
required
The identifier of the multimodal embedding model to use. Available models:
  • voyage-multimodal-3 - Third-generation multimodal model

Returns

EmbeddingModelV3
object
A multimodal embedding model instance that implements the AI SDK’s EmbeddingModelV3 interface.
modelId
string
The model identifier passed during creation
provider
string
The provider identifier: "voyage.multimodal.embedding"
maxEmbeddingsPerCall
number
Maximum number of inputs per API call: 128
supportsParallelCalls
boolean
Whether parallel calls are supported: false
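The 128-input limit means larger workloads have to be split into several requests. The AI SDK's embedMany is generally expected to do this splitting for you based on maxEmbeddingsPerCall, so the following is only a minimal sketch of what the limit implies. The chunk helper and the MAX_EMBEDDINGS_PER_CALL constant are hypothetical names, not part of voyage-ai-provider.

```typescript
// Hypothetical constant mirroring the documented maxEmbeddingsPerCall value.
const MAX_EMBEDDINGS_PER_CALL = 128;

// Hypothetical helper: split a list of inputs into batches of at most `size` items.
function chunk<T>(items: T[], size: number = MAX_EMBEDDINGS_PER_CALL): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 300 inputs end up in three batches.
const inputs = Array.from({ length: 300 }, (_, i) => `document ${i}`);
const batches = chunk(inputs);
console.log(batches.map((batch) => batch.length)); // batch sizes: 128, 128, 44
```

Because supportsParallelCalls is false, batches like these would presumably be sent one after another rather than concurrently.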

Input types

The multimodal embedding model accepts a MultimodalEmbeddingInput value, which can take any of the following forms:
  • Single text: string - A single text string
  • Single image: string - A single image URL or base64-encoded image
  • Multiple texts: string[] - Array of texts combined into one embedding
  • Multiple images: string[] - Array of images combined into one embedding
  • Multimodal content: { text?: string[], image?: string[] } - Mixed content with explicit structure
  • Object format (text): { text: string | string[] } - Alternative format for text
  • Object format (image): { image: string | string[] } - Alternative format for images
  • Pre-formatted content: { content: ContentItem[] } - Pre-formatted content items

Image formats

Images can be provided as:
  • URL: https://example.com/image.jpg (must have image extension: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg)
  • Base64: data:image/jpeg;base64,/9j/4AAQSkZJRg... (data URI with base64 encoding)
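Since plain strings can carry either text or an image, it can help to see how the two image formats above might be told apart from text. The following is a minimal sketch under stated assumptions: classifyStringInput and its regular expressions are hypothetical and only mirror the documented rules (an http(s) URL ending in a listed image extension, or a base64 data URI). The provider's actual detection logic may differ.

```typescript
type StringInputKind = 'image_url' | 'image_base64' | 'text';

// Assumption: an image URL is http(s) and ends in one of the documented extensions,
// optionally followed by a query string or fragment.
const IMAGE_URL = /^https?:\/\/[^\s?#]+\.(?:jpe?g|png|gif|bmp|webp|svg)(?:[?#]\S*)?$/i;

// Assumption: a base64 image is a data URI with an image/* media type.
const IMAGE_DATA_URI = /^data:image\/[a-z0-9.+-]+;base64,/i;

// Hypothetical helper, not part of voyage-ai-provider.
function classifyStringInput(value: string): StringInputKind {
  if (IMAGE_DATA_URI.test(value)) return 'image_base64';
  if (IMAGE_URL.test(value)) return 'image_url';
  return 'text';
}

console.log(classifyStringInput('https://example.com/image.jpg')); // image_url
console.log(classifyStringInput('data:image/jpeg;base64,/9j/4AAQSkZJRg')); // image_base64
console.log(classifyStringInput('A beautiful sunset over the beach')); // text
```

Under this sketch, a URL without an image extension is treated as text, which is one reason the object formats with an explicit image property can be the safer choice.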

Content item structure

When using pre-formatted content, each item follows the structure:
  • { type: 'text', text: string }
  • { type: 'image_url', image_url: string }
  • { type: 'image_base64', image_base64: string }
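To make the relationship between the structured input and these content items concrete, here is a minimal sketch of how a { text, image } object could be flattened into a content array. The ContentItem type is transcribed from the list above; toContentItems is a hypothetical helper, and both the text-before-images ordering and the data: prefix check are assumptions rather than documented provider behavior.

```typescript
// Transcribed from the content item structure listed above.
type ContentItem =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: string }
  | { type: 'image_base64'; image_base64: string };

// Hypothetical helper, not part of voyage-ai-provider.
function toContentItems(input: { text?: string[]; image?: string[] }): ContentItem[] {
  const items: ContentItem[] = [];
  // Assumption: text items come first, followed by images.
  for (const text of input.text ?? []) {
    items.push({ type: 'text', text });
  }
  for (const image of input.image ?? []) {
    // Assumption: anything starting with "data:" is a base64 data URI; the rest are URLs.
    items.push(
      image.startsWith('data:')
        ? { type: 'image_base64', image_base64: image }
        : { type: 'image_url', image_url: image },
    );
  }
  return items;
}

const content = toContentItems({
  text: ['A beautiful sunset over the beach'],
  image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
});
console.log(content.map((item) => item.type)); // item types: text, image_url
```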

Usage examples

Generate text and image embedding

Combine text and images into a single embedding vector.
import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    text: ['A beautiful sunset over the beach'],
    image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
  },
});

console.log(`Embedding length: ${embedding.length}`);

Generate multiple multimodal embeddings

Embed multiple multimodal inputs to generate separate embedding vectors.
import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    {
      text: ['Golden sunset over ocean waves on sandy beach.'],
      image: ['https://i.ibb.co/nQNGqL0/beach1.jpg'],
    },
    {
      text: ['Vibrant sunset over tropical beach and ocean.'],
      image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
    },
  ],
});

for (const [index, embedding] of embeddings.entries()) {
  console.log(`Embedding ${index}: length ${embedding.length}`);
}
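Separate embedding vectors like these are typically compared with cosine similarity. The sketch below is a self-contained illustration using toy vectors in place of real embeddings; cosineSimilaritySketch is a hypothetical name, and the ai package also exports its own cosineSimilarity helper that you would normally use instead.

```typescript
// Hypothetical helper: cosine similarity between two equal-length vectors.
function cosineSimilaritySketch(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Vector lengths differ: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for embeddings[0] and embeddings[1].
const first = [1, 0];
const second = [0, 1];
console.log(cosineSimilaritySketch(first, first)); // 1
console.log(cosineSimilaritySketch(first, second)); // 0
```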

Use text only

The multimodal model can also be used with text-only inputs.
import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    'Customer service inquiry about product return',
    'Technical support request for software installation',
    'Sales question about pricing and availability',
  ],
});

Combine multiple texts in one embedding

You can combine multiple text strings into a single embedding.
import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    // Product: title + description + features
    [
      'Premium Wireless Bluetooth Headphones',
      'Experience superior sound quality with active noise cancellation',
      'Battery life: 30 hours, Quick charge: 15 min = 3 hours playback',
      'Compatible with iOS, Android, and all Bluetooth devices',
    ],
    // Blog post: title + summary + tags
    [
      'The Future of Artificial Intelligence in Healthcare',
      'Exploring how AI is revolutionizing medical diagnosis and treatment',
      'Tags: AI, healthcare, machine learning, medical technology, innovation',
    ],
  ],
});

Use image only

The multimodal model can also be used with image-only inputs.
import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    'https://i.ibb.co/nQNGqL0/beach1.jpg',
    'https://i.ibb.co/r5w8hG8/beach2.jpg',
  ],
});

Combine single text with multiple images

You can combine one text string with multiple images in a single embedding.
import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    {
      text: ['A beautiful sunset over the beach'],
      image: [
        'https://i.ibb.co/nQNGqL0/beach1.jpg',
        'https://i.ibb.co/r5w8hG8/beach2.jpg',
      ],
    },
  ],
});

Use object format for text

You can use the object format with an explicit text property.
import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    text: 'A beautiful sunset over the beach',
  },
});

Use object format for images

You can use the object format with an explicit image property.
import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    image: 'https://i.ibb.co/nQNGqL0/beach1.jpg',
  },
});

Use pre-formatted content items

You can use pre-formatted content items with explicit type specifications.
import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    content: [
      { type: 'text', text: 'A beautiful sunset over the beach' },
      { type: 'image_url', image_url: 'https://i.ibb.co/r5w8hG8/beach2.jpg' },
    ],
  },
});

Use provider options

You can customize the embedding behavior using provider-specific options.
import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';
import type { VoyageMultimodalEmbeddingOptions } from 'voyage-ai-provider';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    text: ['A beautiful sunset over the beach'],
    image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
  },
  providerOptions: {
    voyage: {
      inputType: 'document',
      truncation: true,
    } satisfies VoyageMultimodalEmbeddingOptions,
  },
});

Provider options

You can pass Voyage-specific options through the providerOptions parameter:
providerOptions.voyage.inputType
'query' | 'document'
The input type for the embeddings. Defaults to "query". When specified, Voyage automatically prepends a prompt to your inputs before vectorizing them, creating vectors more tailored for retrieval/search tasks.
  • query: Prepends “Represent the query for retrieving supporting documents: ”
  • document: Prepends “Represent the document for retrieval: ”
For retrieval/search purposes where a query is used to search through documents, we recommend specifying whether your inputs are queries or documents. Since inputs can be multimodal, “queries” and “documents” can be text, images, or an interleaving of both modalities.
providerOptions.voyage.outputEncoding
'base64'
The data type for the resulting output embeddings. If not specified (defaults to null), the embeddings are represented as a list of floating-point numbers. If 'base64', the embeddings are represented as a Base64-encoded NumPy array of single-precision floats.
providerOptions.voyage.truncation
boolean
Whether to truncate the input to fit within the context length. Defaults to true.
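When outputEncoding is 'base64', the encoded string has to be turned back into numbers before it can be used. The following is a minimal sketch, assuming the payload is the raw little-endian float32 buffer with no additional header. Both that byte layout and the helper names decodeBase64Float32 and encodeFloat32Base64 are assumptions for illustration; how the provider surfaces the raw string is not covered here.

```typescript
// Hypothetical helper: decode a Base64 string of little-endian float32 values.
function decodeBase64Float32(encoded: string): Float32Array {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  if (bytes.byteLength % 4 !== 0) {
    throw new Error('Byte length is not a multiple of 4');
  }
  // Read through a DataView so little-endian order is explicit on every platform.
  const view = new DataView(bytes.buffer);
  const values = new Float32Array(bytes.byteLength / 4);
  for (let i = 0; i < values.length; i++) {
    values[i] = view.getFloat32(i * 4, true);
  }
  return values;
}

// Hypothetical inverse, used only to build sample data for the round trip.
function encodeFloat32Base64(values: number[]): string {
  const view = new DataView(new ArrayBuffer(values.length * 4));
  values.forEach((value, i) => view.setFloat32(i * 4, value, true));
  const bytes = new Uint8Array(view.buffer);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

const sample = encodeFloat32Base64([0.5, -1, 2]);
console.log(Array.from(decodeBase64Float32(sample))); // values: 0.5, -1, 2
```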
