multimodalEmbeddingModel()

Creates a multimodal embedding model instance that generates embeddings for text and image inputs using Voyage AI’s multimodal models. This model supports text-only, image-only, or combined text and image inputs.

import { voyage } from 'voyage-ai-provider';

const model = voyage.multimodalEmbeddingModel('voyage-multimodal-3');

Parameters

modelId

VoyageMultimodalEmbeddingModelId

required

The identifier of the multimodal embedding model to use. Available models:

voyage-multimodal-3 - Third generation multimodal model

Returns

EmbeddingModelV3

object

A multimodal embedding model instance that implements the AI SDK’s EmbeddingModelV3 interface.

modelId

string

The model identifier passed during creation

provider

string

The provider identifier: "voyage.multimodal.embedding"

maxEmbeddingsPerCall

number

Maximum number of inputs per API call: 128

supportsParallelCalls

boolean

Whether parallel calls are supported: false
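
The 128-input limit matters mostly when you call the model directly. As far as we know, the AI SDK's embedMany already splits larger requests using maxEmbeddingsPerCall. The following minimal sketch shows what that splitting amounts to; chunkInputs is a hypothetical helper written for illustration and is not part of voyage-ai-provider.

```typescript
// Hypothetical helper: split a list of inputs into batches that respect
// a per-call limit (128 here, matching maxEmbeddingsPerCall above).
function chunkInputs<T>(values: readonly T[], size: number = 128): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < values.length; start += size) {
    batches.push(values.slice(start, start + size));
  }
  return batches;
}

// 300 inputs become three batches of 128, 128 and 44 items.
const demoBatches = chunkInputs(Array.from({ length: 300 }, (_, i) => `text ${i}`));
console.log(demoBatches.map((batch) => batch.length));
```

Because supportsParallelCalls is false, batches like these would be sent one after another rather than concurrently.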

Input types

The multimodal embedding model accepts MultimodalEmbeddingInput which can be:

Single text: string - A single text string
Single image: string - A single image URL or base64-encoded image
Multiple texts: string[] - Array of texts combined into one embedding
Multiple images: string[] - Array of images combined into one embedding
Multimodal content: { text?: string[], image?: string[] } - Mixed content with explicit structure
Object format (text): { text: string | string[] } - Alternative format for text
Object format (image): { image: string | string[] } - Alternative format for images
Pre-formatted content: { content: ContentItem[] } - Pre-formatted content items
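
To make the list above easier to scan, here is a sketch of how those shapes could be written as a TypeScript union. The type is reconstructed from the list and is an assumption: the declaration shipped in voyage-ai-provider may differ. InputSketch and inputShape are hypothetical names used only for illustration.

```typescript
// Content item shapes as described under "Content item structure".
type ContentItem =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: string }
  | { type: 'image_base64'; image_base64: string };

// Assumed union mirroring the documented input shapes (not the library's own type).
type InputSketch =
  | string // single text or single image
  | string[] // multiple texts or multiple images, combined into one embedding
  | { text?: string[]; image?: string[] } // multimodal content
  | { text: string | string[] } // object format (text)
  | { image: string | string[] } // object format (image)
  | { content: ContentItem[] }; // pre-formatted content

// Hypothetical helper: report which broad shape a value has.
function inputShape(input: InputSketch): 'single' | 'list' | 'content' | 'fields' {
  if (typeof input === 'string') return 'single';
  if (Array.isArray(input)) return 'list';
  if ('content' in input) return 'content';
  return 'fields';
}

console.log(inputShape({ text: ['A beautiful sunset over the beach'] }));
```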

Image formats

Images can be provided as:

URL: https://example.com/image.jpg (must have image extension: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg)
Base64: data:image/jpeg;base64,/9j/4AAQSkZJRg... (data URI with base64 encoding)
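
Since plain strings can carry either text or an image, it can help to check a value against the two formats above before sending it. The sketch below applies the documented rules (an http(s) URL whose path ends in a listed extension, or a base64 data URI). classifyImageString is a hypothetical helper, and whether the provider performs exactly this check internally is an assumption.

```typescript
// Extensions listed in the documentation for image URLs.
const IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.svg'];

// Hypothetical helper: decide how a string would be treated under the rules above.
function classifyImageString(value: string): 'image_url' | 'image_base64' | 'not_image' {
  // data:image/<subtype>;base64,<payload>
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return 'image_base64';
  try {
    const url = new URL(value); // throws on strings that are not absolute URLs
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return 'not_image';
    const path = url.pathname.toLowerCase();
    return IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext)) ? 'image_url' : 'not_image';
  } catch {
    return 'not_image';
  }
}

console.log(classifyImageString('https://example.com/image.jpg'));
```

A URL without one of the listed extensions falls into the 'not_image' branch in this sketch, which is a reminder to serve images from URLs that end in a supported extension.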

Content item structure

When using pre-formatted content, each item follows the structure:

{ type: 'text', text: string }
{ type: 'image_url', image_url: string }
{ type: 'image_base64', image_base64: string }
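
To show how the structured { text, image } form relates to these content items, here is a minimal sketch of one possible mapping. toContentItems is a hypothetical helper; the ordering (texts first, then images) and the data-URI check are assumptions, not the provider's documented behavior.

```typescript
type ContentItem =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: string }
  | { type: 'image_base64'; image_base64: string };

// Hypothetical helper: flatten { text, image } into a content item list.
function toContentItems(input: { text?: string[]; image?: string[] }): ContentItem[] {
  const items: ContentItem[] = [];
  for (const text of input.text ?? []) {
    items.push({ type: 'text', text });
  }
  for (const image of input.image ?? []) {
    // Assumption: data URIs map to image_base64, everything else to image_url.
    items.push(
      image.startsWith('data:image/')
        ? { type: 'image_base64', image_base64: image }
        : { type: 'image_url', image_url: image },
    );
  }
  return items;
}

console.log(
  toContentItems({
    text: ['A beautiful sunset over the beach'],
    image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
  }),
);
```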

Usage examples

Generate text and image embedding

Combine text and images into a single embedding vector.

import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    text: ['A beautiful sunset over the beach'],
    image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
  },
});

console.log(`Embedding length: ${embedding.length}`);

Generate multiple multimodal embeddings

Embed multiple multimodal inputs to generate separate embedding vectors.

import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    {
      text: ['Golden sunset over ocean waves on sandy beach.'],
      image: ['https://i.ibb.co/nQNGqL0/beach1.jpg'],
    },
    {
      text: ['Vibrant sunset over tropical beach and ocean.'],
      image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
    },
  ],
});

for (const [index, embedding] of embeddings.entries()) {
  console.log(`Embedding ${index}: length ${embedding.length}`);
}
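
Once you have separate vectors, a common next step is to compare them. The sketch below computes cosine similarity by hand so that it stays self-contained; the vectors are made-up stand-ins for real embeddings, and the function name cosine is hypothetical. The AI SDK also appears to export a cosineSimilarity helper from 'ai' that you may prefer in practice.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Stand-in vectors; with real data, pass embeddings[0] and embeddings[1].
const first = [1, 2, 3];
const second = [2, 4, 6];
console.log(`Similarity: ${cosine(first, second)}`);
```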

Use text only

The multimodal model can also be used with text-only inputs.

import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    'Customer service inquiry about product return',
    'Technical support request for software installation',
    'Sales question about pricing and availability',
  ],
});

Combine multiple texts in one embedding

You can combine multiple text strings into a single embedding.

import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    // Product: title + description + features
    [
      'Premium Wireless Bluetooth Headphones',
      'Experience superior sound quality with active noise cancellation',
      'Battery life: 30 hours, Quick charge: 15 min = 3 hours playback',
      'Compatible with iOS, Android, and all Bluetooth devices',
    ],
    // Blog post: title + summary + tags
    [
      'The Future of Artificial Intelligence in Healthcare',
      'Exploring how AI is revolutionizing medical diagnosis and treatment',
      'Tags: AI, healthcare, machine learning, medical technology, innovation',
    ],
  ],
});

Use image only

The multimodal model can also be used with image-only inputs.

import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    'https://i.ibb.co/nQNGqL0/beach1.jpg',
    'https://i.ibb.co/r5w8hG8/beach2.jpg',
  ],
});

Combine single text with multiple images

You can combine one text string with multiple images in a single embedding.

import { voyage } from 'voyage-ai-provider';
import { embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  values: [
    {
      text: ['A beautiful sunset over the beach'],
      image: [
        'https://i.ibb.co/nQNGqL0/beach1.jpg',
        'https://i.ibb.co/r5w8hG8/beach2.jpg',
      ],
    },
  ],
});

Use object format for text

You can use the object format with explicit text property.

import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    text: 'A beautiful sunset over the beach',
  },
});

Use object format for images

You can use the object format with explicit image property.

import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    image: 'https://i.ibb.co/nQNGqL0/beach1.jpg',
  },
});

Use pre-formatted content items

You can use pre-formatted content items with explicit type specifications.

import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    content: [
      { type: 'text', text: 'A beautiful sunset over the beach' },
      { type: 'image_url', image_url: 'https://i.ibb.co/r5w8hG8/beach2.jpg' },
    ],
  },
});

Use provider options

You can customize the embedding behavior using provider-specific options.

import { voyage } from 'voyage-ai-provider';
import { embed } from 'ai';
import type { VoyageMultimodalEmbeddingOptions } from 'voyage-ai-provider';

const { embedding } = await embed({
  model: voyage.multimodalEmbeddingModel('voyage-multimodal-3'),
  value: {
    text: ['A beautiful sunset over the beach'],
    image: ['https://i.ibb.co/r5w8hG8/beach2.jpg'],
  },
  providerOptions: {
    voyage: {
      inputType: 'document',
      truncation: true,
    } satisfies VoyageMultimodalEmbeddingOptions,
  },
});

Provider options

You can pass Voyage-specific options through the providerOptions parameter:

providerOptions.voyage.inputType

'query' | 'document'

The input type for the embeddings. Defaults to "query". When specified, Voyage automatically prepends a prompt to your inputs before vectorizing them, creating vectors more tailored for retrieval/search tasks.

query: Prepends “Represent the query for retrieving supporting documents: ”
document: Prepends “Represent the document for retrieval: ”

For retrieval/search purposes where a query is used to search through documents, we recommend specifying whether your inputs are queries or documents. Since inputs can be multimodal, “queries” and “documents” can be text, images, or an interleaving of both modalities.
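
The prepending described above happens on Voyage's side, so you never add these prompts yourself. Purely to illustrate the effect on a text input, the following sketch reproduces it locally; withRetrievalPrompt is a hypothetical function and is not part of the provider.

```typescript
type InputType = 'query' | 'document';

// Prompt strings as quoted in the documentation above.
const RETRIEVAL_PROMPTS: Record<InputType, string> = {
  query: 'Represent the query for retrieving supporting documents: ',
  document: 'Represent the document for retrieval: ',
};

// Hypothetical illustration of what the service does before vectorizing text.
function withRetrievalPrompt(text: string, inputType: InputType): string {
  return RETRIEVAL_PROMPTS[inputType] + text;
}

console.log(withRetrievalPrompt('sunset photos taken at the beach', 'query'));
```

In a typical retrieval setup you would embed the stored items with inputType 'document' and the search input with inputType 'query', so that both sides get the matching prompt.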

providerOptions.voyage.outputEncoding

'base64'

The data type for the resulting output embeddings. If not specified (defaults to null), the embeddings are represented as a list of floating-point numbers. If 'base64', the embeddings are represented as a Base64-encoded NumPy array of single-precision floats.
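
If you receive a base64-encoded embedding, it has to be turned back into numbers before use. The sketch below assumes the payload is the raw bytes of little-endian 32-bit floats with no header, which is how a NumPy float32 array is usually serialized; that layout is an assumption, as is the hypothetical name decodeBase64Embedding. How the provider surfaces the encoded value is not covered here.

```typescript
// Hypothetical helper: decode base64 text into a list of float32 values.
// Assumes raw little-endian float32 bytes with no header.
function decodeBase64Embedding(encoded: string): number[] {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  if (bytes.length % 4 !== 0) {
    throw new Error('Byte length is not a multiple of 4');
  }
  const view = new DataView(bytes.buffer);
  const values: number[] = [];
  for (let offset = 0; offset < bytes.length; offset += 4) {
    values.push(view.getFloat32(offset, true)); // true = little-endian
  }
  return values;
}

// "AACAPw==" is the base64 form of the bytes 00 00 80 3F, i.e. float32 1.0.
console.log(decodeBase64Embedding('AACAPw==')); // [ 1 ]
```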

providerOptions.voyage.truncation

boolean

Whether to truncate the input to fit within the context length. Defaults to true.
