Reranking improves search results by re-scoring documents based on their relevance to a query. It’s a powerful technique for refining initial search results from embeddings or keyword searches.
## What is reranking?
Reranking is a two-stage retrieval process:
1. **Initial retrieval**: Use fast methods (embeddings, keywords) to get candidate documents.
2. **Reranking**: Use a more sophisticated model to score and reorder the candidates.
This approach gives you the speed of simple retrieval with the accuracy of advanced models.
## Why use reranking?
Reranking models understand the relationship between queries and documents better than embeddings alone:
- **Better relevance**: Cross-attention between the query and each document
- **Semantic understanding**: Captures nuanced meaning
- **Improved ranking**: More accurate ordering of results
- **Cost-effective**: Only the top candidates are reranked, not the entire corpus
Embeddings encode documents independently, while reranking models analyze the query-document relationship directly.
## Basic usage

Use the `rerank` function from the AI SDK:
```ts
import { voyage } from 'voyage-ai-provider';
import { rerank } from 'ai';

const { ranking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'What is machine learning?',
  documents: [
    'Machine learning is a subset of artificial intelligence...',
    'The weather today is sunny and warm.',
    'Deep learning uses neural networks with multiple layers...',
    'I like to eat pizza on weekends.',
  ],
});

console.log(ranking);
// [
//   { index: 0, relevanceScore: 0.95 },
//   { index: 2, relevanceScore: 0.87 },
//   { index: 1, relevanceScore: 0.12 },
//   { index: 3, relevanceScore: 0.08 },
// ]
```
The results are sorted by relevance score, with the most relevant documents first.
## Available models
Voyage provides several reranking models:
- `rerank-2.5` - Latest model with improved accuracy
- `rerank-2.5-lite` - Faster, more efficient version
- `rerank-2` - Previous generation
- `rerank-2-lite` - Efficient version of rerank-2
- `rerank-1` - First generation
- `rerank-lite-1` - Efficient first-generation model
Use `rerank-2.5` for the best accuracy, or `rerank-2.5-lite` for a balance of speed and quality.
## Limiting results

Use the `topN` parameter to return only the most relevant results:
```ts
const { ranking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'machine learning algorithms',
  documents: [
    'Linear regression is a supervised learning algorithm...',
    'The sky is blue.',
    'Decision trees are used for classification...',
    'Neural networks can approximate any function...',
    'I enjoy hiking in the mountains.',
  ],
  topN: 3, // Only return the top 3 results
});

console.log(ranking.length); // 3
```
The `topN` parameter maps to the API's `top_k` parameter; both names refer to the same functionality.
## Reranking options

The `VoyageRerankingOptions` type defines additional configuration:
```ts
type VoyageRerankingOptions = {
  /**
   * Whether to return the documents in the response. Defaults to false.
   */
  returnDocuments?: boolean;

  /**
   * Whether to truncate inputs to fit the context length. Defaults to true.
   */
  truncation?: boolean;
};
```
### Returning documents

By default, the response only includes indices and scores. Enable `returnDocuments` to include the original text:
```ts
const result = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'AI applications',
  documents: ['AI is used in healthcare', 'Dogs are loyal pets'],
  providerOptions: {
    voyage: {
      returnDocuments: true,
    },
  },
});

// Note: the AI SDK's rerank function doesn't expose documents in the response.
// This option affects the raw API response, not the standardized SDK output.
```
**Why doesn't the SDK return documents?**

The AI SDK standardizes the response format across all providers. The `ranking` array always contains `{ index, relevanceScore }` objects. Since you already have the original documents in your code, you can look them up by index:

```ts
const documents = ['Doc 1', 'Doc 2', 'Doc 3'];

const { ranking } = await rerank({ model, query, documents });
const topDoc = documents[ranking[0].index];
```
### Truncation
Control whether inputs are truncated to fit the model’s context length:
```ts
const { ranking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'very long query...',
  documents: ['very long document...'],
  providerOptions: {
    voyage: {
      truncation: false, // Throw an error if inputs are too long
    },
  },
});
```
Context length limits:

- `rerank-2.5` / `rerank-2.5-lite`: 8,000 tokens (query), 32,000 tokens (query + document)
- `rerank-2`: 4,000 tokens (query), 16,000 tokens (query + document)
- `rerank-2-lite` / `rerank-1`: 2,000 tokens (query), 8,000 tokens (query + document)
- `rerank-lite-1`: 1,000 tokens (query), 4,000 tokens (query + document)
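When truncation is disabled, a rough pre-flight length check can help avoid errors. This sketch uses a crude ~4-characters-per-token heuristic (an assumption for illustration; use a real tokenizer for accurate counts) against the `rerank-2` limits:

```ts
// Crude heuristic: assume roughly 4 characters per token.
// This is only an approximation; a real tokenizer gives exact counts.
const APPROX_CHARS_PER_TOKEN = 4;

function approxTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Check inputs against the rerank-2 limits:
// 4,000 tokens for the query, 16,000 tokens for query + document.
function fitsRerank2(query: string, document: string): boolean {
  return (
    approxTokens(query) <= 4_000 &&
    approxTokens(query) + approxTokens(document) <= 16_000
  );
}

console.log(fitsRerank2('What is deep learning?', 'A short document.')); // true
console.log(fitsRerank2('query', 'x'.repeat(100_000))); // false
```

Documents that fail the check can be split or shortened before the request instead of triggering an API error.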
## Understanding the response
The reranking response includes:
```ts
interface RerankingResponse {
  ranking: Array<{
    index: number; // Index into the original documents array
    relevanceScore: number; // Score between 0 and 1
  }>;
  warnings?: SharedV3Warning[];
  response: {
    headers: Record<string, string>;
    body: unknown;
  };
}
```
### Relevance scores
Scores are normalized between 0 and 1:
- **0.8 - 1.0**: Highly relevant
- **0.5 - 0.8**: Moderately relevant
- **0.0 - 0.5**: Less relevant
```ts
const { ranking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'deep learning',
  documents: [
    'Deep learning is a type of machine learning',
    'Pizza is delicious',
  ],
});

if (ranking[0].relevanceScore > 0.7) {
  console.log('Highly relevant result found!');
}
```
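If you label results by band in your UI, the thresholds above can be wrapped in a small helper (a hypothetical `relevanceBand` function, not part of the SDK):

```ts
// Map a relevance score to the bands described above.
function relevanceBand(score: number): 'high' | 'moderate' | 'low' {
  if (score >= 0.8) return 'high';
  if (score >= 0.5) return 'moderate';
  return 'low';
}

console.log(relevanceBand(0.95)); // 'high'
console.log(relevanceBand(0.6)); // 'moderate'
console.log(relevanceBand(0.12)); // 'low'
```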
## Model implementation

The `VoyageRerankingModel` class implements the `RerankingModelV3` interface:
```ts
class VoyageRerankingModel implements RerankingModelV3 {
  readonly specificationVersion = 'v3';
  readonly modelId: VoyageRerankingModelId;
  readonly provider: string;

  async doRerank({
    documents,
    query,
    topN,
    headers,
    abortSignal,
    providerOptions,
  }: RerankingParams): Promise<RerankingResponse> {
    // Implementation details...
  }
}
```
## Document types
The reranking API handles both text and structured documents:
```ts
// Text documents
const result1 = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'AI',
  documents: ['Text 1', 'Text 2'],
});

// Structured documents (automatically stringified)
const result2 = await rerank({
  model: voyage.reranking('rerank-2'),
  query: 'AI',
  documents: [
    { title: 'Article 1', content: 'AI is...' },
    { title: 'Article 2', content: 'ML is...' },
  ],
});
```
Structured documents are converted to JSON strings before being sent to the API. For best results, use plain text or format your objects as readable strings.
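For example, a small helper (hypothetical, not part of the provider) can turn `{ title, content }` objects into readable strings before they are passed to `rerank`:

```ts
type Article = { title: string; content: string };

// Format structured documents as readable text instead of relying on
// automatic JSON stringification.
function toRerankText(doc: Article): string {
  return `${doc.title}\n\n${doc.content}`;
}

const articles: Article[] = [
  { title: 'Article 1', content: 'AI is...' },
  { title: 'Article 2', content: 'ML is...' },
];

const readableDocuments = articles.map(toRerankText);
console.log(readableDocuments[0]); // 'Article 1\n\nAI is...'
```

The resulting strings can then be used as the `documents` array, and the original `articles` looked up by `index` as shown earlier.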
## Complete example: Semantic search with reranking
Combine embeddings and reranking for optimal search:
```ts
import { voyage } from 'voyage-ai-provider';
import { embed, embedMany, rerank } from 'ai';

// 1. Your document corpus
const documents = [
  'Machine learning is a subset of AI that focuses on learning from data.',
  'Deep learning uses neural networks with many layers.',
  'Artificial intelligence aims to create intelligent machines.',
  'The weather forecast predicts rain tomorrow.',
  'Neural networks are inspired by biological neurons.',
  'I prefer coffee over tea in the morning.',
];

// 2. Embed all documents
const { embeddings: docEmbeddings } = await embedMany({
  model: voyage('voyage-3'),
  values: documents,
  providerOptions: { voyage: { inputType: 'document' } },
});

// 3. User query
const query = 'What is deep learning?';

// 4. Embed the query
const { embedding: queryEmbedding } = await embed({
  model: voyage('voyage-3'),
  value: query,
  providerOptions: { voyage: { inputType: 'query' } },
});

// 5. Find top candidates with cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magA * magB);
}

const similarities = docEmbeddings.map((emb, idx) => ({
  index: idx,
  score: cosineSimilarity(queryEmbedding, emb),
}));

// Get the top 5 candidates
const topCandidates = similarities
  .sort((a, b) => b.score - a.score)
  .slice(0, 5);

const candidateDocuments = topCandidates.map((c) => documents[c.index]);

// 6. Rerank the candidates
const { ranking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query,
  documents: candidateDocuments,
  topN: 3,
});

// 7. Get the final results
const finalResults = ranking.map((r) => ({
  document: candidateDocuments[r.index],
  score: r.relevanceScore,
}));

console.log('Top results:', finalResults);
```
This two-stage approach combines the speed of embedding search with the accuracy of reranking:

1. Use embeddings to narrow down from thousands or millions of documents to a few dozen candidates.
2. Use reranking to precisely order those candidates.
## Best practices
### Choose the right number of candidates
Rerank 10-100 candidates from your initial retrieval:
```ts
// Too few candidates: you might miss relevant documents
let candidates = await getTopEmbeddingMatches(query, 5);

// Good: a balance between coverage and cost
candidates = await getTopEmbeddingMatches(query, 20);

// Too many: slower and more expensive
candidates = await getTopEmbeddingMatches(query, 500);
```
### Set an appropriate `topN`
Only return as many results as you need:
```ts
// For a chat interface
const { ranking: chatRanking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query,
  documents: candidates,
  topN: 3, // Just enough for context
});

// For a search results page
const { ranking: pageRanking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query,
  documents: candidates,
  topN: 10, // A full page of results
});
```
### Handle edge cases
```ts
const { ranking } = await rerank({
  model: voyage.reranking('rerank-2'),
  query,
  documents: candidates,
  topN: 5,
});

// Filter by a minimum relevance score
const relevantResults = ranking.filter((r) => r.relevanceScore > 0.5);

if (relevantResults.length === 0) {
  console.log('No relevant results found');
}
```
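One way to handle the empty case (a sketch, assuming the `ranking` shape shown earlier) is to fall back to the single best-scoring candidate instead of showing nothing:

```ts
type Ranked = { index: number; relevanceScore: number };

// Keep results above the threshold; if none qualify, fall back to the
// single best-scoring candidate so the user never sees an empty list.
function selectResults(ranking: Ranked[], minScore = 0.5): Ranked[] {
  const relevant = ranking.filter((r) => r.relevanceScore > minScore);
  return relevant.length > 0 ? relevant : ranking.slice(0, 1);
}

// All scores below the threshold: fall back to the top-ranked candidate.
console.log(selectResults([{ index: 0, relevanceScore: 0.2 }]).length); // 1
```

Whether falling back is appropriate depends on the product: a chat assistant may prefer to say "no relevant results", while a search page usually benefits from showing the best available match.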
## Next steps

- **Embeddings**: Learn about embedding-based retrieval
- **Models**: See all available reranking models