Available models
| Model | Query token limit | Query + document limit | Model ID |
|---|---|---|---|
| Rerank 2.5 | 8,000 tokens | 32,000 tokens | rerank-2.5 |
| Rerank 2.5 Lite | 8,000 tokens | 32,000 tokens | rerank-2.5-lite |
| Rerank 2 | 4,000 tokens | 16,000 tokens | rerank-2 |
| Rerank Lite 2 | 2,000 tokens | 8,000 tokens | rerank-lite-2 |
| Rerank 1 | 2,000 tokens | 8,000 tokens | rerank-1 |
| Rerank Lite 1 | 1,000 tokens | 4,000 tokens | rerank-lite-1 |
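If you count tokens yourself, you can encode the limits in the table as a lookup and check which models accept a request without truncation. The sketch below is plain TypeScript and is not part of the provider. `RERANK_LIMITS` and `modelsThatFit` are hypothetical names. The sketch also assumes that the query + document limit applies to the query paired with each individual document, so only the longest document needs to be checked.

```typescript
// Token limits from the table above, keyed by model ID.
// Assumption: "combined" is the limit for the query plus ONE document.
type RerankModelId =
  | 'rerank-2.5'
  | 'rerank-2.5-lite'
  | 'rerank-2'
  | 'rerank-lite-2'
  | 'rerank-1'
  | 'rerank-lite-1';

interface TokenLimits {
  query: number;
  combined: number;
}

const RERANK_LIMITS: Record<RerankModelId, TokenLimits> = {
  'rerank-2.5': { query: 8000, combined: 32000 },
  'rerank-2.5-lite': { query: 8000, combined: 32000 },
  'rerank-2': { query: 4000, combined: 16000 },
  'rerank-lite-2': { query: 2000, combined: 8000 },
  'rerank-1': { query: 2000, combined: 8000 },
  'rerank-lite-1': { query: 1000, combined: 4000 },
};

// Hypothetical helper: returns the models whose limits fit the given token
// counts without truncation. Counts must come from your own tokenizer.
function modelsThatFit(queryTokens: number, longestDocTokens: number): RerankModelId[] {
  return (Object.keys(RERANK_LIMITS) as RerankModelId[]).filter((id) => {
    const { query, combined } = RERANK_LIMITS[id];
    return queryTokens <= query && queryTokens + longestDocTokens <= combined;
  });
}

console.log(modelsThatFit(3000, 10000)); // ['rerank-2.5', 'rerank-2.5-lite', 'rerank-2']
```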
Usage example
You can create a reranking model using the `voyage.reranking()` method.
Model settings
You can customize reranking behavior using provider options.
Available settings
Whether to include the document text in the response.
- When `false`: returns `{index, relevanceScore}` for each result
- When `true`: returns `{index, document, relevanceScore}` for each result
Set this option to `true` when you need to access the document content without maintaining a separate lookup.
Whether to automatically truncate inputs to fit within the context length limits. When `true`, queries and documents are truncated to fit within the model's token limits. When `false`, an error is raised if inputs exceed the limits.
Token limits vary by model:
- rerank-2.5 and rerank-2.5-lite: query max 8,000 tokens, combined max 32,000 tokens
- rerank-2: query max 4,000 tokens, combined max 16,000 tokens
- rerank-lite-2 and rerank-1: query max 2,000 tokens, combined max 8,000 tokens
- rerank-lite-1: query max 1,000 tokens, combined max 4,000 tokens
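You can simulate the two truncation modes locally to see what a request will look like before you send it. This is a hypothetical sketch, and `prepareInputs` is not a provider function. It assumes that your own tokenizer has already split each input into tokens, and that truncation keeps the leading tokens. The service's actual truncation strategy may differ.

```typescript
// Limits for a single model, taken from the list above.
interface TokenLimits {
  query: number; // max tokens in the query
  combined: number; // max tokens in the query plus one document
}

interface PreparedInputs {
  query: string[];
  documents: string[][];
}

// Hypothetical pre-check mirroring the two truncation modes.
// Inputs are pre-tokenized (one string per token); truncation keeps leading tokens.
function prepareInputs(
  query: string[],
  documents: string[][],
  limits: TokenLimits,
  truncation: boolean,
): PreparedInputs {
  if (query.length > limits.query && !truncation) {
    throw new Error(`Query has ${query.length} tokens; limit is ${limits.query}`);
  }
  const preparedQuery = query.slice(0, limits.query);

  // Each document shares the combined budget with the (possibly truncated) query.
  const docBudget = limits.combined - preparedQuery.length;
  const preparedDocs = documents.map((doc, i) => {
    if (doc.length <= docBudget) return doc;
    if (!truncation) {
      throw new Error(`Document ${i} exceeds the ${limits.combined}-token combined limit`);
    }
    return doc.slice(0, docBudget);
  });

  return { query: preparedQuery, documents: preparedDocs };
}

// With rerank-lite-1 limits and truncation enabled, an oversized document
// is cut so that it fits alongside the 800-token query.
const out = prepareInputs(
  new Array<string>(800).fill('q'),
  [new Array<string>(5000).fill('d')],
  { query: 1000, combined: 4000 },
  true,
);
console.log(out.documents[0].length); // 3200
```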
Choosing the right model
High-performance applications
- rerank-2.5: Best overall accuracy, with support for long contexts of up to 32,000 tokens (query plus document)
- rerank-2.5-lite: Faster inference with similar quality to rerank-2.5, ideal for latency-sensitive applications
Standard applications
- rerank-2: Good balance of performance and cost for moderate context lengths
- rerank-lite-2: Lighter model for faster reranking with shorter documents
Legacy models
- rerank-1: Earlier-generation model (consider upgrading to rerank-2.5)
- rerank-lite-1: Lightweight legacy model with limited context (consider upgrading to rerank-2.5-lite)
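The following sketch shows one possible way to turn this guidance into code. The numeric thresholds come from the token limits above. The priority order is an assumption made for illustration: it uses the 2.5 family when cost is not a concern or when inputs exceed what the standard models accept, rerank-lite-2 for short, latency-sensitive inputs, and rerank-2 otherwise. `pickModel` and `upgradeIfLegacy` are hypothetical helpers, not provider APIs.

```typescript
type RerankModelId =
  | 'rerank-2.5'
  | 'rerank-2.5-lite'
  | 'rerank-2'
  | 'rerank-lite-2'
  | 'rerank-1'
  | 'rerank-lite-1';

interface Requirements {
  maxQueryTokens: number; // longest query you expect
  maxPairTokens: number; // longest query + document pair you expect
  latencySensitive: boolean;
  costSensitive: boolean;
}

// Hypothetical heuristic encoding the guidance above; tune it for your workload.
function pickModel(req: Requirements): RerankModelId {
  // Only the 2.5 family accepts queries above 4,000 or pairs above 16,000 tokens.
  if (!req.costSensitive || req.maxQueryTokens > 4000 || req.maxPairTokens > 16000) {
    return req.latencySensitive ? 'rerank-2.5-lite' : 'rerank-2.5';
  }
  // rerank-lite-2 is limited to 2,000-token queries and 8,000-token pairs.
  if (req.latencySensitive && req.maxQueryTokens <= 2000 && req.maxPairTokens <= 8000) {
    return 'rerank-lite-2';
  }
  return 'rerank-2';
}

// Suggested upgrades for the legacy models listed above.
const UPGRADE_PATH: Partial<Record<RerankModelId, RerankModelId>> = {
  'rerank-1': 'rerank-2.5',
  'rerank-lite-1': 'rerank-2.5-lite',
};

function upgradeIfLegacy(model: RerankModelId): RerankModelId {
  return UPGRADE_PATH[model] ?? model;
}
```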
How reranking works
Reranking is typically used as part of a two-stage retrieval pipeline:
- Initial retrieval: Use embedding-based search to retrieve candidate documents (e.g., top 100)
- Reranking: Use a reranking model to reorder the candidates and return the most relevant results (e.g., top 10)
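Here is a minimal sketch of the two stages. It assumes that each document already carries an embedding vector of the same length as the query embedding. Stage one ranks the corpus by cosine similarity. Stage two delegates to `rerank`, a hypothetical callback that you would implement with the reranking model. The callback must return one score per candidate, in input order. None of these names come from the provider.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

// Hypothetical stand-in for a call to the reranking model.
type RerankFn = (query: string, texts: string[]) => Promise<number[]>;

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function retrieveAndRerank(
  query: string,
  queryEmbedding: number[],
  corpus: Doc[],
  rerank: RerankFn,
  candidateCount = 100,
  topN = 10,
): Promise<Doc[]> {
  // Stage 1: cheap embedding similarity narrows the corpus to candidates.
  const candidates = corpus
    .map((doc) => ({ doc, score: cosine(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, candidateCount)
    .map((c) => c.doc);

  // Stage 2: the reranker scores each (query, candidate) pair; keep the best topN.
  const scores = await rerank(query, candidates.map((d) => d.text));
  return candidates
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN)
    .map((c) => c.doc);
}
```

In a real pipeline, `rerank` would wrap a call to a Voyage reranking model. For a quick local test, any function that returns one number per text will do.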
Use cases
Semantic search
Improve search result quality by reranking initial retrieval results.
Question answering
Find the most relevant context for answering questions.
Content recommendation
Rank content items by relevance to user interests.
Best practices
Optimize the number of candidates
Reranking is more expensive than embedding similarity, so balance accuracy against cost:
- Retrieve 50-200 candidates with embeddings
- Rerank to get the final 5-20 results
Handle truncation appropriately
For critical applications, disable truncation and handle errors explicitly, so that inputs are never silently shortened.
Choose topN wisely
The `topN` parameter determines how many results to return:
- For user-facing search: 10-20 results
- For RAG context: 3-5 results
- For multi-stage reranking pipelines: 20-50 results passed on for further processing
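Whatever `topN` you choose, each result identifies its document by `index`. The document text is included only when you enable the document-return setting described under Model settings. The sketch below assumes a hypothetical `RankedResult` shape with those fields, with `document` as a plain string. `topDocuments` is an illustrative helper that picks the highest-scoring results and falls back to the original documents array when the response omitted the text.

```typescript
// Hypothetical result shape covering both response formats.
interface RankedResult {
  index: number; // position of the document in the original input array
  relevanceScore: number;
  document?: string; // present only when document return is enabled
}

// Returns the top `topN` results with their text, looking the text up
// in the original documents array when the response did not include it.
function topDocuments(
  results: RankedResult[],
  documents: string[],
  topN: number,
): { text: string; relevanceScore: number }[] {
  return [...results]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, topN)
    .map((r) => ({
      text: r.document ?? documents[r.index],
      relevanceScore: r.relevanceScore,
    }));
}
```

For RAG context you might pass `topN = 3`. For user-facing search, a value between 10 and 20 is more typical.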