Semantic retrieval uses vector embeddings to find relevant document chunks based on meaning rather than keywords. Mastra provides tools for vector search, reranking, and filtering.
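The core idea can be illustrated with a toy example: embeddings are numeric vectors, and relevance is measured by cosine similarity between vectors rather than by term overlap. A minimal sketch — the three-dimensional vectors here are made up for illustration; real embeddings from a model like `text-embedding-3-small` have hundreds of dimensions:

```typescript
// Cosine similarity: ~1.0 means same direction (similar meaning),
// ~0.0 means unrelated, regardless of shared keywords.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" for two document chunks.
const chunks = [
  { text: 'How to reset your password', vector: [0.9, 0.1, 0.0] },
  { text: 'Quarterly revenue report', vector: [0.0, 0.2, 0.9] },
];

// Pretend embedding of "I forgot my login credentials" — no keyword
// overlap with either chunk, but it points the same way as the first.
const queryVector = [0.8, 0.2, 0.1];

// Rank chunks by similarity to the query.
const ranked = chunks
  .map(c => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
  .sort((a, b) => b.score - a.score);
```

Even though the query shares no words with "How to reset your password", that chunk ranks first because its vector points in a similar direction.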
The createVectorQueryTool function creates a tool that agents can use to search your knowledge base:
```typescript
import { createVectorQueryTool } from '@mastra/rag/tools';
import { PgVector } from '@mastra/vector-pg';
import { openai } from '@ai-sdk/openai';

const vectorStore = new PgVector({
  connectionString: process.env.DATABASE_URL
});

const embedder = openai.embedding('text-embedding-3-small');

const ragTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder,
  description: 'Search documentation for relevant information',
  includeSources: true,
  includeVectors: false
});
```
Configuration Options

- `id`: Unique identifier for the tool. Defaults to "VectorQuery Tool".
- `indexName`: Name of the vector index to query.
- `vectorStore`: Vector database instance (PgVector, Pinecone, Qdrant, etc.).
- `model` (MastraEmbeddingModel, required): Embedding model for converting queries to vectors.
- `description`: Tool description shown to the agent. Should explain what knowledge base is being searched.
- `enableFilter`: Enable metadata filtering. When true, agents can filter results by metadata fields.
- `includeSources`: Include full source objects in results. Set to false to return only metadata.
- `includeVectors`: Include vector embeddings in results. Usually not needed.
- `reranker`: Optional reranking configuration to improve result relevance.
- `databaseConfig`: Database-specific configuration for vector queries.
- `providerOptions`: Provider-specific options for embedding generation.
Using with Agents
Add the RAG tool to an agent:
```typescript
import { Agent } from '@mastra/core';
import { createVectorQueryTool } from '@mastra/rag/tools';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  description: 'Search product documentation'
});

const agent = new Agent({
  name: 'DocAgent',
  model: 'openai/gpt-4o',
  instructions: 'You are a helpful assistant that answers questions about our product. Use the searchDocs tool to find relevant information.',
  tools: { searchDocs: ragTool }
});

const result = await agent.generate(
  'How do I configure authentication?'
);
```
The agent will automatically:

- Determine when to search the knowledge base
- Generate a semantic search query
- Retrieve relevant chunks
- Use the context to answer the question
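Conceptually, that loop can also be written out by hand. A schematic sketch in plain TypeScript — `searchIndex` and `askModel` are hypothetical stand-ins for a vector search call and an LLM call, not Mastra APIs:

```typescript
// The four steps above, written out with stand-in helpers.
async function answerWithRag(
  question: string,
  searchIndex: (query: string) => Promise<Array<{ text: string; score: number }>>,
  askModel: (prompt: string) => Promise<string>
): Promise<string> {
  // 1-2. Search the knowledge base, using the question as the semantic query.
  const chunks = await searchIndex(question);

  // 3. Keep the most relevant chunks as context.
  const context = chunks
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map(c => c.text)
    .join('\n---\n');

  // 4. Answer the question grounded in the retrieved context.
  return askModel(`Context:\n${context}\n\nQuestion: ${question}`);
}
```

The tool-based approach does the same thing, but lets the model decide step 1 for itself instead of searching unconditionally.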
Reranking Results
Reranking improves result quality by scoring retrieved chunks with a specialized model:
Cohere Reranker
```typescript
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5 // Return top 5 after reranking
    }
  }
});
```
Mastra Agent Reranker
Use an LLM to rerank results:
```typescript
import { MastraAgentRelevanceScorer } from '@mastra/rag/relevance';
import { openai } from '@ai-sdk/openai';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new MastraAgentRelevanceScorer({
      model: openai('gpt-4o-mini')
    }),
    options: {
      topK: 3
    }
  }
});
```
ZeroEntropy Reranker
Open-source reranking:
```typescript
import { ZeroEntropyRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new ZeroEntropyRelevanceScorer({
      apiKey: process.env.ZEROENTROPY_API_KEY
    }),
    options: {
      topK: 5
    }
  }
});
```
Metadata Filtering
Filter results by metadata during retrieval:
```typescript
// Enable filtering
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  enableFilter: true,
  description: 'Search documentation. You can filter by category, version, and language.'
});

// The agent can now apply filters
const result = await agent.generate(
  'Find API documentation for version 2.0',
  { tools: { searchDocs: ragTool } }
);
```
The tool automatically parses filter metadata from the agent’s query.
Manual Filtering
Pass filters programmatically:
```typescript
import { vectorQuerySearch } from '@mastra/rag/utils';

const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'authentication setup',
  model: embedder,
  topK: 10,
  queryFilter: {
    category: 'api',
    version: '2.0',
    language: 'typescript'
  }
});
```
Direct Vector Search
Perform vector search without using tools:
```typescript
import { vectorQuerySearch } from '@mastra/rag/utils';

const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'How to configure database connections?',
  model: embedder,
  topK: 5,
  includeVectors: false
});

results.forEach(result => {
  console.log(`Score: ${result.score}`);
  console.log(`Content: ${result.metadata.text}`);
  console.log(`Source: ${result.metadata.source}`);
});
```
Result Structure
Vector search returns structured results:
```typescript
type VectorQueryResult = {
  relevantContext: Array<Record<string, any>>;
  sources: Array<{
    id: string;
    score: number;
    metadata: Record<string, any>;
    vector?: number[];
  }>;
};
```
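Given that shape, turning results into prompt context is mechanical. A small helper sketch in generic TypeScript, assuming each source's metadata carries a `text` field as in the examples above:

```typescript
// Build a prompt context block from search results, keeping only
// sources above a minimum relevance score.
function buildContext(
  result: { sources: Array<{ id: string; score: number; metadata: Record<string, any> }> },
  minScore = 0.5
): string {
  return result.sources
    .filter(s => s.score >= minScore)
    .map(s => `[${s.id}] (score ${s.score.toFixed(2)})\n${s.metadata.text}`)
    .join('\n\n');
}
```

Including the source id and score in the block makes it easy for the model to cite sources and for you to audit which chunks actually drove an answer.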
Reranking Implementation
Rerank results manually:
```typescript
import { rerankWithScorer } from '@mastra/rag';
import { CohereRelevanceScorer } from '@mastra/rag/relevance';
import { vectorQuerySearch } from '@mastra/rag/utils';

// Get initial results (retrieve more than needed so the reranker has candidates)
const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'authentication',
  model: embedder,
  topK: 20
});

// Rerank with Cohere
const scorer = new CohereRelevanceScorer({
  apiKey: process.env.COHERE_API_KEY
});

const reranked = await rerankWithScorer({
  results,
  query: 'authentication',
  scorer,
  options: {
    topK: 5 // Return top 5
  }
});

reranked.forEach(item => {
  console.log(`Score: ${item.score}`);
  console.log(`Content: ${item.result.metadata.text}`);
});
```
Multi-Index Search
Search across multiple indexes:
```typescript
const docsTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder,
  description: 'Search general documentation'
});

const apiTool = createVectorQueryTool({
  id: 'searchAPI',
  indexName: 'api-reference',
  vectorStore,
  model: embedder,
  description: 'Search API reference and endpoint documentation'
});

const codeTool = createVectorQueryTool({
  id: 'searchCode',
  indexName: 'code-examples',
  vectorStore,
  model: embedder,
  description: 'Search code examples and snippets'
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  tools: { searchDocs: docsTool, searchAPI: apiTool, searchCode: codeTool }
});
```
Hybrid Search
Combine vector search with keyword search:
```typescript
import { vectorQuerySearch } from '@mastra/rag/utils';

async function hybridSearch(query: string) {
  // Vector search
  const vectorResults = await vectorQuerySearch({
    indexName: 'docs',
    vectorStore,
    queryText: query,
    model: embedder,
    topK: 10
  });

  // Keyword search (implementation depends on your storage)
  const keywordResults = await keywordSearch(query);

  // Merge and deduplicate by id
  const combined = [...vectorResults.results, ...keywordResults];
  const unique = Array.from(
    new Map(combined.map(r => [r.id, r])).values()
  );

  // Sort by score and keep the top 10
  return unique.sort((a, b) => b.score - a.score).slice(0, 10);
}
```
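One caveat with merging this way: vector scores and keyword scores usually live on different scales, so sorting by raw score can arbitrarily favor one source. A common alternative is reciprocal rank fusion (RRF), which combines rank positions instead of scores. A generic sketch, not tied to any Mastra API:

```typescript
// Reciprocal rank fusion: each result earns 1 / (k + rank) from every
// list it appears in; k (commonly 60) damps the weight of top ranks.
function reciprocalRankFusion<T extends { id: string }>(
  lists: T[][],
  k = 60,
  topK = 10
): T[] {
  const scores = new Map<string, { item: T; score: number }>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      const entry = scores.get(item.id) ?? { item, score: 0 };
      entry.score += 1 / (k + rank + 1);
      scores.set(item.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(e => e.item);
}
```

Because only ranks matter, documents that appear near the top of both the vector and keyword lists reliably outrank documents found by only one, without any score normalization.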
Advanced Configuration
Configure retrieval behavior:
```typescript
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  // Embedding options
  providerOptions: {
    openai: {
      dimensions: 1536
    }
  },
  // Database-specific config
  databaseConfig: {
    schema: 'public',
    table: 'embeddings'
  },
  // Reranker
  reranker: {
    model: new CohereRelevanceScorer({ apiKey: process.env.COHERE_API_KEY }),
    options: {
      topK: 5,
      model: 'rerank-english-v3.0'
    }
  },
  // Filtering
  enableFilter: true,
  // Output options
  includeSources: true,
  includeVectors: false
});
```
Performance Tips

- **Tune topK**: Start with 10-20 results, then rerank down to 3-5. Balance recall against latency.
- **Use reranking**: Reranking significantly improves top-K quality for complex queries.
- **Cache embeddings**: Cache query embeddings for frequently asked questions to reduce API calls.
- **Filter early**: Use metadata filters to reduce the search space and improve performance.
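The caching tip can be sketched as a thin wrapper around whatever produces embeddings. Here `embedFn` is a placeholder for the real embedding call, and the cache is a plain in-memory Map keyed by the normalized query text:

```typescript
// In-memory embedding cache: repeated queries skip the embedding API call.
function createEmbeddingCache(
  embedFn: (text: string) => Promise<number[]>
) {
  const cache = new Map<string, number[]>();
  return async function embed(text: string): Promise<number[]> {
    const key = text.trim().toLowerCase(); // normalize to improve hit rate
    const hit = cache.get(key);
    if (hit) return hit;
    const vector = await embedFn(key);
    cache.set(key, vector);
    return vector;
  };
}
```

Note this stores resolved vectors, so two identical queries issued concurrently will both miss; memoizing the promise instead of the vector avoids that duplicate call.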
Troubleshooting

No results returned:
- Verify embeddings were created during ingestion
- Check that the index name matches between ingestion and retrieval
- Ensure embedding model dimensions match the vector store
- Try lowering the similarity threshold or increasing topK

Poor relevance:
- Add reranking to improve relevance
- Increase topK before reranking (e.g., retrieve 20, rerank to 5)
- Review your chunking strategy: chunks may be too small or too large
- Add metadata extraction for better context

Slow queries:
- Use a vector store with HNSW indexes (e.g., PostgreSQL with pgvector)
- Reduce topK to retrieve fewer results
- Add metadata filters to narrow the search space
- Consider caching for common queries
Best Practices

- **Start with high topK, rerank to low topK**: Retrieve 20-30 results, then rerank to 3-5 for best quality.
- **Use descriptive tool descriptions**: Help agents understand when to use each tool.
- **Add metadata filtering**: Enable filtering for large knowledge bases.
- **Monitor and log**: Track queries, results, and agent decisions for optimization.
- **Test with real queries**: Evaluate retrieval quality with actual user questions.
Next Steps

- Chunking: Optimize document chunking strategies
- Ingestion: Learn about document processing
- Memory: Combine RAG with conversation memory