Semantic retrieval uses vector embeddings to find relevant document chunks based on meaning rather than keywords. Mastra provides tools for vector search, reranking, and filtering.
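Under the hood, similarity between embeddings is usually measured with cosine similarity. A minimal, library-free sketch of that comparison (plain TypeScript, not a Mastra API):

```typescript
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|). Higher means more semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1; orthogonal vectors score 0.
console.log(cosineSimilarity([1, 0], [2, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Vector stores run this comparison (or an approximate version of it) at scale, so you never call it directly — but it is the operation every `topK` result list is ranked by.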

Vector Query Tool

The createVectorQueryTool function creates a tool that agents can use to search your knowledge base:
import { createVectorQueryTool } from '@mastra/rag/tools';
import { PgVector } from '@mastra/vector-pg';
import { openai } from '@ai-sdk/openai';

const vectorStore = new PgVector({
  connectionString: process.env.DATABASE_URL
});

const embedder = openai.embedding('text-embedding-3-small');

const ragTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder,
  description: 'Search documentation for relevant information',
  includeSources: true,
  includeVectors: false
});

Configuration Options

  • id (string): Unique identifier for the tool. Defaults to "VectorQuery Tool".
  • indexName (string, required): Name of the vector index to query.
  • vectorStore (MastraVector, required): Vector database instance (PgVector, Pinecone, Qdrant, etc.).
  • model (MastraEmbeddingModel, required): Embedding model for converting queries to vectors.
  • description (string): Tool description shown to the agent. Should explain what knowledge base is being searched.
  • enableFilter (boolean, default: false): Enable metadata filtering. When true, agents can filter results by metadata fields.
  • includeSources (boolean, default: true): Include full source objects in results. Set to false to return only metadata.
  • includeVectors (boolean, default: false): Include vector embeddings in results. Usually not needed.
  • reranker (RerankConfig): Optional reranking configuration to improve result relevance.
  • databaseConfig (DatabaseConfig): Database-specific configuration for vector queries.
  • providerOptions (ProviderOptions): Provider-specific options for embedding generation.

Using with Agents

Add the RAG tool to an agent:
import { Agent } from '@mastra/core';
import { createVectorQueryTool } from '@mastra/rag/tools';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  description: 'Search product documentation'
});

const agent = new Agent({
  name: 'DocAgent',
  model: 'openai/gpt-4o',
  instructions: 'You are a helpful assistant that answers questions about our product. Use the searchDocs tool to find relevant information.',
  tools: { searchDocs: ragTool }
});

const result = await agent.generate(
  'How do I configure authentication?'
);
The agent will automatically:
  1. Determine when to search the knowledge base
  2. Generate a semantic search query
  3. Retrieve relevant chunks
  4. Use the context to answer the question

Reranking Results

Reranking improves result quality by scoring retrieved chunks with a specialized model:

Cohere Reranker

import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5 // Return top 5 after reranking
    }
  }
});

Mastra Agent Reranker

Use an LLM to rerank results:
import { MastraAgentRelevanceScorer } from '@mastra/rag/relevance';
import { openai } from '@ai-sdk/openai';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new MastraAgentRelevanceScorer({
      model: openai('gpt-4o-mini')
    }),
    options: {
      topK: 3
    }
  }
});

ZeroEntropy Reranker

Open-source reranking:
import { ZeroEntropyRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new ZeroEntropyRelevanceScorer({
      apiKey: process.env.ZEROENTROPY_API_KEY
    }),
    options: {
      topK: 5
    }
  }
});

Metadata Filtering

Filter results by metadata during retrieval:
// Enable filtering
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  enableFilter: true,
  description: 'Search documentation. You can filter by category, version, and language.'
});

// Agent can now filter
const result = await agent.generate(
  'Find API documentation for version 2.0',
  { tools: { searchDocs: ragTool } }
);
The tool automatically parses filter metadata from the agent’s query.

Manual Filtering

Pass filters programmatically:
import { vectorQuerySearch } from '@mastra/rag/utils';

const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'authentication setup',
  model: embedder,
  topK: 10,
  queryFilter: {
    category: 'api',
    version: '2.0',
    language: 'typescript'
  }
});

Direct Vector Search

Perform vector search directly, without going through a tool:
import { vectorQuerySearch } from '@mastra/rag/utils';

const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'How to configure database connections?',
  model: embedder,
  topK: 5,
  includeVectors: false
});

results.forEach(result => {
  console.log(`Score: ${result.score}`);
  console.log(`Content: ${result.metadata.text}`);
  console.log(`Source: ${result.metadata.source}`);
});

Result Structure

Vector search returns structured results:
type VectorQueryResult = {
  relevantContext: Array<Record<string, any>>;
  sources: Array<{
    id: string;
    score: number;
    metadata: Record<string, any>;
    vector?: number[];
  }>;
};
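Given that shape, a small helper can pull out the best-scoring sources. This is a sketch written against the type above (`topSources` is not a Mastra utility):

```typescript
type VectorQueryResult = {
  relevantContext: Array<Record<string, any>>;
  sources: Array<{
    id: string;
    score: number;
    metadata: Record<string, any>;
    vector?: number[];
  }>;
};

// Return the n highest-scoring sources, best first,
// without mutating the original result.
function topSources(result: VectorQueryResult, n: number) {
  return [...result.sources].sort((a, b) => b.score - a.score).slice(0, n);
}
```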

Reranking Implementation

Rerank results manually:
import { rerankWithScorer } from '@mastra/rag';
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

// Get initial results
const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'authentication',
  model: embedder,
  topK: 20 // Get more for reranking
});

// Rerank with Cohere
const scorer = new CohereRelevanceScorer({
  apiKey: process.env.COHERE_API_KEY
});

const reranked = await rerankWithScorer({
  results,
  query: 'authentication',
  scorer,
  options: {
    topK: 5 // Return top 5
  }
});

reranked.forEach(item => {
  console.log(`Score: ${item.score}`);
  console.log(`Content: ${item.result.metadata.text}`);
});

Multiple Indexes

Create a separate tool for each index to search across multiple knowledge bases:
const docsTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder,
  description: 'Search general documentation'
});

const apiTool = createVectorQueryTool({
  id: 'searchAPI',
  indexName: 'api-reference',
  vectorStore,
  model: embedder,
  description: 'Search API reference and endpoint documentation'
});

const codeTool = createVectorQueryTool({
  id: 'searchCode',
  indexName: 'code-examples',
  vectorStore,
  model: embedder,
  description: 'Search code examples and snippets'
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  tools: { searchDocs: docsTool, searchAPI: apiTool, searchCode: codeTool }
});

Hybrid Search

Combine vector search with keyword search:
import { vectorQuerySearch } from '@mastra/rag/utils';

async function hybridSearch(query: string) {
  // Vector search
  const vectorResults = await vectorQuerySearch({
    indexName: 'docs',
    vectorStore,
    queryText: query,
    model: embedder,
    topK: 10
  });
  
  // Keyword search (implementation depends on your storage)
  const keywordResults = await keywordSearch(query);
  
  // Merge and deduplicate
  const combined = [...vectorResults.results, ...keywordResults];
  const unique = Array.from(
    new Map(combined.map(r => [r.id, r])).values()
  );
  
  // Sort by score
  return unique.sort((a, b) => b.score - a.score).slice(0, 10);
}
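One caveat with the merge above: vector similarity scores and keyword scores are not on the same scale, so sorting the combined list by raw score can bias one side. A common alternative is Reciprocal Rank Fusion (RRF), which combines ranked lists using positions rather than scores. A self-contained sketch (not a Mastra API):

```typescript
// Reciprocal Rank Fusion: each result earns 1 / (k + rank) from every
// list it appears in, so items ranked well by both systems rise to the top.
// k (conventionally ~60) damps the advantage of the very first positions.
function rrfMerge(
  lists: Array<Array<{ id: string }>>,
  k = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An item that appears in both the vector list and the keyword list accumulates contributions from both, so it outranks items found by only one retriever.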

Advanced Configuration

Configure retrieval behavior:
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  
  // Embedding options
  providerOptions: {
    openai: {
      dimensions: 1536
    }
  },
  
  // Database-specific config
  databaseConfig: {
    schema: 'public',
    table: 'embeddings'
  },
  
  // Reranker
  reranker: {
    model: new CohereRelevanceScorer({ apiKey: process.env.COHERE_API_KEY }),
    options: {
      topK: 5,
      model: 'rerank-english-v3.0'
    }
  },
  
  // Filtering
  enableFilter: true,
  
  // Output options
  includeSources: true,
  includeVectors: false
});

Performance Optimization

Tune TopK

Start with 10-20 results, then rerank down to 3-5, balancing recall against latency.

Use Reranking

Reranking significantly improves top-K quality for complex queries.

Cache Embeddings

Cache query embeddings for frequently asked questions to reduce API calls.
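A query-embedding cache can be as simple as a Map keyed by the normalized query text. This is a sketch; `embedQuery` stands in for whatever call produces an embedding in your setup:

```typescript
// In-memory cache of query embeddings. For production you would likely
// add eviction (LRU) or back this with Redis; this sketch keeps it minimal.
const embeddingCache = new Map<string, number[]>();

async function cachedEmbed(
  query: string,
  embedQuery: (q: string) => Promise<number[]>
): Promise<number[]> {
  const key = query.trim().toLowerCase(); // normalize to improve hit rate
  const hit = embeddingCache.get(key);
  if (hit) return hit;
  const vector = await embedQuery(key);
  embeddingCache.set(key, vector);
  return vector;
}
```

Repeated questions then cost a Map lookup instead of an embedding API call.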

Filter Early

Use metadata filters to reduce search space and improve performance.

Troubleshooting

No results returned:
  • Verify embeddings were created during ingestion
  • Check that the index name matches between ingestion and retrieval
  • Ensure embedding model dimensions match the vector store
  • Try lowering the similarity threshold or increasing topK

Poor relevance:
  • Add reranking to improve relevance
  • Increase topK before reranking (e.g., retrieve 20, rerank to 5)
  • Review your chunking strategy: chunks may be too small or too large
  • Add metadata extraction for better context

Slow queries:
  • Use a vector store with HNSW indexes (PostgreSQL)
  • Reduce topK to retrieve fewer results
  • Add metadata filters to narrow the search space
  • Consider caching for common queries

Agent not using the tool:
  • Improve the tool description to clarify when to use it
  • Add examples in the agent instructions
  • Verify the tool is registered with the agent
  • Check that the agent model supports function calling

Best Practices

  1. Start with high topK, rerank to low topK: Retrieve 20-30 results, rerank to 3-5 for best quality
  2. Use descriptive tool descriptions: Help agents understand when to use each tool
  3. Add metadata filtering: Enable filtering for large knowledge bases
  4. Monitor and log: Track queries, results, and agent decisions for optimization
  5. Test with real queries: Evaluate retrieval quality with actual user questions
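For the monitoring point above, a thin logging wrapper around any search function is often enough to start. A sketch (`searchFn` is a stand-in for your retrieval call, e.g. `vectorQuerySearch` partially applied):

```typescript
// Wrap any async search function to log query, result count,
// top score, and latency. Swap console.log for your logger of choice.
async function loggedSearch<T extends { score: number }>(
  query: string,
  searchFn: (q: string) => Promise<T[]>
): Promise<T[]> {
  const start = Date.now();
  const results = await searchFn(query);
  console.log(
    `[rag] query="${query}" results=${results.length} ` +
    `topScore=${results[0]?.score ?? 'n/a'} ms=${Date.now() - start}`
  );
  return results;
}
```

Logging at this layer captures every retrieval the agent triggers, which makes it easy to spot queries that return nothing or consistently low scores.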

Next Steps

Chunking

Optimize document chunking strategies

Ingestion

Learn about document processing

Memory

Combine RAG with conversation memory
