Semantic retrieval uses vector embeddings to find relevant document chunks based on meaning rather than keywords. Mastra provides tools for vector search, reranking, and filtering.
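The core idea can be illustrated with a toy example: embeddings are numeric vectors, and relevance is measured by cosine similarity between vectors rather than by term overlap. A minimal sketch — the three-dimensional vectors here are made up for illustration; real embeddings from a model like `text-embedding-3-small` have hundreds of dimensions:

```typescript
// Cosine similarity: ~1.0 means same direction (similar meaning),
// ~0.0 means unrelated, regardless of shared keywords.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" for two document chunks.
const chunks = [
  { text: 'How to reset your password', vector: [0.9, 0.1, 0.0] },
  { text: 'Quarterly revenue report', vector: [0.0, 0.2, 0.9] },
];

// Pretend embedding of "I forgot my login credentials" — no keyword
// overlap with either chunk, but it points the same way as the first.
const queryVector = [0.8, 0.2, 0.1];

// Rank chunks by similarity to the query.
const ranked = chunks
  .map(c => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
  .sort((a, b) => b.score - a.score);
```

Even though the query shares no words with "How to reset your password", that chunk ranks first because its vector points in a similar direction.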
The createVectorQueryTool function creates a tool that agents can use to search your knowledge base:
```typescript
import { createVectorQueryTool } from '@mastra/rag/tools';
import { PgVector } from '@mastra/vector-pg';
import { openai } from '@ai-sdk/openai';

const vectorStore = new PgVector({
  connectionString: process.env.DATABASE_URL
});

const embedder = openai.embedding('text-embedding-3-small');

const ragTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder,
  description: 'Search documentation for relevant information',
  includeSources: true,
  includeVectors: false
});
```
Configuration Options

- `id`: Unique identifier for the tool. Defaults to "VectorQuery Tool".
- `indexName`: Name of the vector index to query.
- `vectorStore`: Vector database instance (PgVector, Pinecone, Qdrant, etc.).
- `model` (MastraEmbeddingModel, required): Embedding model for converting queries to vectors.
- `description`: Tool description shown to the agent. Should explain what knowledge base is being searched.
- `enableFilter`: Enable metadata filtering. When true, agents can filter results by metadata fields.
- `includeSources`: Include full source objects in results. Set to false to return only metadata.
- `includeVectors`: Include vector embeddings in results. Usually not needed.
- `reranker`: Optional reranking configuration to improve result relevance.
- `databaseConfig`: Database-specific configuration for vector queries.
- `providerOptions`: Provider-specific options for embedding generation.
Using with Agents
Add the RAG tool to an agent:
```typescript
import { Agent } from '@mastra/core';
import { createVectorQueryTool } from '@mastra/rag/tools';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  description: 'Search product documentation'
});

const agent = new Agent({
  name: 'DocAgent',
  model: 'openai/gpt-4o',
  instructions: 'You are a helpful assistant that answers questions about our product. Use the searchDocs tool to find relevant information.',
  tools: { searchDocs: ragTool }
});

const result = await agent.generate(
  'How do I configure authentication?'
);
```
The agent will automatically:

- Determine when to search the knowledge base
- Generate a semantic search query
- Retrieve relevant chunks
- Use the context to answer the question
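Conceptually, that loop can also be written out by hand. A schematic sketch in plain TypeScript — `searchIndex` and `askModel` are hypothetical stand-ins for a vector search call and an LLM call, not Mastra APIs:

```typescript
// The four steps above, written out with stand-in helpers.
async function answerWithRag(
  question: string,
  searchIndex: (query: string) => Promise<Array<{ text: string; score: number }>>,
  askModel: (prompt: string) => Promise<string>
): Promise<string> {
  // 1-2. Search the knowledge base, using the question as the semantic query.
  const chunks = await searchIndex(question);

  // 3. Keep the most relevant chunks as context.
  const context = chunks
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map(c => c.text)
    .join('\n---\n');

  // 4. Answer the question grounded in the retrieved context.
  return askModel(`Context:\n${context}\n\nQuestion: ${question}`);
}
```

The tool-based approach does the same thing, but lets the model decide step 1 for itself instead of searching unconditionally.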
Reranking Results
Reranking improves result quality by scoring retrieved chunks with a specialized model:
Cohere Reranker
```typescript
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5 // Return top 5 after reranking
    }
  }
});
```
Mastra Agent Reranker
Use an LLM to rerank results:
```typescript
import { MastraAgentRelevanceScorer } from '@mastra/rag/relevance';
import { openai } from '@ai-sdk/openai';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new MastraAgentRelevanceScorer({
      model: openai('gpt-4o-mini')
    }),
    options: {
      topK: 3
    }
  }
});
```
ZeroEntropy Reranker
Open-source reranking:
```typescript
import { ZeroEntropyRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new ZeroEntropyRelevanceScorer({
      apiKey: process.env.ZEROENTROPY_API_KEY
    }),
    options: {
      topK: 5
    }
  }
});
```
Metadata Filtering
Filter results by metadata during retrieval:
```typescript
// Enable filtering
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  enableFilter: true,
  description: 'Search documentation. You can filter by category, version, and language.'
});

// The agent can now apply filters
const result = await agent.generate(
  'Find API documentation for version 2.0',
  { tools: { searchDocs: ragTool } }
);
```
The tool automatically parses filter metadata from the agent’s query.
Manual Filtering
Pass filters programmatically:
```typescript
import { vectorQuerySearch } from '@mastra/rag/utils';

const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'authentication setup',
  model: embedder,
  topK: 10,
  queryFilter: {
    category: 'api',
    version: '2.0',
    language: 'typescript'
  }
});
```
Direct Vector Search
Perform vector search without using tools:
```typescript
import { vectorQuerySearch } from '@mastra/rag/utils';

const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'How to configure database connections?',
  model: embedder,
  topK: 5,
  includeVectors: false
});

results.forEach(result => {
  console.log(`Score: ${result.score}`);
  console.log(`Content: ${result.metadata.text}`);
  console.log(`Source: ${result.metadata.source}`);
});
```
Result Structure
Vector search returns structured results:
```typescript
type VectorQueryResult = {
  relevantContext: Array<Record<string, any>>;
  sources: Array<{
    id: string;
    score: number;
    metadata: Record<string, any>;
    vector?: number[];
  }>;
};
```
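Given that shape, turning results into prompt context is mechanical. A small helper sketch in generic TypeScript, assuming each source's metadata carries a `text` field as in the examples above:

```typescript
// Build a prompt context block from search results, keeping only
// sources above a minimum relevance score.
function buildContext(
  result: { sources: Array<{ id: string; score: number; metadata: Record<string, any> }> },
  minScore = 0.5
): string {
  return result.sources
    .filter(s => s.score >= minScore)
    .map(s => `[${s.id}] (score ${s.score.toFixed(2)})\n${s.metadata.text}`)
    .join('\n\n');
}
```

Including the source id and score in the block makes it easy for the model to cite sources and for you to audit which chunks actually drove an answer.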
Reranking Implementation
Rerank results manually:
```typescript
import { rerankWithScorer } from '@mastra/rag';
import { CohereRelevanceScorer } from '@mastra/rag/relevance';
import { vectorQuerySearch } from '@mastra/rag/utils';

// Get initial results (retrieve more than needed so the reranker has candidates)
const { results } = await vectorQuerySearch({
  indexName: 'docs',
  vectorStore,
  queryText: 'authentication',
  model: embedder,
  topK: 20
});

// Rerank with Cohere
const scorer = new CohereRelevanceScorer({
  apiKey: process.env.COHERE_API_KEY
});

const reranked = await rerankWithScorer({
  results,
  query: 'authentication',
  scorer,
  options: {
    topK: 5 // Return top 5
  }
});

reranked.forEach(item => {
  console.log(`Score: ${item.score}`);
  console.log(`Content: ${item.result.metadata.text}`);
});
```
Multi-Index Search
Search across multiple indexes:
```typescript
const docsTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder,
  description: 'Search general documentation'
});

const apiTool = createVectorQueryTool({
  id: 'searchAPI',
  indexName: 'api-reference',
  vectorStore,
  model: embedder,
  description: 'Search API reference and endpoint documentation'
});

const codeTool = createVectorQueryTool({
  id: 'searchCode',
  indexName: 'code-examples',
  vectorStore,
  model: embedder,
  description: 'Search code examples and snippets'
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  tools: { searchDocs: docsTool, searchAPI: apiTool, searchCode: codeTool }
});
```
Hybrid Search
Combine vector search with keyword search:
```typescript
import { vectorQuerySearch } from '@mastra/rag/utils';

async function hybridSearch(query: string) {
  // Vector search
  const vectorResults = await vectorQuerySearch({
    indexName: 'docs',
    vectorStore,
    queryText: query,
    model: embedder,
    topK: 10
  });

  // Keyword search (implementation depends on your storage)
  const keywordResults = await keywordSearch(query);

  // Merge and deduplicate by id
  const combined = [...vectorResults.results, ...keywordResults];
  const unique = Array.from(
    new Map(combined.map(r => [r.id, r])).values()
  );

  // Sort by score and keep the top 10
  return unique.sort((a, b) => b.score - a.score).slice(0, 10);
}
```
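One caveat with merging this way: vector scores and keyword scores usually live on different scales, so sorting by raw score can arbitrarily favor one source. A common alternative is reciprocal rank fusion (RRF), which combines rank positions instead of scores. A generic sketch, not tied to any Mastra API:

```typescript
// Reciprocal rank fusion: each result earns 1 / (k + rank) from every
// list it appears in; k (commonly 60) damps the weight of top ranks.
function reciprocalRankFusion<T extends { id: string }>(
  lists: T[][],
  k = 60,
  topK = 10
): T[] {
  const scores = new Map<string, { item: T; score: number }>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      const entry = scores.get(item.id) ?? { item, score: 0 };
      entry.score += 1 / (k + rank + 1);
      scores.set(item.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(e => e.item);
}
```

Because only ranks matter, documents that appear near the top of both the vector and keyword lists reliably outrank documents found by only one, without any score normalization.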
Advanced Configuration
Configure retrieval behavior:
```typescript
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  // Embedding options
  providerOptions: {
    openai: {
      dimensions: 1536
    }
  },
  // Database-specific config
  databaseConfig: {
    schema: 'public',
    table: 'embeddings'
  },
  // Reranker
  reranker: {
    model: new CohereRelevanceScorer({ apiKey: process.env.COHERE_API_KEY }),
    options: {
      topK: 5,
      model: 'rerank-english-v3.0'
    }
  },
  // Filtering
  enableFilter: true,
  // Output options
  includeSources: true,
  includeVectors: false
});
```
Performance Tips

- **Tune topK**: Start with 10-20 results, then rerank down to 3-5. Balance recall against latency.
- **Use reranking**: Reranking significantly improves top-K quality for complex queries.
- **Cache embeddings**: Cache query embeddings for frequently asked questions to reduce API calls.
- **Filter early**: Use metadata filters to reduce the search space and improve performance.
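The caching tip can be sketched as a thin wrapper around whatever produces embeddings. Here `embedFn` is a placeholder for the real embedding call, and the cache is a plain in-memory Map keyed by the normalized query text:

```typescript
// In-memory embedding cache: repeated queries skip the embedding API call.
function createEmbeddingCache(
  embedFn: (text: string) => Promise<number[]>
) {
  const cache = new Map<string, number[]>();
  return async function embed(text: string): Promise<number[]> {
    const key = text.trim().toLowerCase(); // normalize to improve hit rate
    const hit = cache.get(key);
    if (hit) return hit;
    const vector = await embedFn(key);
    cache.set(key, vector);
    return vector;
  };
}
```

Note this stores resolved vectors, so two identical queries issued concurrently will both miss; memoizing the promise instead of the vector avoids that duplicate call.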
Troubleshooting

No results returned:
- Verify embeddings were created during ingestion
- Check that the index name matches between ingestion and retrieval
- Ensure embedding model dimensions match the vector store
- Try lowering the similarity threshold or increasing topK

Poor relevance:
- Add reranking to improve relevance
- Increase topK before reranking (e.g., retrieve 20, rerank to 5)
- Review your chunking strategy: chunks may be too small or too large
- Add metadata extraction for better context

Slow queries:
- Use a vector store with HNSW indexes (e.g., PostgreSQL with pgvector)
- Reduce topK to retrieve fewer results
- Add metadata filters to narrow the search space
- Consider caching for common queries
Best Practices

- **Start with high topK, rerank to low topK**: Retrieve 20-30 results, then rerank to 3-5 for best quality.
- **Use descriptive tool descriptions**: Help agents understand when to use each tool.
- **Add metadata filtering**: Enable filtering for large knowledge bases.
- **Monitor and log**: Track queries, results, and agent decisions for optimization.
- **Test with real queries**: Evaluate retrieval quality with actual user questions.
Next Steps

- Chunking: Optimize document chunking strategies
- Ingestion: Learn about document processing
- Memory: Combine RAG with conversation memory