Mastra’s RAG (Retrieval-Augmented Generation) package provides a complete toolkit for building document-based knowledge retrieval systems. It handles document ingestion, chunking, embedding, and semantic search.

Core Components

Mastra RAG consists of four main components:
  1. Document Processing: Load and parse documents from various formats
  2. Chunking: Split documents into semantically meaningful chunks
  3. Embedding & Indexing: Create vector embeddings and store in vector databases
  4. Retrieval: Perform semantic search with optional reranking

Quick Start

Here’s a complete RAG pipeline:
import { MDocument } from '@mastra/rag';
import { PgVector } from '@mastra/vector-pg';
import { createVectorQueryTool } from '@mastra/rag/tools';
import { Agent } from '@mastra/core';
import { openai } from '@ai-sdk/openai';

// 1. Process documents
const doc = MDocument.fromText(`
  Mastra is a TypeScript framework for building AI applications.
  It provides tools for agents, workflows, memory, and RAG.
`);

// 2. Chunk the document
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 500,
  overlap: 50
});

// 3. Create embeddings and store in vector DB
const vectorStore = new PgVector({
  connectionString: process.env.DATABASE_URL
});

// Create the index once; the dimension must match the embedding model
// (1536 for text-embedding-3-small)
await vectorStore.createIndex({ indexName: 'docs', dimension: 1536 });

const embedder = openai.embedding('text-embedding-3-small');

// Embed all chunks in one call, then upsert them together
const { embeddings } = await embedder.doEmbed({
  values: chunks.map(chunk => chunk.text)
});

await vectorStore.upsert({
  indexName: 'docs',
  vectors: embeddings,
  ids: chunks.map(chunk => chunk.id),
  metadata: chunks.map(chunk => ({ text: chunk.text }))
});

// 4. Create a RAG tool for agents
const ragTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'docs',
  vectorStore,
  model: embedder,
  description: 'Search documentation for relevant information'
});

// 5. Use with an agent
const agent = new Agent({
  name: 'DocAgent',
  model: 'openai/gpt-4o',
  tools: { ragTool }
});

const result = await agent.generate(
  'What is Mastra used for?'
);

Document Formats

Mastra supports multiple input formats through static constructors (fromText, fromMarkdown, fromHTML, and fromJSON):
const doc = MDocument.fromText(
  'Your text content here',
  { source: 'docs.txt' }
);

Chunking Strategies

Mastra provides multiple chunking strategies optimized for different content types:

Recursive (Default)

Recursively splits text using hierarchical separators:
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 1000,
  overlap: 100
});
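As a rough mental model, the recursive strategy tries the coarsest separator first (paragraphs), and falls back to finer separators (lines, sentences, words) only on pieces that still exceed maxSize. This self-contained sketch illustrates the idea; it is not Mastra's implementation, which additionally handles overlap and metadata:

```typescript
// Minimal recursive splitter sketch: try separators from coarse
// (paragraph) to fine (word), recursing only on oversized pieces.
function recursiveSplit(
  text: string,
  maxSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' ']
): string[] {
  if (text.length <= maxSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-cut at maxSize.
    const parts: string[] = [];
    for (let i = 0; i < text.length; i += maxSize) {
      parts.push(text.slice(i, i + maxSize));
    }
    return parts;
  }
  return text
    .split(sep)
    .filter(piece => piece.length > 0)
    .flatMap(piece => recursiveSplit(piece, maxSize, rest));
}

const pieces = recursiveSplit(
  'First paragraph.\n\nA much longer second paragraph that will need splitting.',
  40
);
```

Because paragraph and sentence boundaries are tried before word boundaries, chunks tend to end at natural breaks rather than mid-sentence.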

Markdown

Preserves markdown structure and headers:
const chunks = await doc.chunk({
  strategy: 'markdown',
  maxSize: 1000,
  headers: [
    ['#', 'h1'],
    ['##', 'h2'],
    ['###', 'h3']
  ]
});

Semantic Markdown

Groups semantically related content:
const chunks = await doc.chunk({
  strategy: 'semantic-markdown',
  maxSize: 800,
  joinThreshold: 0.5
});
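Conceptually, semantic joining compares embeddings of adjacent chunks and merges neighbors while their similarity stays above joinThreshold. The helper below is a self-contained sketch of that idea (hypothetical function names, not Mastra internals):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Merge adjacent chunks whose embeddings exceed the join threshold.
function joinBySimilarity(
  chunks: { text: string; embedding: number[] }[],
  joinThreshold: number
): string[] {
  if (chunks.length === 0) return [];
  const out: string[] = [];
  let current = chunks[0];
  for (const next of chunks.slice(1)) {
    if (cosine(current.embedding, next.embedding) >= joinThreshold) {
      current = { text: current.text + ' ' + next.text, embedding: current.embedding };
    } else {
      out.push(current.text);
      current = next;
    }
  }
  out.push(current.text);
  return out;
}
```

A higher joinThreshold produces more, smaller chunks; a lower one merges more aggressively.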

Code-Aware

Handles programming language syntax:
const chunks = await doc.chunk({
  strategy: 'recursive',
  language: 'typescript',
  maxSize: 1000
});
See Chunking Strategies for detailed documentation.

Metadata Extraction

Enrich chunks with AI-generated metadata:
const doc = MDocument.fromText(content);

const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 500,
  extract: {
    title: true,
    summary: { model: 'openai/gpt-4o-mini' },
    keywords: { maxKeywords: 5 },
    questions: { maxQuestions: 3 }
  }
});

// Each chunk now has metadata
chunks[0].metadata.title; // "Introduction to Mastra"
chunks[0].metadata.summary; // "Overview of framework features"
chunks[0].metadata.keywords; // ["typescript", "AI", "framework"]

Vector Query Tool

The createVectorQueryTool function creates a tool that agents can use to search your knowledge base:
import { createVectorQueryTool } from '@mastra/rag/tools';
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  id: 'searchKnowledgeBase',
  indexName: 'company-docs',
  vectorStore,
  model: embedder,
  description: 'Search company documentation and policies',
  enableFilter: true, // Enable metadata filtering
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5
    }
  }
});
Options:
  • id (string): Unique identifier for the tool
  • indexName (string, required): Vector store index name to query
  • vectorStore (MastraVector, required): Vector database instance
  • model (MastraEmbeddingModel, required): Embedding model for query vectorization
  • description (string): Tool description for the agent
  • enableFilter (boolean): Enable metadata filtering in queries
  • reranker (RerankConfig): Optional reranking configuration for improved relevance

Reranking

Improve retrieval quality by reranking results:
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5 // Return top 5 after reranking
    }
  }
});
Supported rerankers:
  • Cohere: High-quality commercial reranker
  • MastraAgent: Use LLM-based reranking
  • ZeroEntropy: Open-source alternative
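Whatever the backend, the core of reranking is the same loop: score each retrieved candidate against the query with a stronger relevance model, re-sort, and keep the top K. A generic sketch with a pluggable scorer (not Mastra's RerankConfig API):

```typescript
type Scored<T> = { item: T; score: number };

// Score every candidate against the query, sort descending, keep top K.
async function rerankCandidates<T>(
  query: string,
  candidates: T[],
  score: (query: string, item: T) => Promise<number>,
  topK: number
): Promise<Scored<T>[]> {
  const scored = await Promise.all(
    candidates.map(async item => ({ item, score: await score(query, item) }))
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

In practice the scorer is a cross-encoder or LLM call, which is why reranking is applied only to the small candidate set returned by the vector search rather than to the whole index.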

Filtering with Metadata

Add metadata during indexing for filtering:
await vectorStore.upsert({
  indexName: 'docs',
  vectors: embeddings,
  ids: chunkIds,
  metadata: chunks.map(chunk => ({
    text: chunk.text,
    category: 'api',
    version: '1.0',
    language: 'typescript'
  }))
});
Query with filters:
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  enableFilter: true
});

// Agent can now use filters
const result = await agent.generate(
  'Find TypeScript API docs for version 1.0',
  {
    tools: { ragTool }
  }
);
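Under the hood, a metadata filter is a conjunction of constraints evaluated against each result's metadata. A minimal sketch of the equality case:

```typescript
type SearchResult = { text: string; metadata: Record<string, string> };

// Keep only results whose metadata matches every filter entry.
function filterByMetadata(
  results: SearchResult[],
  filter: Record<string, string>
): SearchResult[] {
  return results.filter(r =>
    Object.entries(filter).every(([key, value]) => r.metadata[key] === value)
  );
}
```

Vector stores typically evaluate these constraints inside the database (e.g. as a WHERE clause over a JSONB metadata column), so filtering narrows the candidate set before similarity ranking rather than after.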

Integration with Memory

Combine RAG with Mastra’s memory system:
import { Agent } from '@mastra/core';
import { Memory } from '@mastra/memory';
import { createVectorQueryTool } from '@mastra/rag/tools';

const memory = new Memory({
  storage,
  vector: vectorStore, // Same vector store
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: true // Enable semantic recall
  }
});

const ragTool = createVectorQueryTool({
  indexName: 'knowledge-base',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  memory, // Semantic recall for conversation history
  tools: { ragTool } // RAG for knowledge base
});

RAG Architecture Patterns

Basic RAG

Simple retrieval and generation:
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  tools: { ragTool }
});

Multi-Index RAG

Search across multiple knowledge bases:
const docsTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder
});

const apiTool = createVectorQueryTool({
  id: 'searchAPI',
  indexName: 'api-reference',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  tools: { docsTool, apiTool }
});

Hybrid RAG + Semantic Memory

Combine document retrieval with conversation memory:
const memory = new Memory({
  storage,
  vector: vectorStore,
  embedder,
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 5,
      scope: 'resource'
    }
  }
});

const ragTool = createVectorQueryTool({
  indexName: 'knowledge-base',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  memory, // Conversation context
  tools: { ragTool } // Document knowledge
});

Performance Optimization

Chunk Size

Use 500-1000 characters per chunk for optimal balance between context and precision.

Overlap

Set 10-20% overlap to maintain context continuity across chunk boundaries.
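The two knobs interact: the window advances by maxSize − overlap characters, so a 10-20% overlap stores roughly 11-25% more text. A self-contained sliding-window sketch makes the tradeoff concrete:

```typescript
// Sliding-window chunking: each chunk is up to `maxSize` chars, and
// consecutive chunks share `overlap` chars of context.
function windowChunks(text: string, maxSize: number, overlap: number): string[] {
  if (overlap >= maxSize) throw new Error('overlap must be smaller than maxSize');
  const step = maxSize - overlap;
  const out: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    out.push(text.slice(start, start + maxSize));
    if (start + maxSize >= text.length) break;
  }
  return out;
}
```

Larger overlap improves recall for facts that straddle a boundary, at the cost of more embeddings to store and search.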

Reranking

Add reranking to improve top-K results quality, especially for complex queries.

Metadata Filtering

Use metadata filters to narrow search scope and improve relevance.

Best Practices

  1. Choose the Right Chunking Strategy: Use markdown chunking for structured docs, semantic for narrative content
  2. Extract Metadata: Enrich chunks with titles, summaries, and keywords for better retrieval
  3. Test Chunk Sizes: Experiment with different sizes (500-1000 chars) for your use case
  4. Use Reranking: Improve top-K results with a reranker, especially for ambiguous queries
  5. Monitor Performance: Track retrieval quality and adjust topK, thresholds, and chunking

Next Steps

  • Document Ingestion: Learn how to load and process documents
  • Chunking Strategies: Master document chunking techniques
  • Retrieval: Implement semantic search and reranking
  • Memory: Combine RAG with conversation memory
