Mastra’s RAG (Retrieval-Augmented Generation) package provides a complete toolkit for building document-based knowledge retrieval systems. It handles document ingestion, chunking, embedding, and semantic search.
Core Components
Mastra RAG consists of four main components:
Document Processing: Load and parse documents from various formats
Chunking: Split documents into semantically meaningful chunks
Embedding & Indexing: Create vector embeddings and store them in vector databases
Retrieval: Perform semantic search with optional reranking
Quick Start
Here’s a complete RAG pipeline:
```typescript
import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core';
import { MDocument } from '@mastra/rag';
import { createVectorQueryTool } from '@mastra/rag/tools';
import { PgVector } from '@mastra/vector-pg';

// 1. Process documents
const doc = MDocument.fromText(`
Mastra is a TypeScript framework for building AI applications.
It provides tools for agents, workflows, memory, and RAG.
`);

// 2. Chunk the document
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 500,
  overlap: 50
});

// 3. Create embeddings and store them in the vector DB
const vectorStore = new PgVector({
  connectionString: process.env.DATABASE_URL
});

const embedder = openai.embedding('text-embedding-3-small');

for (const chunk of chunks) {
  const embedding = await embedder.doEmbed({
    values: [chunk.text]
  });
  await vectorStore.upsert({
    indexName: 'docs',
    vectors: embedding.embeddings,
    ids: [chunk.id],
    metadata: [{ text: chunk.text }]
  });
}

// 4. Create a RAG tool for agents
const ragTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'docs',
  vectorStore,
  model: embedder,
  description: 'Search documentation for relevant information'
});

// 5. Use it from an agent
const agent = new Agent({
  name: 'DocAgent',
  model: 'openai/gpt-4o',
  tools: { ragTool }
});

const result = await agent.generate(
  'What is Mastra used for?'
);
```
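Under the hood, the vector-store query in step 3 amounts to scoring every stored embedding against the query embedding and keeping the closest matches. A minimal sketch of that idea, purely illustrative (PgVector pushes this work into the database via pgvector; `StoredChunk` and `queryTopK` are hypothetical names, not Mastra APIs):

```typescript
type StoredChunk = { id: string; vector: number[]; metadata: { text: string } };

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query embedding, return the top K
function queryTopK(store: StoredChunk[], queryVector: number[], topK: number) {
  return store
    .map(chunk => ({ ...chunk, score: cosineSimilarity(chunk.vector, queryVector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

A real store replaces the linear scan with an index (e.g. HNSW or IVF), but the contract is the same: query vector in, ranked chunks out.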
Document Processing
Mastra supports multiple document formats:
Plain Text
Markdown
HTML
JSON
```typescript
const doc = MDocument.fromText(
  'Your text content here',
  { source: 'docs.txt' }
);
```
Chunking Strategies
Mastra provides multiple chunking strategies optimized for different content types:
Recursive (Default)
Recursively splits text using hierarchical separators:
```typescript
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 1000,
  overlap: 100
});
```
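The idea behind recursive chunking can be sketched in a few lines: try coarse separators first (paragraphs, then sentences, then words) and fall back to finer ones only when a piece still exceeds `maxSize`. This is a simplified illustration, not Mastra's implementation; it omits overlap and chunk merging, and `recursiveChunk` is a hypothetical helper:

```typescript
function recursiveChunk(
  text: string,
  maxSize: number,
  separators: string[] = ['\n\n', '. ', ' ']
): string[] {
  if (text.length <= maxSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-split at maxSize
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxSize) {
      pieces.push(text.slice(i, i + maxSize));
    }
    return pieces;
  }
  // Split on the coarsest separator and recurse with the finer ones
  const chunks: string[] = [];
  for (const part of text.split(sep)) {
    chunks.push(...recursiveChunk(part, maxSize, rest));
  }
  return chunks;
}
```

The hierarchy of separators is what makes the strategy "recursive": structure-preserving splits are preferred, and character-level splits are a last resort.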
Markdown
Preserves markdown structure and headers:
```typescript
const chunks = await doc.chunk({
  strategy: 'markdown',
  maxSize: 1000,
  headers: [
    ['#', 'h1'],
    ['##', 'h2'],
    ['###', 'h3']
  ]
});
```
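The header-aware behavior can be sketched as follows: walk the lines, track the current header at each level, and attach that header path as metadata to each chunk. A simplified illustration, not Mastra's implementation (`chunkByHeaders` and `MdChunk` are hypothetical names):

```typescript
type MdChunk = { text: string; metadata: Record<string, string> };

function chunkByHeaders(markdown: string, headers: [string, string][]): MdChunk[] {
  const chunks: MdChunk[] = [];
  let current: MdChunk = { text: '', metadata: {} };
  for (const line of markdown.split('\n')) {
    const idx = headers.findIndex(([prefix]) => line.startsWith(prefix + ' '));
    if (idx >= 0) {
      // New header: close the current chunk if it has content
      if (current.text.trim()) chunks.push(current);
      // Inherit shallower headers; drop this level and deeper ones
      const metadata: Record<string, string> = { ...current.metadata };
      for (const [, key] of headers.slice(idx)) delete metadata[key];
      metadata[headers[idx][1]] = line.slice(headers[idx][0].length + 1);
      current = { text: '', metadata };
    } else {
      current.text += line + '\n';
    }
  }
  if (current.text.trim()) chunks.push(current);
  return chunks;
}
```

Carrying the header path in metadata is what later lets retrieval answer "where in the document did this chunk come from?".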
Semantic Markdown
Groups semantically related content:
```typescript
const chunks = await doc.chunk({
  strategy: 'semantic-markdown',
  maxSize: 800,
  joinThreshold: 0.5
});
```
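The `joinThreshold` idea can be illustrated with a toy merge pass: adjacent chunks whose embeddings are similar enough get joined, as long as the merged text stays under `maxSize`. A simplified sketch, not Mastra's implementation (`semanticJoin` and `similarity` are hypothetical helpers):

```typescript
type EmbeddedChunk = { text: string; vector: number[] };

// Cosine similarity between two equal-length vectors
function similarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function semanticJoin(
  chunks: EmbeddedChunk[],
  joinThreshold: number,
  maxSize: number
): string[] {
  if (chunks.length === 0) return [];
  const out: string[] = [];
  let buffer = chunks[0];
  for (const next of chunks.slice(1)) {
    const close = similarity(buffer.vector, next.vector) >= joinThreshold;
    if (close && buffer.text.length + next.text.length + 1 <= maxSize) {
      // Similar neighbors merge into one larger chunk
      buffer = { text: buffer.text + ' ' + next.text, vector: next.vector };
    } else {
      out.push(buffer.text);
      buffer = next;
    }
  }
  out.push(buffer.text);
  return out;
}
```

A higher `joinThreshold` keeps chunks small and topically tight; a lower one produces fewer, broader chunks.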
Code-Aware
Handles programming language syntax:
```typescript
const chunks = await doc.chunk({
  strategy: 'recursive',
  language: 'typescript',
  maxSize: 1000
});
```
See Chunking Strategies for detailed documentation.
Metadata Extraction
Enrich chunks with AI-generated metadata:
```typescript
const doc = MDocument.fromText(content);

const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 500,
  extract: {
    title: true,
    summary: { model: 'openai/gpt-4o-mini' },
    keywords: { maxKeywords: 5 },
    questions: { maxQuestions: 3 }
  }
});

// Each chunk now carries metadata
chunks[0].metadata.title;    // "Introduction to Mastra"
chunks[0].metadata.summary;  // "Overview of framework features"
chunks[0].metadata.keywords; // ["typescript", "AI", "framework"]
```
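To make the keywords option concrete, here is a toy stand-in that ranks words by frequency. Mastra uses an LLM for extraction, so this is purely illustrative of the output shape; `naiveKeywords` is a hypothetical helper:

```typescript
// Frequency-based keyword picker: a crude, deterministic stand-in for
// what extract.keywords produces via an LLM.
function naiveKeywords(text: string, maxKeywords: number): string[] {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    if (word.length <= 3) continue; // crude stopword filter
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, maxKeywords)
    .map(([word]) => word);
}
```

Whatever produces them, keywords stored in chunk metadata give retrieval an extra lexical signal alongside the embedding.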
Vector Query Tool
The createVectorQueryTool function creates a tool that agents can use to search your knowledge base:
```typescript
import { createVectorQueryTool } from '@mastra/rag/tools';

const ragTool = createVectorQueryTool({
  id: 'searchKnowledgeBase',
  indexName: 'company-docs',
  vectorStore,
  model: embedder,
  description: 'Search company documentation and policies',
  enableFilter: true, // Enable metadata filtering
  reranker: {
    model: 'cohere',
    options: {
      topK: 5,
      apiKey: process.env.COHERE_API_KEY
    }
  }
});
```
Parameters:
- id: Unique identifier for the tool
- indexName: Vector store index name to query
- model (MastraEmbeddingModel, required): Embedding model for query vectorization
- description: Tool description for the agent
- enableFilter: Enable metadata filtering in queries
- reranker: Optional reranking configuration for improved relevance
Reranking
Improve retrieval quality by reranking results:
```typescript
import { rerank } from '@mastra/rag';
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5 // Return top 5 after reranking
    }
  }
});
```
Supported rerankers:
Cohere: High-quality commercial reranker
MastraAgent: Use LLM-based reranking
ZeroEntropy: Open-source alternative
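Whichever backend you pick, the shape of reranking is the same: the vector store returns candidates by embedding similarity, the reranker rescores each (query, document) pair with a stronger relevance model, and only the topK survive. A minimal sketch with a pluggable scoring function (illustrative only; `rerankCandidates` is a hypothetical stand-in for the rerankers above):

```typescript
type Candidate = { id: string; text: string; vectorScore: number };

function rerankCandidates(
  query: string,
  candidates: Candidate[],
  scoreRelevance: (query: string, text: string) => number,
  topK: number
) {
  return candidates
    // Rescore each candidate against the query with the stronger model
    .map(c => ({ ...c, relevance: scoreRelevance(query, c.text) }))
    // Keep only the best topK by the new score
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topK);
}
```

The point of the two-stage design: the cheap vector search narrows millions of chunks to a few dozen, and the expensive relevance model only runs on those.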
Metadata Filtering
Add metadata during indexing for filtering:
```typescript
await vectorStore.upsert({
  indexName: 'docs',
  vectors: embeddings,
  ids: chunkIds,
  metadata: chunks.map(chunk => ({
    text: chunk.text,
    category: 'api',
    version: '1.0',
    language: 'typescript'
  }))
});
```
Query with filters:
```typescript
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  enableFilter: true
});

// Agent can now use filters
const result = await agent.generate(
  'Find TypeScript API docs for version 1.0',
  {
    tools: { ragTool }
  }
);
```
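Conceptually, a metadata filter is just an equality predicate applied to each match's metadata; real stores push it into the query itself (for PgVector, a SQL WHERE clause) so non-matching rows are never scored. A minimal sketch with a hypothetical `filterMatches` helper:

```typescript
type Match = { id: string; score: number; metadata: Record<string, string> };

// Keep only matches whose metadata satisfies every key/value in the filter
function filterMatches(matches: Match[], filter: Record<string, string>): Match[] {
  return matches.filter(m =>
    Object.entries(filter).every(([key, value]) => m.metadata[key] === value)
  );
}
```

Filtering before similarity search also shrinks the candidate set, which tends to improve both latency and relevance.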
Integration with Memory
Combine RAG with Mastra’s memory system:
```typescript
import { Agent } from '@mastra/core';
import { Memory } from '@mastra/memory';
import { createVectorQueryTool } from '@mastra/rag/tools';

const memory = new Memory({
  storage,
  vector: vectorStore, // Same vector store
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: true // Enable semantic recall
  }
});

const ragTool = createVectorQueryTool({
  indexName: 'knowledge-base',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  memory, // Semantic recall for conversation history
  tools: { ragTool } // RAG for knowledge base
});
```
RAG Architecture Patterns
Basic RAG
Simple retrieval and generation:
```typescript
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  tools: { ragTool }
});
```
Multi-Index RAG
Search across multiple knowledge bases:
```typescript
const docsTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder
});

const apiTool = createVectorQueryTool({
  id: 'searchAPI',
  indexName: 'api-reference',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  tools: { docsTool, apiTool }
});
```
Hybrid RAG + Semantic Memory
Combine document retrieval with conversation memory:
```typescript
const memory = new Memory({
  storage,
  vector: vectorStore,
  embedder,
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 5,
      scope: 'resource'
    }
  }
});

const ragTool = createVectorQueryTool({
  indexName: 'knowledge-base',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  memory, // Conversation context
  tools: { ragTool } // Document knowledge
});
```
Chunk Size: Use 500-1000 characters per chunk for an optimal balance between context and precision.
Overlap: Set 10-20% overlap to maintain context continuity across chunk boundaries.
Reranking: Add reranking to improve top-K result quality, especially for complex queries.
Metadata Filtering: Use metadata filters to narrow the search scope and improve relevance.
Best Practices
Choose the Right Chunking Strategy: Use markdown chunking for structured docs, semantic chunking for narrative content
Extract Metadata: Enrich chunks with titles, summaries, and keywords for better retrieval
Test Chunk Sizes: Experiment with different sizes (500-1000 chars) for your use case
Use Reranking: Improve top-K results with a reranker, especially for ambiguous queries
Monitor Performance: Track retrieval quality and adjust topK, thresholds, and chunking
Next Steps
Document Ingestion: Learn how to load and process documents
Chunking Strategies: Master document chunking techniques
Retrieval: Implement semantic search and reranking
Memory: Combine RAG with conversation memory