Mastra’s RAG (Retrieval-Augmented Generation) package provides a complete toolkit for building document-based knowledge retrieval systems. It handles document ingestion, chunking, embedding, and semantic search.

Core Components

Mastra RAG consists of four main components:
  1. Document Processing: Load and parse documents from various formats
  2. Chunking: Split documents into semantically meaningful chunks
  3. Embedding & Indexing: Create vector embeddings and store in vector databases
  4. Retrieval: Perform semantic search with optional reranking

Quick Start

Here’s a complete RAG pipeline:
import { MDocument } from '@mastra/rag';
import { PgVector } from '@mastra/vector-pg';
import { createVectorQueryTool } from '@mastra/rag/tools';
import { Agent } from '@mastra/core';
import { openai } from '@ai-sdk/openai';

// 1. Process documents
const doc = MDocument.fromText(`
  Mastra is a TypeScript framework for building AI applications.
  It provides tools for agents, workflows, memory, and RAG.
`);

// 2. Chunk the document
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 500,
  overlap: 50
});

// 3. Create embeddings and store in vector DB
const vectorStore = new PgVector({
  connectionString: process.env.DATABASE_URL
});

// Create the index once; the dimension must match the embedding model
// (1536 for text-embedding-3-small)
await vectorStore.createIndex({ indexName: 'docs', dimension: 1536 });

const embedder = openai.embedding('text-embedding-3-small');

// Embed all chunks in one call, then upsert them together
const { embeddings } = await embedder.doEmbed({
  values: chunks.map(chunk => chunk.text)
});

await vectorStore.upsert({
  indexName: 'docs',
  vectors: embeddings,
  ids: chunks.map(chunk => chunk.id),
  metadata: chunks.map(chunk => ({ text: chunk.text }))
});

// 4. Create a RAG tool for agents
const ragTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'docs',
  vectorStore,
  model: embedder,
  description: 'Search documentation for relevant information'
});

// 5. Use with an agent
const agent = new Agent({
  name: 'DocAgent',
  model: 'openai/gpt-4o',
  tools: { ragTool }
});

const result = await agent.generate(
  'What is Mastra used for?'
);

Document Formats

Mastra supports multiple input formats through static constructors (fromText, fromMarkdown, fromHTML, and fromJSON):
const doc = MDocument.fromText(
  'Your text content here',
  { source: 'docs.txt' }
);

Chunking Strategies

Mastra provides multiple chunking strategies optimized for different content types:

Recursive (Default)

Recursively splits text using hierarchical separators:
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 1000,
  overlap: 100
});
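As a rough mental model, the recursive strategy tries the coarsest separator first (paragraphs), and falls back to finer separators (lines, sentences, words) only on pieces that still exceed maxSize. This self-contained sketch illustrates the idea; it is not Mastra's implementation, which additionally handles overlap and metadata:

```typescript
// Minimal recursive splitter sketch: try separators from coarse
// (paragraph) to fine (word), recursing only on oversized pieces.
function recursiveSplit(
  text: string,
  maxSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' ']
): string[] {
  if (text.length <= maxSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-cut at maxSize.
    const parts: string[] = [];
    for (let i = 0; i < text.length; i += maxSize) {
      parts.push(text.slice(i, i + maxSize));
    }
    return parts;
  }
  return text
    .split(sep)
    .filter(piece => piece.length > 0)
    .flatMap(piece => recursiveSplit(piece, maxSize, rest));
}

const pieces = recursiveSplit(
  'First paragraph.\n\nA much longer second paragraph that will need splitting.',
  40
);
```

Because paragraph and sentence boundaries are tried before word boundaries, chunks tend to end at natural breaks rather than mid-sentence.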

Markdown

Preserves markdown structure and headers:
const chunks = await doc.chunk({
  strategy: 'markdown',
  maxSize: 1000,
  headers: [
    ['#', 'h1'],
    ['##', 'h2'],
    ['###', 'h3']
  ]
});

Semantic Markdown

Groups semantically related content:
const chunks = await doc.chunk({
  strategy: 'semantic-markdown',
  maxSize: 800,
  joinThreshold: 0.5
});
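Conceptually, semantic joining compares embeddings of adjacent chunks and merges neighbors while their similarity stays above joinThreshold. The helper below is a self-contained sketch of that idea (hypothetical function names, not Mastra internals):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Merge adjacent chunks whose embeddings exceed the join threshold.
function joinBySimilarity(
  chunks: { text: string; embedding: number[] }[],
  joinThreshold: number
): string[] {
  if (chunks.length === 0) return [];
  const out: string[] = [];
  let current = chunks[0];
  for (const next of chunks.slice(1)) {
    if (cosine(current.embedding, next.embedding) >= joinThreshold) {
      current = { text: current.text + ' ' + next.text, embedding: current.embedding };
    } else {
      out.push(current.text);
      current = next;
    }
  }
  out.push(current.text);
  return out;
}
```

A higher joinThreshold produces more, smaller chunks; a lower one merges more aggressively.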

Code-Aware

Handles programming language syntax:
const chunks = await doc.chunk({
  strategy: 'recursive',
  language: 'typescript',
  maxSize: 1000
});
See Chunking Strategies for detailed documentation.

Metadata Extraction

Enrich chunks with AI-generated metadata:
const doc = MDocument.fromText(content);

const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 500,
  extract: {
    title: true,
    summary: { model: 'openai/gpt-4o-mini' },
    keywords: { maxKeywords: 5 },
    questions: { maxQuestions: 3 }
  }
});

// Each chunk now has metadata
chunks[0].metadata.title; // "Introduction to Mastra"
chunks[0].metadata.summary; // "Overview of framework features"
chunks[0].metadata.keywords; // ["typescript", "AI", "framework"]

Vector Query Tool

The createVectorQueryTool function creates a tool that agents can use to search your knowledge base:
import { createVectorQueryTool } from '@mastra/rag/tools';
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  id: 'searchKnowledgeBase',
  indexName: 'company-docs',
  vectorStore,
  model: embedder,
  description: 'Search company documentation and policies',
  enableFilter: true, // Enable metadata filtering
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5
    }
  }
});
Options:
  • id (string): Unique identifier for the tool
  • indexName (string, required): Vector store index name to query
  • vectorStore (MastraVector, required): Vector database instance
  • model (MastraEmbeddingModel, required): Embedding model for query vectorization
  • description (string): Tool description for the agent
  • enableFilter (boolean): Enable metadata filtering in queries
  • reranker (RerankConfig): Optional reranking configuration for improved relevance

Reranking

Improve retrieval quality by reranking results:
import { CohereRelevanceScorer } from '@mastra/rag/relevance';

const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  reranker: {
    model: new CohereRelevanceScorer({
      apiKey: process.env.COHERE_API_KEY
    }),
    options: {
      topK: 5 // Return top 5 after reranking
    }
  }
});
Supported rerankers:
  • Cohere: High-quality commercial reranker
  • MastraAgent: Use LLM-based reranking
  • ZeroEntropy: Open-source alternative
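Whatever the backend, the core of reranking is the same loop: score each retrieved candidate against the query with a stronger relevance model, re-sort, and keep the top K. A generic sketch with a pluggable scorer (not Mastra's RerankConfig API):

```typescript
type Scored<T> = { item: T; score: number };

// Score every candidate against the query, sort descending, keep top K.
async function rerankCandidates<T>(
  query: string,
  candidates: T[],
  score: (query: string, item: T) => Promise<number>,
  topK: number
): Promise<Scored<T>[]> {
  const scored = await Promise.all(
    candidates.map(async item => ({ item, score: await score(query, item) }))
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

In practice the scorer is a cross-encoder or LLM call, which is why reranking is applied only to the small candidate set returned by the vector search rather than to the whole index.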

Filtering with Metadata

Add metadata during indexing for filtering:
await vectorStore.upsert({
  indexName: 'docs',
  vectors: embeddings,
  ids: chunkIds,
  metadata: chunks.map(chunk => ({
    text: chunk.text,
    category: 'api',
    version: '1.0',
    language: 'typescript'
  }))
});
Query with filters:
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder,
  enableFilter: true
});

// Agent can now use filters
const result = await agent.generate(
  'Find TypeScript API docs for version 1.0',
  {
    tools: { ragTool }
  }
);
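Under the hood, a metadata filter is a conjunction of constraints evaluated against each result's metadata. A minimal sketch of the equality case:

```typescript
type SearchResult = { text: string; metadata: Record<string, string> };

// Keep only results whose metadata matches every filter entry.
function filterByMetadata(
  results: SearchResult[],
  filter: Record<string, string>
): SearchResult[] {
  return results.filter(r =>
    Object.entries(filter).every(([key, value]) => r.metadata[key] === value)
  );
}
```

Vector stores typically evaluate these constraints inside the database (e.g. as a WHERE clause over a JSONB metadata column), so filtering narrows the candidate set before similarity ranking rather than after.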

Integration with Memory

Combine RAG with Mastra’s memory system:
import { Agent } from '@mastra/core';
import { Memory } from '@mastra/memory';
import { createVectorQueryTool } from '@mastra/rag/tools';

const memory = new Memory({
  storage,
  vector: vectorStore, // Same vector store
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: true // Enable semantic recall
  }
});

const ragTool = createVectorQueryTool({
  indexName: 'knowledge-base',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  memory, // Semantic recall for conversation history
  tools: { ragTool } // RAG for knowledge base
});

RAG Architecture Patterns

Basic RAG

Simple retrieval and generation:
const ragTool = createVectorQueryTool({
  indexName: 'docs',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  tools: { ragTool }
});

Multi-Index RAG

Search across multiple knowledge bases:
const docsTool = createVectorQueryTool({
  id: 'searchDocs',
  indexName: 'documentation',
  vectorStore,
  model: embedder
});

const apiTool = createVectorQueryTool({
  id: 'searchAPI',
  indexName: 'api-reference',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  tools: { docsTool, apiTool }
});

Hybrid RAG + Semantic Memory

Combine document retrieval with conversation memory:
const memory = new Memory({
  storage,
  vector: vectorStore,
  embedder,
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 5,
      scope: 'resource'
    }
  }
});

const ragTool = createVectorQueryTool({
  indexName: 'knowledge-base',
  vectorStore,
  model: embedder
});

const agent = new Agent({
  model: 'openai/gpt-4o',
  memory, // Conversation context
  tools: { ragTool } // Document knowledge
});

Performance Optimization

Chunk Size

Use 500-1000 characters per chunk for optimal balance between context and precision.

Overlap

Set 10-20% overlap to maintain context continuity across chunk boundaries.
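The two knobs interact: the window advances by maxSize − overlap characters, so a 10-20% overlap stores roughly 11-25% more text. A self-contained sliding-window sketch makes the tradeoff concrete:

```typescript
// Sliding-window chunking: each chunk is up to `maxSize` chars, and
// consecutive chunks share `overlap` chars of context.
function windowChunks(text: string, maxSize: number, overlap: number): string[] {
  if (overlap >= maxSize) throw new Error('overlap must be smaller than maxSize');
  const step = maxSize - overlap;
  const out: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    out.push(text.slice(start, start + maxSize));
    if (start + maxSize >= text.length) break;
  }
  return out;
}
```

Larger overlap improves recall for facts that straddle a boundary, at the cost of more embeddings to store and search.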

Reranking

Add reranking to improve top-K results quality, especially for complex queries.

Metadata Filtering

Use metadata filters to narrow search scope and improve relevance.

Best Practices

  1. Choose the Right Chunking Strategy: Use markdown chunking for structured docs, semantic for narrative content
  2. Extract Metadata: Enrich chunks with titles, summaries, and keywords for better retrieval
  3. Test Chunk Sizes: Experiment with different sizes (500-1000 chars) for your use case
  4. Use Reranking: Improve top-K results with a reranker, especially for ambiguous queries
  5. Monitor Performance: Track retrieval quality and adjust topK, thresholds, and chunking

Next Steps

  • Document Ingestion: Learn how to load and process documents
  • Chunking Strategies: Master document chunking techniques
  • Retrieval: Implement semantic search and reranking
  • Memory: Combine RAG with conversation memory
