
RAG System Architecture

Local GPT uses a sophisticated Retrieval-Augmented Generation (RAG) system to enhance AI responses with context from your Obsidian vault. This page explains the technical implementation details.

Overview

The RAG system processes linked documents, extracts relevant content, and provides context to the AI model. This is implemented primarily in src/rag.ts with support for both Markdown and PDF files.
Enhanced Actions automatically extract context from documents linked in your selection to provide richer AI responses.

Document Processing Pipeline

1. Link Detection

The system identifies linked files through the getLinkedFiles() function:
// Supports both wiki-links and markdown links
const WIKI_LINK_REGEX = /\[\[([^\]|#]+)(?:#[^\]|]+)?(?:\|[^\]]+)?\]\]/g;
const MARKDOWN_LINK_REGEX = /\[[^\]]+\]\(([^)]+)\)/g;
Supported file types:
  • Markdown files (.md)
  • PDF documents (.pdf)
The system sanitizes content by removing code blocks, HTML comments, and inline code before extracting links.
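The extraction step can be sketched in isolation. The regexes below are the ones shown above; the helper names (sanitize, extractLinkTargets) and the exact sanitization patterns are illustrative, not the plugin's implementation:

```typescript
// Matches [[Target]], [[Target#Heading]], and [[Target|Alias]]
const WIKI_LINK_REGEX = /\[\[([^\]|#]+)(?:#[^\]|]+)?(?:\|[^\]]+)?\]\]/g;
// Matches [label](target)
const MARKDOWN_LINK_REGEX = /\[[^\]]+\]\(([^)]+)\)/g;

// Strip regions that may contain link-like text that should be ignored
function sanitize(content: string): string {
  return content
    .replace(/```[\s\S]*?```/g, "") // fenced code blocks
    .replace(/<!--[\s\S]*?-->/g, "") // HTML comments
    .replace(/`[^`]*`/g, ""); // inline code
}

function extractLinkTargets(content: string): string[] {
  const text = sanitize(content);
  const targets: string[] = [];
  for (const match of text.matchAll(WIKI_LINK_REGEX)) {
    targets.push(match[1].trim());
  }
  for (const match of text.matchAll(MARKDOWN_LINK_REGEX)) {
    targets.push(match[1].trim());
  }
  return targets;
}
```

In the real plugin, each extracted target is then resolved to a TFile via Obsidian's metadata cache rather than used as a raw string.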

2. Graph Traversal

The RAG system traverses your vault’s knowledge graph with a depth limit:
const MAX_DEPTH = 10;
Processing flow:
  1. Start from active file: the system begins with the currently active file and extracts all linked documents.
  2. Process forward links: recursively processes documents linked FROM each file, up to MAX_DEPTH levels.
  3. Process backlinks: includes documents that link TO each file (reverse references).
  4. Prevent duplicates: each file is processed only once, even if linked multiple times.
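The traversal above can be sketched as follows. The graph shape and function name are illustrative; the real implementation walks Obsidian's resolvedLinks and TFile objects rather than a plain Map:

```typescript
const MAX_DEPTH = 10;

// Simplified link graph: file path -> paths it links to (or is linked from)
type LinkGraph = Map<string, string[]>;

function collectLinkedDocs(
  start: string,
  forward: LinkGraph,
  backward: LinkGraph,
  visited = new Set<string>(),
  depth = 0,
): Set<string> {
  // Depth limit and duplicate prevention in one guard
  if (depth > MAX_DEPTH || visited.has(start)) return visited;
  visited.add(start);

  // Forward links: documents linked FROM this file
  for (const next of forward.get(start) ?? []) {
    collectLinkedDocs(next, forward, backward, visited, depth + 1);
  }
  // Backlinks: documents that link TO this file
  for (const prev of backward.get(start) ?? []) {
    collectLinkedDocs(prev, forward, backward, visited, depth + 1);
  }
  return visited;
}
```

The shared visited set is what guarantees each file is processed once even when the graph contains cycles.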

3. Content Extraction

Markdown Files

Markdown content is read using Obsidian’s cached read API for optimal performance:
return vault.cachedRead(file);

PDF Files

PDF processing uses pdfjs-dist with intelligent caching:
// Check cache first
const cachedContent = await fileCache.getContent(file.path);
if (cachedContent?.mtime === file.stat.mtime) {
  return cachedContent.content;
}

// Extract if not cached or outdated
const arrayBuffer = await vault.readBinary(file);
const pdfContent = await extractTextFromPDF(arrayBuffer);
PDF extraction maintains layout by detecting line breaks based on text position (transform[5] values). Large PDFs may take longer to process on first use.
PDF processing details (from src/processors/pdf.ts):
  • Uses a Web Worker for background processing
  • Extracts text with line break detection
  • Preserves page structure with double newlines between pages
  • Caches extracted text in IndexedDB for performance
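The line-break heuristic can be shown on its own. In pdfjs-dist, each text item carries a transform matrix whose sixth entry (transform[5]) is the item's vertical position on the page; when that value changes between consecutive items, a newline is emitted. The item shape below is a simplification of the pdfjs TextItem:

```typescript
interface TextItem {
  str: string;
  transform: number[]; // transform[5] is the y position on the page
}

function joinWithLineBreaks(items: TextItem[]): string {
  let out = "";
  let lastY: number | null = null;
  for (const item of items) {
    const y = item.transform[5];
    // A change in vertical position means a new line on the page
    if (lastY !== null && y !== lastY) out += "\n";
    out += item.str;
    lastY = y;
  }
  return out;
}
```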

4. Document Structure

Each processed document is stored with metadata:
interface IAIDocument {
  content: string;
  meta: {
    source: string;      // File path
    basename: string;    // File name without extension
    stat: FileStat;      // Creation/modification times
    depth: number;       // Graph traversal depth
    isBacklink: boolean; // Whether this is a reverse reference
  };
}

Vector Search & Embedding

Embedding Process

The system uses your configured embedding provider to create vector representations:
const results = await aiProviders.retrieve({
  query,
  documents,
  embeddingProvider,
  onProgress: (progress) => {
    // Updates progress bar with chunk processing status
  },
  abortController,
});
Configure your embedding provider in Settings → AI Providers → Embedding AI Provider. Popular choices include nomic-embed-text for Ollama or OpenAI’s text-embedding-3-small.

Chunking Strategy

Documents are automatically split into chunks by the AI Providers SDK:
  • Chunks are sized based on the embedding model’s context window
  • Progress tracking reports processedChunks and totalChunks
  • Each chunk is embedded independently for granular matching
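Chunk sizing lives inside the AI Providers SDK, so the function below is purely for intuition: a naive paragraph-packing chunker under a fixed character budget, not the SDK's actual algorithm:

```typescript
// Split on blank lines, then pack paragraphs into chunks under the budget.
// A single paragraph longer than the budget becomes its own chunk.
function chunkByBudget(text: string, budget: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\n+/)) {
    if (current && current.length + para.length + 2 > budget) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```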
The searchDocuments() function:
  1. Embeds your query (selected text)
  2. Embeds all document chunks
  3. Computes similarity scores (cosine similarity)
  4. Returns ranked results
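The scoring step (cosine similarity) is just a ratio of the dot product to the vector magnitudes; embedding and chunking are handled by the AI Providers SDK, so this standalone function covers only the math:

```typescript
// Cosine similarity: 1 for identical directions, 0 for orthogonal vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```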

Context Formatting

Result Ranking

Results are organized in four passes:
  1. Group by file: results from the same file are grouped together under [[filename]].
  2. Sort by creation time: file groups are sorted by creation time (newest first).
  3. Rank by relevance: within each group, chunks are sorted by similarity score.
  4. Respect limits: context is truncated at the configured limit (see Context Limits).
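A sketch of this ranking pass, assuming a simplified chunk shape (the real code works on the SDK's retrieval results):

```typescript
interface ScoredChunk {
  file: string; // basename of the source file
  ctime: number; // file creation time
  score: number; // similarity score
  text: string;
}

function formatContext(chunks: ScoredChunk[], contextLimit: number): string {
  // 1. Group chunks by file
  const groups = new Map<string, ScoredChunk[]>();
  for (const c of chunks) {
    if (!groups.has(c.file)) groups.set(c.file, []);
    groups.get(c.file)!.push(c);
  }
  // 2. Sort file groups by creation time, newest first
  const sorted = [...groups.values()].sort((a, b) => b[0].ctime - a[0].ctime);
  let out = "";
  for (const group of sorted) {
    // 3. Within each group, rank chunks by similarity score
    group.sort((a, b) => b.score - a.score);
    const block =
      `[[${group[0].file}]]\n` + group.map((c) => c.text).join("\n\n") + "\n\n";
    // 4. Stop once the configured context limit would be exceeded
    if (out.length + block.length > contextLimit) break;
    out += block;
  }
  return out.trimEnd();
}
```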

Output Format

The formatted context follows this structure:
[[Document 1]]
Relevant chunk from Document 1 with highest score...

Another relevant chunk from Document 1...

[[Document 2]]
Relevant chunk from Document 2...
In development mode, the total character count of the assembled context is logged with the message "Total length of context".

Performance Optimizations

Caching Strategies

PDF Content Cache (src/indexedDB.ts):
  • Stores extracted text by file path
  • Includes modification time (mtime) for invalidation
  • Prevents re-extraction on every request
Metadata Cache:
  • Uses Obsidian’s MetadataCache for link resolution
  • Leverages resolvedLinks for graph traversal
  • Avoids manual parsing of markdown files

Parallel Processing

await Promise.all(
  linkedFiles.map(async (file) => {
    await processDocumentForRAG(file, context, processedDocs, 0, false);
    updateCompletedSteps?.(1);
  }),
);
All linked files at the same depth level are processed concurrently for maximum speed.

Progress Tracking

The system provides real-time progress updates:
  1. Initialize: totalProgressSteps starts at the number of linked files.
  2. Dynamic updates: as chunking begins, addTotalProgressSteps() increases the total.
  3. Completion tracking: updateCompletedSteps() increments as each chunk is processed.
  4. Status bar: displays the percentage in the Obsidian status bar: "✨ Enhancing 45%"
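The status-bar label follows directly from those two counters. The names mirror the callbacks above, but the function itself is illustrative:

```typescript
function progressLabel(completedSteps: number, totalProgressSteps: number): string {
  // Guard against division by zero before any steps are registered
  const percent = totalProgressSteps > 0
    ? Math.round((completedSteps / totalProgressSteps) * 100)
    : 0;
  return `✨ Enhancing ${percent}%`;
}
```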

Technical Reference

Key Functions

getLinkedFiles()

Extracts all wiki-links and markdown links from content.

Parameters:
  • content: Text to parse for links
  • vault: Obsidian vault instance
  • metadataCache: Metadata cache for link resolution
  • currentFilePath: Current file path for relative resolution
Returns: Array of TFile objects (.md and .pdf only)

searchDocuments()

Performs vector similarity search across documents.

Parameters:
  • query: Search query (usually selected text)
  • documents: Array of processed documents
  • aiProviders: AI providers service instance
  • embeddingProvider: Configured embedding provider
  • abortController: For cancellation support
  • updateCompletedSteps: Progress callback
  • addTotalProgressSteps: Dynamic total adjustment callback
  • contextLimit: Maximum context length in characters
Returns: Formatted context string

Source Code References

Core RAG Logic

src/rag.ts - Main RAG implementation

PDF Processing

src/processors/pdf.ts - PDF text extraction

File Caching

src/indexedDB.ts - IndexedDB cache layer

Integration

src/main.ts:688-747 - enhanceWithContext() method

Next Steps

Context Limits

Learn how to configure context limits for different models

Troubleshooting

Debug common RAG and embedding issues
