What is GraphRAG?

GraphRAG is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using the power of Large Language Models (LLMs). Built by Microsoft Research, it takes a structured, hierarchical approach to Retrieval Augmented Generation (RAG), in contrast to naive semantic-search approaches that retrieve plain text snippets.
GraphRAG is open source; read the research paper and blog post to learn more about the methodology.

How GraphRAG works

The GraphRAG process involves four key steps:
  1. Extract knowledge graph - Extract entities, relationships, and key claims from your raw text documents to build a comprehensive knowledge graph.
  2. Build community hierarchy - Use the Leiden algorithm to perform hierarchical clustering, organizing entities into meaningful semantic communities.
  3. Generate summaries - Create bottom-up summaries for each community and its constituents to enable holistic understanding of your dataset.
  4. Query with context - Leverage the structured graph and summaries to provide rich context for LLM queries, enabling sophisticated reasoning.
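The four steps above can be sketched with a toy, in-memory example. This is plain Python with hand-written data standing in for LLM output; none of these names come from GraphRAG's actual API:

```python
# Toy sketch of the GraphRAG indexing flow. In the real pipeline an LLM
# performs extraction and summarization; here the results are hard-coded.

# Step 1: entities and relationships "extracted" from raw text.
entities = {
    "Chamomile": "An herb used in calming teas.",
    "Lavender": "An herb associated with relaxation.",
    "Aspirin": "A synthetic pain-relief drug.",
}
relationships = [("Chamomile", "Lavender", "both are calming herbs")]

# Step 2: hierarchical clustering (GraphRAG uses the Leiden algorithm;
# here the communities are assigned by hand).
communities = {
    "herbal-remedies": ["Chamomile", "Lavender"],
    "pharmaceuticals": ["Aspirin"],
}

# Step 3: bottom-up community summaries built from member descriptions.
summaries = {
    name: " ".join(entities[member] for member in members)
    for name, members in communities.items()
}

# Step 4: assemble structured context for an LLM query.
def build_context(question: str, community: str) -> str:
    return f"Community summary: {summaries[community]}\nQuestion: {question}"

print(build_context("Which herbs aid relaxation?", "herbal-remedies"))
```

The point of the sketch is the shape of the data at each stage: flat text becomes a graph, the graph becomes communities, and communities become summaries that can be injected as query context.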

GraphRAG vs baseline RAG

While traditional vector-based RAG (baseline RAG) uses semantic similarity search over text chunks, it struggles in two key scenarios:

Connecting the dots

Baseline RAG fails when answers require traversing disparate pieces of information through their shared attributes to synthesize new insights.

Holistic understanding

Baseline RAG performs poorly when asked to understand summarized semantic concepts over large data collections or singular large documents.
GraphRAG addresses these limitations by creating a knowledge graph with community summaries and graph machine learning outputs, demonstrating substantial improvements for complex reasoning tasks.

Key capabilities

Global search

Reason about holistic questions by leveraging community summaries across your entire dataset.

Local search

Answer specific questions about entities by exploring their neighbors and associated concepts.

DRIFT search

Enhanced local search that includes community information for broader, more comprehensive answers.

Prompt tuning

Fine-tune extraction prompts to optimize GraphRAG performance for your specific data.

The GraphRAG indexing pipeline

GraphRAG’s indexing engine transforms your raw documents through a series of workflows. The pipeline includes LLM caching to ensure resilience against network issues and provide efficient, idempotent operations.
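A caching setup in `settings.yaml` might look like the following. This is a sketch; the exact keys and supported backends vary by GraphRAG version, so check the configuration reference for your release:

```yaml
cache:
  type: file        # blob and cosmosdb backends are also supported
  base_dir: cache   # cached LLM responses here are reused on re-runs
```

With a populated cache, re-running the indexer skips LLM calls whose inputs are unchanged, which is what makes interrupted runs cheap to resume.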

Query modes

At query time, GraphRAG provides multiple search modes optimized for different question types:
  • Global Search - Best for questions requiring understanding of the entire dataset (e.g., “What are the top themes?”)
  • Local Search - Best for questions about specific entities (e.g., “What are the healing properties of chamomile?”)
  • DRIFT Search - Enhanced local search with community context for comprehensive entity-based queries
  • Basic Search - Standard vector RAG for comparison and baseline queries
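The difference between global and local search can be sketched over a toy index (illustrative Python only; the real modes issue LLM calls over the indexed artifacts, and these function names are not GraphRAG's API):

```python
# Toy index: an entity graph plus community-level summaries.
neighbors = {"Chamomile": ["Lavender"], "Lavender": ["Chamomile"], "Aspirin": []}
descriptions = {
    "Chamomile": "calming herb",
    "Lavender": "relaxing herb",
    "Aspirin": "pain-relief drug",
}
community_summaries = [
    "Herbal remedies used for relaxation.",
    "Synthetic pharmaceuticals for pain relief.",
]

def global_search(question: str) -> dict:
    # Map-reduce style: every community summary becomes candidate context,
    # suited to holistic questions about the whole dataset.
    return {"question": question, "context": list(community_summaries)}

def local_search(entity: str) -> dict:
    # Expand the entity's neighborhood, suited to entity-specific questions.
    neighborhood = [entity] + neighbors[entity]
    return {e: descriptions[e] for e in neighborhood}

print(global_search("What are the top themes?"))
print(local_search("Chamomile"))
```

Global search starts from summaries and never needs a seed entity; local search starts from an entity and walks outward. DRIFT search, roughly, blends the two by adding community context to a local traversal.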

Use cases

GraphRAG excels at reasoning about private datasets - data that the LLM has never seen before:
  • Enterprise research documents
  • Business intelligence and reporting
  • Legal document analysis
  • Scientific literature review
  • Customer feedback analysis
  • Knowledge base construction
GraphRAG indexing can be expensive in terms of LLM API calls. Start with small datasets to understand costs, and consider using faster/cheaper models during experimentation.

Getting started

Quickstart

Get up and running with GraphRAG in minutes using our command-line quickstart guide.

Installation

Detailed installation instructions and system requirements for all supported platforms.

Configuration

Learn how to configure GraphRAG for your specific data and use case.

Architecture highlights

GraphRAG is built with extensibility and customization in mind:
  • Factory pattern - Register custom implementations for models, storage, vector stores, and workflows
  • Provider support - Built-in support for OpenAI, Azure OpenAI, and 100+ models via LiteLLM
  • Storage flexibility - File, blob storage, and CosmosDB support out of the box
  • Vector store options - LanceDB, Azure AI Search, and CosmosDB with extensible interface
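The factory pattern mentioned above can be illustrated with a generic registry. The class and backend names here are hypothetical, chosen only to show the pattern; see GraphRAG's source for its actual factory interfaces:

```python
# Minimal registry showing how pluggable backends are registered and
# constructed by name -- the pattern used for models, storage, vector
# stores, and workflows.
class StorageFactory:
    _registry: dict = {}

    @classmethod
    def register(cls, name, impl):
        cls._registry[name] = impl

    @classmethod
    def create(cls, name, **kwargs):
        return cls._registry[name](**kwargs)

class FileStorage:
    def __init__(self, base_dir="output"):
        self.base_dir = base_dir

# A user-supplied backend plugs in without touching core code.
class InMemoryStorage:
    def __init__(self):
        self.data = {}

StorageFactory.register("file", FileStorage)
StorageFactory.register("memory", InMemoryStorage)

store = StorageFactory.create("file", base_dir="out")
```

Because construction is driven by a string key, configuration files can select an implementation by name, and custom backends register themselves without modifying the core pipeline.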

Community and support

GitHub Discussions

Join the conversation and get help from the community.

GitHub Issues

Report bugs and request features.
GraphRAG is provided as demonstration code and is not an officially supported Microsoft product. Always review the Responsible AI FAQ before deployment.