GraphRAG is a structured, hierarchical approach to Retrieval Augmented Generation (RAG) that fundamentally differs from traditional semantic search methods. Instead of relying on plain text snippets, GraphRAG extracts a knowledge graph from raw text, builds a community hierarchy, generates summaries, and leverages these structures for enhanced reasoning.

What is GraphRAG?

GraphRAG is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using the power of LLMs. It addresses critical limitations in baseline RAG systems:

Connect the dots

Baseline RAG struggles when answers require traversing disparate pieces of information through shared attributes. GraphRAG excels at synthesizing insights across connected data.

Holistic understanding

Traditional RAG performs poorly when summarizing semantic concepts over large collections. GraphRAG uses hierarchical community structures for dataset-wide comprehension.

Entity-centric reasoning

GraphRAG builds knowledge graphs where entities and relationships provide structured access to information, enabling more precise retrieval.

Multi-level insights

Community detection creates hierarchical summaries at multiple granularities, from high-level themes to detailed local clusters.

The GraphRAG process

The GraphRAG workflow consists of two major phases: indexing and querying.

Indexing phase

The indexing pipeline transforms raw documents into a structured knowledge model:
1. Slice documents into text units

Input documents are chunked into analyzable text units (default: 1200 tokens) that serve as the foundation for extraction and provide fine-grained source references.
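A minimal sketch of this chunking step, using whitespace tokens for illustration (GraphRAG itself counts model tokens with a tokenizer, and the 1200-token default is configurable):

```python
def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of roughly `chunk_size` tokens.

    Whitespace tokens stand in for model tokens here; the step size
    (chunk_size - overlap) lets adjacent chunks share context so that
    entities spanning a boundary are not lost.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk becomes one text unit, and every downstream fact keeps a reference back to the unit it came from.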
2. Extract knowledge graph

LLMs extract entities, relationships, and optional claims from each text unit. Entities represent people, places, organizations, or events. Relationships connect entities with descriptive context.
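The extraction itself is LLM-driven, but its output reduces to structured records. A hedged sketch of the data shapes involved, with a parser for a hypothetical pipe-delimited response format (GraphRAG's real extraction prompts define their own delimiters):

```python
from dataclasses import dataclass

@dataclass
class Entity:
    name: str
    type: str          # e.g. person, place, organization, event
    description: str

@dataclass
class Relationship:
    source: str
    target: str
    description: str

def parse_extraction(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse a hypothetical LLM response, one record per line:
    entity|NAME|TYPE|DESCRIPTION  or  rel|SOURCE|TARGET|DESCRIPTION."""
    entities, rels = [], []
    for line in raw.strip().splitlines():
        kind, *fields = [f.strip() for f in line.split("|")]
        if kind == "entity":
            entities.append(Entity(*fields))
        elif kind == "rel":
            rels.append(Relationship(*fields))
    return entities, rels
```

Descriptions for the same entity extracted from different text units are later merged and summarized into a single record.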
3. Detect communities

The hierarchical Leiden algorithm clusters the entity graph into communities at multiple levels, revealing the organizational structure of your data.
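Leiden itself requires a dedicated graph library (GraphRAG uses graspologic); as a toy stand-in that shows the shape of the output, here is a grouping of an entity graph into communities via connected components with union-find. Leiden would additionally split dense components by modularity and emit multiple hierarchy levels:

```python
def find_communities(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Assign each entity a community id (toy stand-in for Leiden)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    roots: dict[str, int] = {}
    return {node: roots.setdefault(find(node), len(roots)) for node in parent}
```

The real pipeline records, for each entity, its community membership at every level of the hierarchy.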
4. Generate summaries

Each community receives an LLM-generated summary, built bottom-up from its child communities, creating a hierarchical understanding of the dataset at varying levels of detail.
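The bottom-up pass can be sketched as a post-order walk of the community hierarchy, where each parent's summary is produced from its children's summaries (the `summarize` callable is a stand-in for the LLM call):

```python
def summarize_hierarchy(children, leaf_summaries, summarize, root):
    """Return a summary per community, built bottom-up.

    children:       community id -> list of child community ids
    leaf_summaries: summaries already written for leaf communities
    summarize:      callable combining child summaries (stands in for an LLM)
    """
    summaries = dict(leaf_summaries)

    def visit(node):
        if node in summaries:          # leaf, or already summarized
            return summaries[node]
        child_texts = [visit(c) for c in children.get(node, [])]
        summaries[node] = summarize(node, child_texts)
        return summaries[node]

    visit(root)
    return summaries
```

Because parents only read their children's already-finished summaries, each level compresses the one below it, which is what makes dataset-wide questions answerable without re-reading every text unit.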
5. Create embeddings

Text embeddings are generated for entities, text units, and community reports to enable semantic search during retrieval.
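In production these embeddings come from an embedding model and land in a vector store; a self-contained sketch using a toy bag-of-words embedding and cosine similarity shows the retrieval mechanics:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. A real pipeline would call an
    embedding model and store dense vectors in a vector store."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_match(query: str, items: dict[str, str]) -> str:
    """Return the id of the item most similar to the query."""
    q = embed(query)
    return max(items, key=lambda i: cosine(q, embed(items[i])))
```

At query time the same mechanics run against entity descriptions, text units, or community reports, depending on the search strategy.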

Query phase

At query time, GraphRAG provides multiple search strategies tailored to different question types: global search for dataset-wide questions, local search for entity-specific inquiries, DRIFT search for a blend of global context and local detail, and basic search for plain similarity matching.
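Choosing among these strategies can be sketched as a small router; the keyword heuristics here are a hypothetical stand-in for whatever heuristic or model classifier picks the mode:

```python
def route_query(question: str) -> str:
    """Pick a search mode from simple keyword heuristics (hypothetical;
    a real router could use an LLM classifier instead)."""
    q = question.lower()
    if any(w in q for w in ("overall", "themes", "across the dataset")):
        return "global"    # community summaries, dataset-wide reasoning
    if any(w in q for w in ("who", "where", "which entity")):
        return "local"     # entity neighborhood in the knowledge graph
    if "explore" in q:
        return "drift"     # global context plus local follow-ups
    return "basic"         # plain vector similarity over text units
```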

Key advantages over baseline RAG

GraphRAG creates an explicit knowledge graph where entities and relationships are first-class objects. This structure enables:
  • Traversal of multi-hop connections
  • Understanding of entity importance through graph metrics
  • Relationship-aware retrieval
  • Community-based organization

Community detection and bottom-up summarization provide:
  • Multi-level understanding from global themes to local details
  • Pre-computed summaries that reduce query-time LLM costs
  • Ability to reason about dataset structure
  • Scalable comprehension of large document collections

Different query types require different approaches:
  • Global search for comprehensive, dataset-wide questions
  • Local search for entity-specific inquiries
  • DRIFT search for balanced exploration
  • Basic search for simple similarity matching

Every extracted fact maintains links to:
  • Source text units
  • Original documents
  • Related entities and relationships
  • Community memberships

This enables transparent, verifiable results with clear source attribution.
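Attribution falls out of keeping those links as ids that can be resolved against the output tables; a sketch, assuming simple dict-backed tables keyed by id:

```python
def trace_fact(fact, text_units, documents):
    """Resolve a fact's provenance links back to source text and documents.

    fact: dict with 'claim' and 'text_unit_ids'; the tables are dicts
    keyed by id (stand-ins for the pipeline's output tables).
    """
    units = [text_units[u] for u in fact["text_unit_ids"]]
    doc_ids = sorted({u["document_id"] for u in units})
    return {
        "claim": fact["claim"],
        "evidence": [u["text"] for u in units],
        "documents": [documents[d] for d in doc_ids],
    }
```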

The GraphRAG knowledge model

The indexing process produces a structured knowledge model with these core entity types:
  • Document: Input files (individual CSV rows or .txt files)
  • TextUnit: Chunks of text for analysis and source references
  • Entity: Extracted people, places, events, organizations with types and descriptions
  • Relationship: Connections between entities with descriptive context
  • Covariate: Optional time-bound claims and statements about entities
  • Community: Hierarchical clusters of entities from community detection
  • Community Report: LLM-generated summaries of each community’s contents
All outputs are stored as Parquet tables by default, with embeddings written to your configured vector store.

When to use GraphRAG

GraphRAG is particularly powerful for:
  • Complex reasoning tasks that require connecting disparate pieces of information
  • Large document collections where understanding overall themes and structure matters
  • Private datasets where LLMs need to reason about previously unseen data
  • Multi-hop questions that require traversing relationships between entities
  • Exploratory analysis where you need both high-level summaries and detailed local information
GraphRAG indexing can be expensive in terms of LLM API costs and processing time. Start with a small dataset to understand the process and costs before scaling up.

Next steps

Knowledge graphs

Learn how GraphRAG extracts and structures entity-relationship graphs

Indexing pipeline

Explore the multi-phase indexing workflow in detail

Community detection

Understand hierarchical Leiden clustering and community summarization

Retrieval methods

Compare global, local, DRIFT, and basic search strategies
