The GraphRAG indexing package is a data pipeline and transformation suite that extracts meaningful, structured data from unstructured text using LLMs.

What is indexing?

Indexing pipelines are configurable workflows composed of standard and custom steps, prompt templates, and input/output adapters. The standard pipeline is designed to:
  • Extract entities, relationships, and claims from raw text
  • Perform community detection on entities
  • Generate community summaries and reports at multiple levels of granularity
  • Embed text into a vector space
The outputs of the pipeline are stored as Parquet tables by default, and embeddings are written to your configured vector store.
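To make the shape of these steps concrete, here is a toy, LLM-free sketch of the standard pipeline using only the Python standard library. Every name here is illustrative, not the real GraphRAG API: entity extraction is faked with capitalized-word matching, community detection is reduced to connected components, and "reports" are plain strings.

```python
from collections import defaultdict

def extract_entities(text: str) -> list[tuple[str, str]]:
    # Stand-in for LLM extraction: treat capitalized words as entities
    # and each adjacent entity pair as a relationship edge.
    entities = [w.strip(".,") for w in text.split() if w[0].isupper()]
    return list(zip(entities, entities[1:]))

def detect_communities(edges):
    # Stand-in for graph community detection: connected components via DFS.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, communities = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            component.add(n)
            stack.extend(graph[n] - seen)
        communities.append(component)
    return communities

def summarize(community) -> str:
    # Stand-in for an LLM-generated community report.
    return "Community: " + ", ".join(sorted(community))

text = "Alice met Bob. Carol met Dave."
edges = extract_entities(text)
reports = [summarize(c) for c in detect_communities(edges)]
```

The real pipeline replaces each stand-in with an LLM-driven workflow step and persists the intermediate tables as Parquet.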

Getting started

1. Install requirements: see the requirements section for details on setting up a development environment.
2. Configure GraphRAG: see the configuration documentation.
3. Run the indexing pipeline: after you have a config file, you can run the pipeline using the CLI or Python API.

Usage

uv run poe index --root <data_root>
The Python API, defined in graphrag/api/index.py, is the recommended way to invoke the indexer directly from Python code.
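As a minimal sketch, the CLI invocation above can also be scripted from Python; the data root `./my_project` is a hypothetical placeholder, and the command list simply mirrors the `uv run poe index` call shown in Usage.

```python
import subprocess

def build_index_command(data_root: str) -> list[str]:
    """Assemble the CLI invocation shown above for a given data root."""
    return ["uv", "run", "poe", "index", "--root", data_root]

# "./my_project" is a hypothetical data root; substitute your own directory.
cmd = build_index_command("./my_project")
# subprocess.run(cmd, check=True)  # uncomment to actually launch the pipeline
```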

Key features

LLM caching

Built-in cache layer around LLM interactions for resilience and efficiency
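One way to picture the cache layer (a sketch of the general technique, not GraphRAG's actual implementation): key each request by a hash of its prompt and parameters, and return the stored result on repeat calls so retries and re-runs avoid redundant LLM traffic.

```python
import hashlib
import json

class CachedLLM:
    """Toy cache wrapper; `llm_fn` is any (prompt, **params) -> str callable."""

    def __init__(self, llm_fn):
        self.llm_fn = llm_fn
        self.cache = {}
        self.misses = 0

    def complete(self, prompt: str, **params) -> str:
        # Key on prompt + parameters so calls with different settings
        # (e.g. temperature) don't collide in the cache.
        key = hashlib.sha256(
            json.dumps({"prompt": prompt, **params}, sort_keys=True).encode()
        ).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.llm_fn(prompt, **params)
        return self.cache[key]

# Hypothetical stand-in for a real model call.
llm = CachedLLM(lambda prompt, **_: prompt.upper())
first = llm.complete("extract entities", temperature=0)
second = llm.complete("extract entities", temperature=0)  # served from cache
```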

Flexible workflows

Customizable pipeline with standard and custom workflow steps

Parquet outputs

Structured data outputs in Parquet format for efficient storage and querying

Vector embeddings

Automatic text embedding to configured vector stores
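To illustrate what querying embedded text looks like downstream (a pure-Python sketch with made-up chunk ids and vectors, not a real vector-store client): similarity search ranks stored vectors by cosine similarity to a query vector.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical pre-computed embeddings keyed by text-chunk id.
store = {
    "chunk-1": [1.0, 0.0, 0.0],
    "chunk-2": [0.0, 1.0, 0.0],
    "chunk-3": [0.9, 0.1, 0.0],
}

query = [1.0, 0.0, 0.0]  # embedding of a hypothetical user query
best = max(store, key=lambda cid: cosine(query, store[cid]))
```

A configured vector store performs this ranking at scale, typically with approximate nearest-neighbor indexes rather than a linear scan.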

Next steps

Architecture

Understand the underlying concepts and execution model

Data flow

Learn how data flows through the indexing pipeline

Methods

Compare Standard and FastGraphRAG indexing methods

Configuration

Configure the indexing engine for your use case
