The GraphRAG indexing package is a data pipeline and transformation suite that extracts meaningful, structured data from unstructured text using LLMs.

What is indexing?

Indexing pipelines are configurable workflows composed of standard and custom steps, prompt templates, and input/output adapters. The standard pipeline is designed to:
  • Extract entities, relationships, and claims from raw text
  • Perform community detection on entities
  • Generate community summaries and reports at multiple levels of granularity
  • Embed text into a vector space
The outputs of the pipeline are stored as Parquet tables by default, and embeddings are written to your configured vector store.
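To make the shape of these steps concrete, here is a toy, LLM-free sketch of the standard pipeline using only the Python standard library. Every name here is illustrative, not the real GraphRAG API: entity extraction is faked with capitalized-word matching, community detection is reduced to connected components, and "reports" are plain strings.

```python
from collections import defaultdict

def extract_entities(text: str) -> list[tuple[str, str]]:
    # Stand-in for LLM extraction: treat capitalized words as entities
    # and each adjacent entity pair as a relationship edge.
    entities = [w.strip(".,") for w in text.split() if w[0].isupper()]
    return list(zip(entities, entities[1:]))

def detect_communities(edges):
    # Stand-in for graph community detection: connected components via DFS.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, communities = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            component.add(n)
            stack.extend(graph[n] - seen)
        communities.append(component)
    return communities

def summarize(community) -> str:
    # Stand-in for an LLM-generated community report.
    return "Community: " + ", ".join(sorted(community))

text = "Alice met Bob. Carol met Dave."
edges = extract_entities(text)
reports = [summarize(c) for c in detect_communities(edges)]
```

The real pipeline replaces each stand-in with an LLM-driven workflow step and persists the intermediate tables as Parquet.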

Getting started

1. Install requirements: see the requirements section for details on setting up a development environment.
2. Configure GraphRAG: see the configuration documentation.
3. Run the indexing pipeline: after you have a config file, you can run the pipeline using the CLI or Python API.

Usage

uv run poe index --root <data_root>
The Python API, defined in graphrag/api/index.py, is the recommended way to invoke the indexer directly from Python code.
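As a minimal sketch, the CLI invocation above can also be scripted from Python; the data root `./my_project` is a hypothetical placeholder, and the command list simply mirrors the `uv run poe index` call shown in Usage.

```python
import subprocess

def build_index_command(data_root: str) -> list[str]:
    """Assemble the CLI invocation shown above for a given data root."""
    return ["uv", "run", "poe", "index", "--root", data_root]

# "./my_project" is a hypothetical data root; substitute your own directory.
cmd = build_index_command("./my_project")
# subprocess.run(cmd, check=True)  # uncomment to actually launch the pipeline
```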

Key features

LLM caching

Built-in cache layer around LLM interactions for resilience and efficiency
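One way to picture the cache layer (a sketch of the general technique, not GraphRAG's actual implementation): key each request by a hash of its prompt and parameters, and return the stored result on repeat calls so retries and re-runs avoid redundant LLM traffic.

```python
import hashlib
import json

class CachedLLM:
    """Toy cache wrapper; `llm_fn` is any (prompt, **params) -> str callable."""

    def __init__(self, llm_fn):
        self.llm_fn = llm_fn
        self.cache = {}
        self.misses = 0

    def complete(self, prompt: str, **params) -> str:
        # Key on prompt + parameters so calls with different settings
        # (e.g. temperature) don't collide in the cache.
        key = hashlib.sha256(
            json.dumps({"prompt": prompt, **params}, sort_keys=True).encode()
        ).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.llm_fn(prompt, **params)
        return self.cache[key]

# Hypothetical stand-in for a real model call.
llm = CachedLLM(lambda prompt, **_: prompt.upper())
first = llm.complete("extract entities", temperature=0)
second = llm.complete("extract entities", temperature=0)  # served from cache
```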

Flexible workflows

Customizable pipeline with standard and custom workflow steps

Parquet outputs

Structured data outputs in Parquet format for efficient storage and querying

Vector embeddings

Automatic text embedding to configured vector stores
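To illustrate what querying embedded text looks like downstream (a pure-Python sketch with made-up chunk ids and vectors, not a real vector-store client): similarity search ranks stored vectors by cosine similarity to a query vector.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical pre-computed embeddings keyed by text-chunk id.
store = {
    "chunk-1": [1.0, 0.0, 0.0],
    "chunk-2": [0.0, 1.0, 0.0],
    "chunk-3": [0.9, 0.1, 0.0],
}

query = [1.0, 0.0, 0.0]  # embedding of a hypothetical user query
best = max(store, key=lambda cid: cosine(query, store[cid]))
```

A configured vector store performs this ranking at scale, typically with approximate nearest-neighbor indexes rather than a linear scan.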

Next steps

Architecture

Understand the underlying concepts and execution model

Data flow

Learn how data flows through the indexing pipeline

Methods

Compare Standard and FastGraphRAG indexing methods

Configuration

Configure the indexing engine for your use case
