# What is indexing?
Indexing pipelines are configurable workflows composed of standard and custom steps, prompt templates, and input/output adapters. The standard pipeline is designed to:

- Extract entities, relationships, and claims from raw text
- Perform community detection on entities
- Generate community summaries and reports at multiple levels of granularity
- Embed text into a vector space
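The ordering of these stages can be pictured as a simple sequential pipeline. The sketch below is purely illustrative: every function name in it (`extract_graph`, `detect_communities`, and so on) is a hypothetical stand-in, not one of GraphRAG's actual workflow steps.

```python
# Illustrative sketch of the standard indexing stages. All function names
# here are hypothetical stand-ins, not GraphRAG's real workflow functions.

def extract_graph(chunks):
    # Stand-in for LLM-driven entity/relationship/claim extraction.
    return {"entities": [f"entity-from-{c}" for c in chunks], "relationships": []}

def detect_communities(graph):
    # Stand-in for community detection over the extracted entities.
    return [{"level": 0, "members": graph["entities"]}]

def summarize_communities(communities):
    # Stand-in for LLM-generated community reports at each level.
    return [f"report(level={c['level']}, size={len(c['members'])})" for c in communities]

def embed_text(chunks):
    # Stand-in for embedding text chunks into a vector space.
    return {c: [0.0] for c in chunks}

def run_standard_pipeline(chunks):
    graph = extract_graph(chunks)
    communities = detect_communities(graph)
    reports = summarize_communities(communities)
    embeddings = embed_text(chunks)
    return {"graph": graph, "communities": communities,
            "reports": reports, "embeddings": embeddings}

result = run_standard_pipeline(["chunk-1", "chunk-2"])
print(result["reports"])
```

The real pipeline interleaves these stages with caching, batching, and storage steps; the point here is only the extract → cluster → summarize → embed ordering.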
## Getting started
### Install requirements
See the requirements section for details on setting up a development environment.
### Configure GraphRAG
To configure GraphRAG, see the configuration documentation.
### Usage
The Python API is available in `graphrag/api/index.py` and is the recommended way to call the indexer directly from Python code.

## Key features
### LLM caching

A built-in cache layer around LLM interactions provides resilience and efficiency.

### Flexible workflows

The pipeline is customizable with standard and custom workflow steps.

### Parquet outputs

Structured data outputs are written in Parquet format for efficient storage and querying.

### Vector embeddings

Text is automatically embedded into configured vector stores.
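The core idea of such a cache layer can be sketched as a key-value store keyed on a hash of the request, so that repeated identical prompts are served without re-invoking the model. This is an illustrative approximation, not GraphRAG's actual cache implementation.

```python
import hashlib
import json

class LLMCache:
    """Minimal sketch of a cache wrapped around an LLM call.

    Keys are derived from the prompt plus model parameters, so identical
    requests are answered from the cache instead of re-invoking the model.
    """

    def __init__(self, llm_fn):
        self._llm_fn = llm_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt, **params):
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def complete(self, prompt, **params):
        key = self._key(prompt, **params)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._llm_fn(prompt, **params)
        return self._store[key]

# Stand-in for a real model call; records how often it is invoked.
calls = []
def fake_llm(prompt, **params):
    calls.append(prompt)
    return f"completion of: {prompt}"

cache = LLMCache(fake_llm)
cache.complete("Extract entities from ...", temperature=0.0)
cache.complete("Extract entities from ...", temperature=0.0)  # served from cache
```

Beyond efficiency, this pattern gives resilience: if a long indexing run fails partway through, a persistent cache lets the rerun skip all LLM calls that already succeeded.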
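The embedding step amounts to mapping each piece of text to a vector and storing it for similarity search. The toy vector store below (pure Python, cosine similarity, and a made-up hash-based "embedding") only illustrates the store-and-query pattern; a real deployment uses a genuine embedding model and a configured vector store.

```python
import hashlib
import math

def toy_embed(text, dim=8):
    # Deterministic stand-in for an embedding model: derive a vector from a
    # hash of the text. Real embeddings capture semantics; this does not.
    digest = hashlib.sha256(text.encode()).digest()
    return [digest[i] / 255.0 for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class ToyVectorStore:
    def __init__(self):
        self._rows = []  # (text, vector) pairs

    def add(self, text):
        self._rows.append((text, toy_embed(text)))

    def query(self, text, k=1):
        qv = toy_embed(text)
        ranked = sorted(self._rows, key=lambda row: cosine(qv, row[1]), reverse=True)
        return [t for t, _ in ranked[:k]]

store = ToyVectorStore()
store.add("entity summary: ACME CORP")
store.add("entity summary: JANE DOE")
print(store.query("entity summary: ACME CORP"))
```

At query time, GraphRAG's retrieval side searches these stored vectors for text relevant to the user's question; the embedding step at indexing time is what makes that search possible.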
## Next steps

- **Architecture**: understand the underlying concepts and execution model
- **Data flow**: learn how data flows through the indexing pipeline
- **Methods**: compare the Standard and FastGraphRAG indexing methods
- **Configuration**: configure the indexing engine for your use case