The `index` command builds a knowledge graph index from your source documents by extracting entities, relationships, and communities.
Usage
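The general invocation shape is sketched below; run `graphrag index --help` for the authoritative synopsis of your installed version.

```shell
graphrag index [OPTIONS]
```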
Options
| Option | Description |
| --- | --- |
| `--root`, `-r` | The project root directory containing the `settings.yaml` configuration file. |
| `--method`, `-m` | The indexing method to use. Available methods: `standard` - traditional GraphRAG indexing with all graph construction and summarization performed by an LLM; `fast` - fast indexing using NLP for graph construction and an LLM for summarization. |
| `--verbose`, `-v` | Run the indexing pipeline with verbose logging to see detailed progress information. |
| `--dry-run` | Run the indexing pipeline without executing any steps. Useful for inspecting and validating the configuration before running. |
| `--cache`, `--no-cache` | Use LLM response caching to avoid redundant API calls and reduce costs (enabled by default). Use `--no-cache` to disable caching. |
| `--skip-validation` | Skip any preflight validation checks. Useful when running indexing without LLM steps or in specialized configurations. |
Examples
Basic indexing
Run indexing with default settings:

Specify project directory
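Both invocations are sketched below; the `./ragtest` path is illustrative.

```shell
# Basic indexing: run from inside the project directory with default settings
graphrag index

# Specify project directory: point --root at the directory containing settings.yaml
graphrag index --root ./ragtest
```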
Use fast indexing method
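For example:

```shell
# NLP-based graph construction with LLM summarization
graphrag index --method fast
```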
Verbose logging
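For example:

```shell
# Show detailed progress information while indexing
graphrag index --verbose
```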
Dry run to validate configuration
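For example:

```shell
# Validate the configuration without executing any pipeline steps
graphrag index --dry-run
```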
Disable caching
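For example:

```shell
# Disable LLM response caching for this run
graphrag index --no-cache
```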
Skip validation
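For example:

```shell
# Skip preflight validation checks
graphrag index --skip-validation
```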
Output
The indexing pipeline creates several output files in the `output/` directory:
- `entities.parquet` - Extracted entities with descriptions and embeddings
- `relationships.parquet` - Relationships between entities
- `communities.parquet` - Detected community structure
- `community_reports.parquet` - Summarized reports for each community
- `text_units.parquet` - Chunked text units with embeddings
- `covariates.parquet` - Extracted claims (if claim extraction is enabled)
Indexing process
The indexing pipeline performs the following steps:

- Document chunking - Split documents into manageable text chunks
- Entity extraction - Extract entities and relationships using LLM or NLP
- Entity resolution - Merge duplicate entities and summarize descriptions
- Community detection - Detect hierarchical communities using the Leiden algorithm
- Community summarization - Generate natural language summaries for each community
- Embedding generation - Create vector embeddings for entities and text units
Performance considerations
- Standard method: More accurate but slower and more expensive (uses LLM for all extractions)
- Fast method: Faster and cheaper but potentially less accurate (uses NLP for entity extraction)
- Caching: Keep caching enabled to avoid redundant API calls during re-runs
- Concurrent requests: Adjust `concurrent_requests` in `settings.yaml` to control API rate limits
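As a sketch, `concurrent_requests` is set per model in `settings.yaml`; the exact nesting and defaults depend on your GraphRAG version, and the values below are illustrative.

```yaml
models:
  default_chat_model:
    # Lower this value if you are hitting provider rate limits (illustrative)
    concurrent_requests: 10
```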
Error handling
The indexing command will exit with status code 1 if any errors are encountered during the pipeline. Check the logs for detailed error messages. Common issues:

- Missing API key: Ensure `GRAPHRAG_API_KEY` is set in your `.env` file
- Invalid configuration: Run with `--dry-run` to validate your configuration
- Rate limits: Reduce `concurrent_requests` in `settings.yaml`
- Out of memory: Reduce `chunk_size` or process fewer documents
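Because the command exits non-zero on failure, a wrapper script can gate a full run on a successful dry run; this is a sketch, not part of GraphRAG itself.

```shell
# Validate the configuration first; abort if the dry run fails
if ! graphrag index --dry-run; then
    echo "configuration invalid; see logs" >&2
    exit 1
fi
graphrag index
```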