The index command builds a knowledge graph index from your source documents by extracting entities, relationships, and communities.

Usage

graphrag index [OPTIONS]

Options

--root
string
default: current directory
The project root directory containing the settings.yaml configuration file.
Aliases: -r
--method
string
default: standard
The indexing method to use.
Aliases: -m
Available methods:
  • standard - Traditional GraphRAG indexing with all graph construction and summarization performed by an LLM
  • fast - Fast indexing using NLP for graph construction and an LLM for summarization
--verbose
boolean
default: false
Run the indexing pipeline with verbose logging to see detailed progress information.
Aliases: -v
--dry-run
boolean
default: false
Run the indexing pipeline without executing any steps. Useful for inspecting and validating the configuration before running.
--cache
boolean
default: true
Use LLM response caching to avoid redundant API calls and reduce costs. Use --no-cache to disable caching.
--skip-validation
boolean
default: false
Skip any preflight validation checks. Useful when running indexing without LLM steps or in specialized configurations.

Examples

Basic indexing

Run indexing with default settings:
graphrag index

Specify project directory

graphrag index --root ./my-project

Use fast indexing method

graphrag index --method fast
Fast indexing uses NLP-based entity extraction instead of LLM-based extraction, which is faster and cheaper but may be less accurate.

Verbose logging

graphrag index --verbose

Dry run to validate configuration

graphrag index --dry-run
This will load and validate your configuration without actually running the indexing pipeline.

Disable caching

graphrag index --no-cache

Skip validation

graphrag index --skip-validation

Output

The indexing pipeline creates several output files in the output/ directory:
  • entities.parquet - Extracted entities with descriptions and embeddings
  • relationships.parquet - Relationships between entities
  • communities.parquet - Detected community structure
  • community_reports.parquet - Summarized reports for each community
  • text_units.parquet - Chunked text units with embeddings
  • covariates.parquet - Extracted claims (if claim extraction is enabled)
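To confirm that a run produced the expected artifacts, a quick check like the following can help. This is a minimal sketch: it only tests for file presence in the default output/ directory, and the exact file set can vary with your configuration (for example, covariates.parquet only appears when claim extraction is enabled).

```python
from pathlib import Path

# Expected GraphRAG output files; covariates.parquet is optional.
EXPECTED_OUTPUTS = [
    "entities.parquet",
    "relationships.parquet",
    "communities.parquet",
    "community_reports.parquet",
    "text_units.parquet",
    "covariates.parquet",
]


def check_outputs(output_dir: str = "output") -> dict:
    """Map each expected output file to whether it exists on disk."""
    out = Path(output_dir)
    return {name: (out / name).is_file() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    for name, present in check_outputs().items():
        print(f"{'ok     ' if present else 'MISSING'} {name}")
```

Running this after `graphrag index` completes gives a quick sanity check that the pipeline wrote all of its outputs.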

Indexing process

The indexing pipeline performs the following steps:
  1. Document chunking - Split documents into manageable text chunks
  2. Entity extraction - Extract entities and relationships using LLM or NLP
  3. Entity resolution - Merge duplicate entities and summarize descriptions
  4. Community detection - Detect hierarchical communities using Leiden algorithm
  5. Community summarization - Generate natural language summaries for each community
  6. Embedding generation - Create vector embeddings for entities and text units
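Step 1 above can be illustrated with a toy chunker. This is an illustrative sketch only: it splits on whole words with a fixed overlap, whereas GraphRAG's actual chunking is token-based and driven by the chunk settings in settings.yaml.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list:
    # Toy whole-word chunker producing overlapping windows; the real
    # pipeline chunks by tokens, not words.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]
```

The overlap means each chunk repeats the tail of its predecessor, so entities mentioned near a chunk boundary are still seen in full context by the extraction step.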

Performance considerations

  • Standard method: More accurate but slower and more expensive (uses LLM for all extractions)
  • Fast method: Faster and cheaper but potentially less accurate (uses NLP for entity extraction)
  • Caching: Keep caching enabled to avoid redundant API calls during re-runs
  • Concurrent requests: Adjust concurrent_requests in settings.yaml to control API rate limits
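For example, concurrency can be tuned with a fragment like the following. This is an illustrative sketch; the exact key placement has changed between GraphRAG versions, so match it against the structure of your generated settings.yaml.

```yaml
models:
  default_chat_model:
    # Lower this value if you are hitting provider rate limits.
    concurrent_requests: 10
```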

Error handling

The indexing command will exit with status code 1 if any errors are encountered during the pipeline. Check the logs for detailed error messages. Common issues:
  • Missing API key: Ensure GRAPHRAG_API_KEY is set in your .env file
  • Invalid configuration: Run with --dry-run to validate your configuration
  • Rate limits: Reduce concurrent_requests in settings.yaml
  • Out of memory: Reduce chunk_size or process fewer documents
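For the missing API key issue, a minimal .env file in the project root looks like the following (replace the placeholder with your actual key; the exact variable name depends on how your model is configured in settings.yaml):

```
GRAPHRAG_API_KEY=<your-api-key>
```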

Next steps

After building an index, you can run queries against it with the graphrag query command.
