The `index` command builds a knowledge graph index from your source documents by extracting entities, relationships, and communities.
Usage
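The general invocation shape is sketched below; run `graphrag index --help` for the authoritative synopsis of your installed version.

```shell
graphrag index [OPTIONS]
```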
Options
| Option | Description |
| --- | --- |
| `--root`, `-r` | The project root directory containing the `settings.yaml` configuration file. |
| `--method`, `-m` | The indexing method to use. Available methods: `standard` - traditional GraphRAG indexing with all graph construction and summarization performed by an LLM; `fast` - fast indexing using NLP for graph construction and an LLM for summarization. |
| `--verbose`, `-v` | Run the indexing pipeline with verbose logging to see detailed progress information. |
| `--dry-run` | Run the indexing pipeline without executing any steps. Useful for inspecting and validating the configuration before running. |
| `--cache`, `--no-cache` | Use LLM response caching to avoid redundant API calls and reduce costs (enabled by default). Use `--no-cache` to disable caching. |
| `--skip-validation` | Skip any preflight validation checks. Useful when running indexing without LLM steps or in specialized configurations. |
Examples
Basic indexing
Run indexing with default settings:

Specify project directory
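Both invocations are sketched below; the `./ragtest` path is illustrative.

```shell
# Basic indexing: run from inside the project directory with default settings
graphrag index

# Specify project directory: point --root at the directory containing settings.yaml
graphrag index --root ./ragtest
```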
Use fast indexing method
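For example:

```shell
# NLP-based graph construction with LLM summarization
graphrag index --method fast
```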
Verbose logging
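For example:

```shell
# Show detailed progress information while indexing
graphrag index --verbose
```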
Dry run to validate configuration
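For example:

```shell
# Validate the configuration without executing any pipeline steps
graphrag index --dry-run
```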
Disable caching
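For example:

```shell
# Disable LLM response caching for this run
graphrag index --no-cache
```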
Skip validation
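For example:

```shell
# Skip preflight validation checks
graphrag index --skip-validation
```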
Output
The indexing pipeline creates several output files in the `output/` directory:
- `entities.parquet` - Extracted entities with descriptions and embeddings
- `relationships.parquet` - Relationships between entities
- `communities.parquet` - Detected community structure
- `community_reports.parquet` - Summarized reports for each community
- `text_units.parquet` - Chunked text units with embeddings
- `covariates.parquet` - Extracted claims (if claim extraction is enabled)
Indexing process
The indexing pipeline performs the following steps:

- Document chunking - Split documents into manageable text chunks
- Entity extraction - Extract entities and relationships using LLM or NLP
- Entity resolution - Merge duplicate entities and summarize descriptions
- Community detection - Detect hierarchical communities using the Leiden algorithm
- Community summarization - Generate natural language summaries for each community
- Embedding generation - Create vector embeddings for entities and text units
Performance considerations
- Standard method: More accurate but slower and more expensive (uses LLM for all extractions)
- Fast method: Faster and cheaper but potentially less accurate (uses NLP for entity extraction)
- Caching: Keep caching enabled to avoid redundant API calls during re-runs
- Concurrent requests: Adjust `concurrent_requests` in `settings.yaml` to control API rate limits
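As a sketch, `concurrent_requests` is set per model in `settings.yaml`; the exact nesting and defaults depend on your GraphRAG version, and the values below are illustrative.

```yaml
models:
  default_chat_model:
    # Lower this value if you are hitting provider rate limits (illustrative)
    concurrent_requests: 10
```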
Error handling
The indexing command will exit with status code 1 if any errors are encountered during the pipeline. Check the logs for detailed error messages. Common issues:

- Missing API key: Ensure `GRAPHRAG_API_KEY` is set in your `.env` file
- Invalid configuration: Run with `--dry-run` to validate your configuration
- Rate limits: Reduce `concurrent_requests` in `settings.yaml`
- Out of memory: Reduce `chunk_size` or process fewer documents
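Because the command exits non-zero on failure, a wrapper script can gate a full run on a successful dry run; this is a sketch, not part of GraphRAG itself.

```shell
# Validate the configuration first; abort if the dry run fails
if ! graphrag index --dry-run; then
    echo "configuration invalid; see logs" >&2
    exit 1
fi
graphrag index
```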