build_index
Build a knowledge graph index from documents.
Parameters
config
The GraphRAG configuration object. Load it from a YAML file using GraphRagConfig.from_file("settings.yaml") or construct it programmatically.
method
The indexing method to use. Options include:
- IndexingMethod.Standard - Full LLM-based extraction
- IndexingMethod.NLP - NLP + LLM hybrid approach
is_update_run
Whether this is an incremental update run. Set to True to update an existing index with new documents rather than rebuilding from scratch.
callbacks
A list of callback objects to receive pipeline lifecycle events. Use this to monitor indexing progress, handle errors, or implement custom logging.
additional_context
Additional context to pass to the pipeline. This dictionary is accessible in the pipeline state under the additional_context key and can be used to pass custom data to pipeline workflows.
verbose
Enable verbose logging output. When True, detailed logging information is printed to the console and written to log files.
input_documents
Override the default document loading and parsing. Supply your own pandas DataFrame of documents to index instead of loading from the configured input source. The DataFrame should have columns matching the expected document schema.
Returns
A list of pipeline run results, one for each workflow executed. Each result contains:
- workflow - The name of the workflow that was executed
- result - The workflow output data
- error - Any error that occurred (None if successful)
- errors - List of all errors encountered
Example: Basic indexing
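A minimal sketch of a basic run with default options. It assumes build_index is a coroutine importable from graphrag.api and that GraphRagConfig lives in graphrag.config.models.graph_rag_config; both import paths, and the helper names run_basic_index and main, are illustrative assumptions to check against your installed version.

```python
import asyncio


async def run_basic_index(settings_path="settings.yaml"):
    """Load the configuration and build the index with default options."""
    # Deferred imports: both module paths are assumptions, adjust as needed.
    from graphrag.api import build_index
    from graphrag.config.models.graph_rag_config import GraphRagConfig

    config = GraphRagConfig.from_file(settings_path)
    # Assumed to be awaitable; drop `await` if your version is synchronous.
    return await build_index(config=config)


def main(settings_path="settings.yaml"):
    """Synchronous entry point that prints one status line per workflow."""
    results = asyncio.run(run_basic_index(settings_path))
    for item in results:
        status = "ok" if item.error is None else f"failed: {item.error}"
        print(f"{item.workflow}: {status}")
    return results
```

Calling main() from a script runs the whole pipeline once and reports which workflows succeeded.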
Example: Incremental update
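A sketch of an incremental run, which updates an existing index instead of rebuilding it. The keyword name is_update_run and both import paths are assumptions, as is the helper name run_incremental_update; verify them against your installed version before relying on this.

```python
async def run_incremental_update(settings_path="settings.yaml"):
    """Update an existing index with new documents instead of rebuilding."""
    # Deferred imports: both module paths are assumptions, adjust as needed.
    from graphrag.api import build_index
    from graphrag.config.models.graph_rag_config import GraphRagConfig

    config = GraphRagConfig.from_file(settings_path)
    # `is_update_run` is the assumed keyword for the incremental-update flag.
    return await build_index(config=config, is_update_run=True)
```

Run it with asyncio.run(run_incremental_update()) once the new documents are in the configured input source.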
Example: Custom document input
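A sketch of preparing documents in memory before handing them to build_index. The column names id, title, and text are assumptions about the expected document schema, and build_document_rows and to_dataframe are hypothetical helpers; confirm the schema for your version before relying on it.

```python
import hashlib


def build_document_rows(texts):
    """Turn a {title: text} mapping into one row dict per document."""
    rows = []
    for title, text in texts.items():
        # Derive a stable id from the content so re-runs produce the same id.
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        rows.append({"id": doc_id, "title": title, "text": text})
    return rows


def to_dataframe(rows):
    """Convert the rows to a pandas DataFrame (pandas imported lazily)."""
    import pandas as pd

    return pd.DataFrame(rows)
```

The resulting DataFrame would then be supplied as the custom document input, for example build_index(config=config, input_documents=to_dataframe(rows)), where the keyword name is also an assumption.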
Example: Monitoring with callbacks
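A sketch of a callback object that records how long each workflow takes. The hook names workflow_start and workflow_end and their arguments are assumptions, and TimingCallbacks is a hypothetical class; a real callback object would need to implement every hook the library calls, typically by subclassing the library's own callback base class.

```python
import time


class TimingCallbacks:
    """Records wall-clock duration per workflow (hook names are assumed)."""

    def __init__(self):
        self.started = {}    # workflow name -> start timestamp
        self.durations = {}  # workflow name -> elapsed seconds

    def workflow_start(self, name, instance=None):
        # Remember when this workflow began.
        self.started[name] = time.perf_counter()

    def workflow_end(self, name, instance=None):
        # Compute elapsed time only if a matching start was recorded.
        begun = self.started.pop(name, None)
        if begun is not None:
            self.durations[name] = time.perf_counter() - begun
```

An instance would be passed in the list of callback objects, and its durations dictionary inspected after the run completes.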
Output files
The build_index function produces several output files in the configured output directory:
- entities.parquet - Extracted entities with descriptions and metadata
- relationships.parquet - Relationships between entities
- communities.parquet - Hierarchical community structure
- community_reports.parquet - Summary reports for each community
- text_units.parquet - Chunked text units from source documents
- covariates.parquet - Extracted claims and covariates (if enabled)
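To confirm that a run produced the files listed above, a small check like the following can help. It is a sketch using only the standard library; check_outputs is a hypothetical helper, and covariates.parquet is left out of the required set because it is only written when enabled.

```python
from pathlib import Path

# File names taken from the list above; covariates.parquet is optional.
EXPECTED_OUTPUTS = [
    "entities.parquet",
    "relationships.parquet",
    "communities.parquet",
    "community_reports.parquet",
    "text_units.parquet",
]


def check_outputs(output_dir):
    """Report which parquet files exist and which expected ones are missing."""
    root = Path(output_dir)
    present = sorted(p.name for p in root.glob("*.parquet"))
    missing = [name for name in EXPECTED_OUTPUTS if name not in present]
    return {"present": present, "missing": missing}
```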
Configuration
The GraphRagConfig object controls all aspects of indexing.
Error handling
The build_index function returns results even if some workflows fail. Check the error field in each result.
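One way to act on the error field is to split the returned list into succeeded and failed workflows. The sketch below uses a stand-in RunResult dataclass that mirrors the documented fields (workflow, result, error, errors) so it runs without the library; RunResult and split_results are hypothetical names, not part of the API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RunResult:
    """Stand-in with the documented result fields; not the library's class."""
    workflow: str
    result: Any = None
    error: Optional[BaseException] = None
    errors: list = field(default_factory=list)


def split_results(results):
    """Return (names of successful workflows, {failed workflow: error})."""
    succeeded = [r.workflow for r in results if r.error is None]
    failed = {r.workflow: r.error for r in results if r.error is not None}
    return succeeded, failed
```

Any object exposing workflow and error attributes works with split_results, so the list returned by build_index can be passed in directly.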
Related
- Query API - Search your indexed data
- Configuration - Configure indexing settings
- Prompt tune API - Generate custom prompts