graphrag update

The update command incrementally updates an existing knowledge graph index with new documents, preserving existing entities and relationships while adding new ones.

Usage

graphrag update [OPTIONS]

Options

--root

string

default:"current directory"

The project root directory containing the settings.yaml configuration file and existing index. Aliases: -r

--method

string

default:"standard"

The indexing method to use for the update. Aliases: -m

Available methods:

standard - Traditional GraphRAG indexing with LLM-based extraction
fast - Fast indexing using NLP-based extraction

--verbose

boolean

default:"false"

Run the update pipeline with verbose logging. Aliases: -v

--cache

boolean

default:"true"

Use LLM response caching to avoid redundant API calls. Use --no-cache to disable caching.

--skip-validation

boolean

default:"false"

Skip any preflight validation checks. Useful when running without LLM steps.

Examples

Basic update

Update the index with new documents:

graphrag update

Specify project directory

graphrag update --root ./my-project

Use fast update method

graphrag update --method fast

Verbose logging

graphrag update --verbose

Disable caching

graphrag update --no-cache
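When the update is launched from a script rather than typed by hand, the flags shown above can be assembled programmatically. The following is a minimal sketch, not part of GraphRAG: the helper name build_update_command is hypothetical, and it emits only the flags documented on this page.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def build_update_command(
    root: Optional[Path] = None,
    method: str = "standard",
    verbose: bool = False,
    cache: bool = True,
    skip_validation: bool = False,
) -> list[str]:
    """Return an argv list for `graphrag update` (hypothetical helper)."""
    if method not in ("standard", "fast"):
        raise ValueError(f"unknown method: {method!r}")

    cmd = ["graphrag", "update"]
    if root is not None:
        cmd += ["--root", str(root)]
    if method != "standard":
        # "standard" is the documented default, so it is left implicit.
        cmd += ["--method", method]
    if verbose:
        cmd.append("--verbose")
    if not cache:
        # Caching is on by default; --no-cache turns it off.
        cmd.append("--no-cache")
    if skip_validation:
        cmd.append("--skip-validation")
    return cmd
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting problems when the project path contains spaces.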

Output

By default, update outputs are saved to the update_output/ directory to avoid overwriting your existing index. This allows you to:

Review the updated index before replacing the original
Keep multiple versions of the index
Safely roll back if needed

The update creates the same output files as the index command:

entities.parquet
relationships.parquet
communities.parquet
community_reports.parquet
text_units.parquet
covariates.parquet (if enabled)
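Before reviewing an update, it can help to confirm that every expected table was actually written. The sketch below assumes the files sit directly inside the update output directory, as described above; if your GraphRAG version nests them in subdirectories, point it at the nested folder instead. The function name missing_outputs is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# File names taken from the list above.
EXPECTED_TABLES = [
    "entities.parquet",
    "relationships.parquet",
    "communities.parquet",
    "community_reports.parquet",
    "text_units.parquet",
]


def missing_outputs(update_dir: str | Path, covariates_enabled: bool = False) -> list[str]:
    """Return the expected output files that are absent from update_dir."""
    expected = list(EXPECTED_TABLES)
    if covariates_enabled:
        # covariates.parquet is only produced when covariates are enabled.
        expected.append("covariates.parquet")
    directory = Path(update_dir)
    return [name for name in expected if not (directory / name).is_file()]
```

An empty list means all expected files are present; anything else names the tables to investigate before replacing the original index.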

How update works

The update process:

Load existing index - Reads the current index from the output/ directory
Process new documents - Identifies and processes new documents in the input/ directory
Merge entities - Integrates new entities with existing ones, resolving duplicates
Update relationships - Adds new relationships and updates existing ones
Recompute communities - Recalculates community structure with the updated graph
Generate reports - Creates or updates community reports
Save to update_output - Writes the updated index to the update_output/ directory
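To make the "Merge entities" step above more concrete, here is a toy illustration of merging by a shared key. It is not GraphRAG's actual merge logic: the record shape (a title plus a list of descriptions) and the rule of treating identical titles as duplicates are assumptions chosen only to show the idea.

```python
from __future__ import annotations


def merge_entities(existing: list[dict], new: list[dict]) -> list[dict]:
    """Merge entity records by title, keeping each description once.

    Toy example: each record is assumed to look like
    {"title": str, "descriptions": list[str]}.
    """
    merged: dict[str, dict] = {}
    # Existing entities come first so their order and descriptions are preserved.
    for entity in [*existing, *new]:
        title = entity["title"]
        if title not in merged:
            merged[title] = {"title": title, "descriptions": []}
        for description in entity["descriptions"]:
            if description not in merged[title]["descriptions"]:
                merged[title]["descriptions"].append(description)
    return list(merged.values())
```

In this sketch an entity that appears in both inputs ends up as a single record whose descriptions are the union of both, while entities that appear only in the new documents are appended. The inputs are not modified.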

Incremental vs. full reindex

Use update when:

You have new documents to add to an existing index
You want to preserve existing extracted entities and relationships
You need faster processing for incremental changes

Use index when:

You are building a new index from scratch
You’ve made significant changes to prompts or configuration
You want to completely rebuild the knowledge graph

Configuration

You can customize the update output directory in settings.yaml:

update_output_storage:
  type: file
  base_dir: custom_update_output

Best practices

Back up your index: Before running update, back up your output/ directory
Review updates: Check the update_output/ directory before replacing your production index
Consistent configuration: Use the same settings for update as you used for the initial index
Monitor duplicates: Review merged entities to ensure proper deduplication

Merging updates into production

After reviewing the updated index:

# Back up the current index
cp -r output output_backup

# Replace with updated index
rm -rf output
mv update_output output
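The same swap can be scripted with a couple of safety checks. This is a minimal sketch that mirrors the shell commands above using only the Python standard library; the function name promote_update is hypothetical, and the directory names are the defaults used on this page.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def promote_update(
    root: str | Path = ".",
    output: str = "output",
    update_output: str = "update_output",
    backup: str = "output_backup",
) -> Path:
    """Back up the current index, then replace it with the updated one."""
    root = Path(root)
    current, updated, saved = root / output, root / update_output, root / backup

    if not updated.is_dir():
        raise FileNotFoundError(f"no updated index at {updated}")
    if saved.exists():
        # Refuse to clobber an earlier backup.
        raise FileExistsError(f"backup already exists at {saved}")

    if current.exists():
        shutil.copytree(current, saved)  # cp -r output output_backup
        shutil.rmtree(current)           # rm -rf output
    shutil.move(str(updated), str(current))  # mv update_output output
    return saved
```

Stopping when a backup already exists is a deliberate choice: with the shell version, running cp -r a second time copies output into the existing output_backup/ directory as a nested folder instead of producing a fresh backup.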

Performance considerations

Update is generally faster than a full reindex since it only processes new documents
Community detection is recalculated on the full graph, which can take time for large indexes
Caching helps avoid re-processing similar content

Error handling

The update command will exit with status code 1 if errors occur. Common issues:

Missing existing index: Ensure you’ve run graphrag index first
Incompatible schema: Use the same GraphRAG version for index and update
Configuration mismatch: Maintain consistent settings between index and update operations
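In automation, the exit status is the simplest signal to act on. The sketch below runs a command and reports failure on any non-zero status; the wrapper name run_update is hypothetical, and it relies only on the exit-code behavior described above, not on any particular error message format.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Sequence


def run_update(cmd: Sequence[str] = ("graphrag", "update")) -> bool:
    """Run the update command and return True only if it exits with status 0."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface whatever the command wrote to stderr so the cause is visible.
        print(result.stderr, file=sys.stderr)
        return False
    return True
```

A caller can use the boolean to decide whether to continue, for example skipping the step that replaces output/ when the update failed.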
