GraphRAG follows semantic versioning and provides migration paths for upgrading between versions. This guide helps you navigate breaking changes and upgrade your projects smoothly.

Versioning approach

GraphRAG follows semantic versioning with some specific considerations:

CLI

Conforms to standard semver

API

Conforms to standard semver

settings.yaml

Changes result in minor version bump

Data model

Conforms to standard semver

Internals

May change without semver compliance
Always run graphrag init --root [path] --force after minor and major version upgrades to ensure you have the latest config format. Back up your customizations first.

General upgrade process

1. Back up your project

cp -r ./my-project ./my-project-backup
Especially important:
  • settings.yaml
  • prompts/ directory
  • .env file
2. Upgrade GraphRAG

pip install --upgrade graphrag
3. Check version

pip show graphrag
4. Update configuration

For minor/major version bumps:
graphrag init --root ./my-project --force
Then restore your customizations from backup.
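When restoring customizations, it helps to diff the regenerated settings.yaml against your backup so you can see exactly which settings to carry over. A minimal sketch using Python's difflib (the file paths are illustrative assumptions):

```python
import difflib
from pathlib import Path

def diff_settings(old_path: str, new_path: str) -> str:
    """Return a unified diff between a backed-up and a regenerated settings.yaml."""
    old = Path(old_path).read_text().splitlines(keepends=True)
    new = Path(new_path).read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, fromfile=old_path, tofile=new_path))

# Usage (paths are illustrative):
#   print(diff_settings("./my-project-backup/settings.yaml", "./my-project/settings.yaml"))
# Lines prefixed with "-" are customizations missing from the regenerated file.
```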
5. Run migration (major versions only)

For major version upgrades, run the migration notebook (see version-specific sections below).

Migration to v3

GraphRAG v3 streamlined the core library by removing rarely-used features and simplifying configuration.

Overview

Migration notebook: docs/examples_notebooks/index_migration_to_v3.ipynb
Main goals:
  • Slim down maintenance overhead
  • Remove out-of-scope features
  • Simplify configuration model

Data model changes

The primary breaking change affects the text_units table.
Before v3:
# text_units had document_ids (plural) - a list
text_unit = {
    "id": "unit1",
    "text": "...",
    "document_ids": ["doc1", "doc2"]  # List
}
After v3:
# text_units has document_id (singular)
text_unit = {
    "id": "unit1",
    "text": "...",
    "document_id": "doc1"  # Single value
}
The migration notebook handles this transformation automatically, so you don't need to re-index.
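One plausible way to picture the transformation the notebook performs: each text unit carrying a list of document IDs becomes one row per document. This sketch uses plain dicts standing in for parquet rows (the actual notebook operates on the stored tables):

```python
def split_text_units(text_units):
    """Expand v2-style rows (document_ids list) into v3-style rows (single document_id)."""
    v3_rows = []
    for unit in text_units:
        for doc_id in unit["document_ids"]:
            # Copy every field except the plural list, then attach one document_id
            row = {k: v for k, v in unit.items() if k != "document_ids"}
            row["document_id"] = doc_id
            v3_rows.append(row)
    return v3_rows

v2 = [{"id": "unit1", "text": "...", "document_ids": ["doc1", "doc2"]}]
print(split_text_units(v2))  # one row for unit1/doc1 and one for unit1/doc2
```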

API changes

The multi-search API variants have been removed and are no longer available:
# These are gone in v3
from graphrag.api import multi_global_search  # ❌
from graphrag.api import multi_local_search   # ❌
Use instead:
# Single search methods remain
from graphrag.api import global_search  # ✓
from graphrag.api import local_search   # ✓

Configuration changes

Before v3 (fnllm-based):
llm:
  type: openai_chat  # ❌ No longer valid

embedding:
  type: azure_openai_embedding  # ❌ No longer valid
After v3 (LiteLLM-based):
llm:
  type: chat  # ✓ Generic type
  model_provider: openai  # Specify provider

embedding:
  type: embedding  # ✓ Generic type
  model_provider: azure  # Specify provider
Before v3:
llm:
  rate_limiting: auto  # ❌ No longer supported
After v3:
llm:
  requests_per_minute: 60  # ✓ Explicit limits
  tokens_per_minute: 80000
  # Or set a limit to null to disable it:
  # requests_per_minute: null
Before v3:
# Nested dict for multi-search support
vector_store:
  entity_description:
    type: lancedb
    db_uri: ./lancedb
  community_full_content:
    type: lancedb
    db_uri: ./lancedb

outputs:
  # Multi-search output configuration
  entity_description:
    type: parquet
After v3:
# Simplified single root-level object
vector_store:
  type: lancedb
  db_uri: ./lancedb
  
  # Optional custom schema
  index_schema:
    entity_description:
      index_name: entities
    community_full_content:
      index_name: communities

# No outputs block needed
The following configuration blocks have been removed:
# ❌ All removed in v3

umap:  # Removed - use Gephi for visualization
  enabled: false

embed_graph:  # Removed - no longer generates x/y positions
  enabled: false

workflows:
  entity_extraction:
    strategy:  # Removed - unused complexity
      type: nltk

input:
  file_filter:  # Removed - essentially unused
    include: ["*.txt"]

chunking:
  group_by_columns:  # Removed - unused grouping feature
    - document_type

Migration steps

1. Run migration notebook

Navigate to the migration notebook and execute all cells:
jupyter notebook docs/examples_notebooks/index_migration_to_v3.ipynb
This transforms your existing tables to the v3 format.
2. Update configuration

graphrag init --root ./my-project --force
3. Restore customizations

Manually copy over your custom settings:
  • API keys from .env
  • Model names
  • Custom prompts
  • Rate limits based on your quota
  • Provider-specific settings
4. Update API calls (if using the Python API)

Remove any multi_*_search calls and replace with single search methods.
5. Test the migration

Run a query to verify everything works:
graphrag query "test query" --root ./my-project --method global

Migration to v2

GraphRAG v2 renamed index tables for clarity.

Overview

Migration notebook: docs/examples_notebooks/index_migration_to_v2.ipynb

Table renames

All tables were renamed to simply describe their contents:
Old Name (v1)                      New Name (v2)
create_final_entities              entities
create_final_nodes                 nodes
create_final_communities           communities
create_final_community_reports     community_reports
create_final_text_units            text_units
create_final_relationships         relationships
create_final_documents             documents
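The v2 migration notebook performs these renames for you; as an illustration, the mapping can be applied to an output directory with a few lines of Python (the directory path is an assumption):

```python
from pathlib import Path

# v1 -> v2 table name mapping
V1_TO_V2 = {
    "create_final_entities": "entities",
    "create_final_nodes": "nodes",
    "create_final_communities": "communities",
    "create_final_community_reports": "community_reports",
    "create_final_text_units": "text_units",
    "create_final_relationships": "relationships",
    "create_final_documents": "documents",
}

def rename_tables(output_dir: str) -> list[str]:
    """Rename v1 parquet files in place; return the new table names that were applied."""
    renamed = []
    for old, new in V1_TO_V2.items():
        src = Path(output_dir) / f"{old}.parquet"
        if src.exists():
            src.rename(src.with_name(f"{new}.parquet"))
            renamed.append(new)
    return renamed

# rename_tables("./my-project/output")
```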

Migration steps

1. Run migration notebook

jupyter notebook docs/examples_notebooks/index_migration_to_v2.ipynb
2. Update configuration

graphrag init --root ./my-project --force
3. Verify table names

Check your output directory - tables should have new names:
ls ./my-project/output/*.parquet

Migration to v1

GraphRAG v1 introduced vector stores and streamlined the data model.

Overview

Migration notebook: docs/examples_notebooks/index_migration_to_v1.ipynb

Major changes

v1 requires a vector store for embeddings. New configuration:
vector_store:
  type: lancedb
  db_uri: ./lancedb
Default uses local LanceDB. For production, consider Azure AI Search.
ID fields:
  • Consistent use of id and human_readable_id
  • Integer IDs stored as ints (not strings)
Field renames:
  • document.raw_content → document.text
  • entity.name → entity.title
  • relationship.rank → relationship.combined_degree
Removed fields:
  • relationship.source_degree
  • relationship.target_degree
  • All embedding columns (now in vector store)
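The migration notebook applies these renames and removals to the stored tables; conceptually, per record, the change looks like this sketch (plain dicts standing in for parquet rows, and the rename map is a simplified illustration since each rename applies to a different table):

```python
# Simplified v1 rename map (each key belongs to a different table in practice)
RENAMES = {
    "raw_content": "text",        # documents
    "name": "title",              # entities
    "rank": "combined_degree",    # relationships
}
REMOVED = {"source_degree", "target_degree"}

def migrate_record(record: dict) -> dict:
    """Apply v1 field renames and drop removed fields from one row."""
    return {RENAMES.get(k, k): v for k, v in record.items() if k not in REMOVED}

rel = {"rank": 5, "source_degree": 2, "target_degree": 3, "weight": 1.0}
print(migrate_record(rel))  # {'combined_degree': 5, 'weight': 1.0}
```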
Community IDs:
  • id now uses proper UUID
  • community and human_readable_id retain short IDs
v1 added embeddings for DRIFT search and base RAG:
  • entity_description embeddings
  • community_full_content embeddings
  • text_unit_text embeddings
Before v1:
storage:
  base_dir: "output/${timestamp}/artifacts"  # ❌

reporting:
  base_dir: "output/${timestamp}/reports"  # ❌
After v1:
storage:
  base_dir: "output"  # ✓ Static path

reporting:
  base_dir: "output"  # ✓ Static path

Migration steps

1. Update configuration

graphrag init --root ./my-project --force
Note the new vector_store configuration block.
2. Remove timestamp paths

Edit settings.yaml or environment variables:
# In .env
GRAPHRAG_STORAGE_BASE_DIR=output
GRAPHRAG_REPORTING_BASE_DIR=output
3. Run migration notebook

jupyter notebook docs/examples_notebooks/index_migration_to_v1.ipynb
4. Re-index with vector store

Run indexing to populate the vector store:
graphrag index --root ./my-project
This leverages your existing cache for LLM calls.

Best practices

Always backup before upgrading

# Complete project backup
tar -czf my-project-backup-$(date +%Y%m%d).tar.gz ./my-project

# Or selective backup
mkdir -p backups
cp settings.yaml backups/settings.yaml.$(date +%Y%m%d)
cp -r prompts backups/prompts.$(date +%Y%m%d)
cp .env backups/.env.$(date +%Y%m%d)

Test on a copy first

cp -r ./my-project ./my-project-test
cd ./my-project-test
# Upgrade and test here first

Use cache to avoid re-indexing costs

GraphRAG’s cache prevents redundant LLM calls:
settings.yaml
cache:
  type: file
  base_dir: ./cache
After migration, re-indexing will use cached LLM responses, saving time and money.

Track your version

Add version info to your project:
settings.yaml
name: "my-project"
# Add version metadata
metadata:
  graphrag_version: "3.0.0"
  last_updated: "2024-03-15"
  migration_notes: "Migrated from v2 to v3"

Read release notes

Before upgrading, review the release notes and the breaking changes document in the GraphRAG repository.

Troubleshooting

Migration notebook fails

Possible causes:
  • Corrupted parquet files
  • Missing columns
  • Incompatible data types
Solutions:
  • Check notebook output for specific error
  • Verify parquet files can be read: pd.read_parquet("output/entities.parquet")
  • Re-index from scratch if data is corrupted
Configuration errors after upgrade

Solution:
  • Run dry-run to identify issues:
graphrag index --root ./my-project --dry-run --verbose
  • Compare your config to the latest template
  • Check for removed or renamed settings
Queries fail after migration

Common issues:
  • Old parquet file names
  • Missing vector store setup
  • Incompatible data schema
Solutions:
  • Run the appropriate migration notebook
  • Verify vector store configuration
  • Re-index if needed
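A quick post-migration check is to confirm that the core renamed tables are present before debugging queries further. A small sketch (the table list covers a few core tables, and the path is an assumption):

```python
from pathlib import Path

# A few of the core post-v2 table names
EXPECTED_TABLES = [
    "entities", "communities", "community_reports",
    "text_units", "relationships", "documents",
]

def missing_tables(output_dir: str) -> list[str]:
    """Return expected table files that are absent from the output directory."""
    out = Path(output_dir)
    return [t for t in EXPECTED_TABLES if not (out / f"{t}.parquet").exists()]

# missing = missing_tables("./my-project/output")
# An empty list means all expected tables are present.
```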
Import errors

v3 specific:
# This will fail in v3
from graphrag.api import multi_global_search  # ❌

# Use this instead
from graphrag.api import global_search  # ✓
Update all import statements to use single search methods.

Version compatibility matrix

GraphRAG Version   Python Version   Key Features                       Data Model Version
3.x                ≥3.10            LiteLLM, simplified config         v3
2.x                ≥3.10            Renamed tables                     v2
1.x                ≥3.10            Vector stores, streamlined model   v1
<1.0               ≥3.10            Pre-release                        v0

Getting help

If you encounter issues during migration:
  1. Check the breaking changes document
  2. Review GitHub Issues
  3. Ask in GitHub Discussions
  4. Consult version-specific migration notebooks

Next steps

Configuration

Learn about all config options

Best practices

Optimize your implementation

CLI usage

Master the command-line interface

Python API

Use GraphRAG programmatically
