Why caching matters
LLM API calls are typically:
- Expensive (per-token pricing)
- Slow (network latency + generation time)
- Rate-limited (requests per minute)
Caching LLM responses helps by:
- Reducing API costs by 90%+ on re-runs
- Speeding up development iterations
- Avoiding rate limit issues
- Enabling incremental workflow development
How caching works
GraphRAG caches LLM responses based on:
- Model instance name - Different workflows use different cache partitions
- Input prompt - Exact prompt text including all parameters
- Model configuration - Model name and generation parameters
On each LLM call:
- GraphRAG checks the cache for a matching entry
- If found, returns the cached response immediately
- If not found, calls the LLM and stores the response
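The lookup flow above can be sketched in Python. This is illustrative only — GraphRAG's real cache is internal, and the class and method names here are assumptions:

```python
import hashlib
import json


class LLMCache:
    """Minimal sketch of response caching keyed on the model instance,
    prompt text, and generation parameters (illustrative, not GraphRAG's API)."""

    def __init__(self):
        self._store = {}

    def _key(self, model_instance_name, prompt, params):
        # Any change to the prompt or parameters yields a new key, i.e. a cache miss.
        payload = json.dumps(
            {"model": model_instance_name, "prompt": prompt, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_call(self, model_instance_name, prompt, params, call_llm):
        key = self._key(model_instance_name, prompt, params)
        if key in self._store:          # hit: return the stored response immediately
            return self._store[key]
        response = call_llm(prompt)     # miss: call the LLM...
        self._store[key] = response     # ...and store the response for next time
        return response
```

Because the key covers the prompt and parameters, an identical re-run is served entirely from the cache, while any edit to a prompt or setting transparently falls through to the LLM.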
Cache types
JSON cache (recommended)
Stores responses in JSON files, persisted across runs.

Memory cache
Stores responses in memory, lost when the process ends.

No cache
Disable caching entirely. Only disable caching if you need guaranteed fresh responses or are running in a stateless environment.
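These options map to the `cache` section of `settings.yaml`. A hedged sketch — the exact `type` values and field names vary across GraphRAG versions, so verify against the settings reference:

```yaml
cache:
  # "json" (file-backed, recommended), "memory", or "none"
  type: json
  # where cache files are written, relative to the project root
  base_dir: "cache"
```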
Cache storage backends
File storage (default)
Best for local development:
- Simple and fast
- Easy to inspect and debug
- No external dependencies
- Works offline
Tradeoffs:
- Not shareable across machines
- Requires disk space
Azure Blob Storage
Best for team collaboration:
- Share cache across team members
- Persist cache in cloud
- Automatic backup and versioning
- Scale to large datasets
Tradeoffs:
- Requires Azure subscription
- Network latency for cache lookups
- Storage costs
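With the blob backend, the `cache` section points at an Azure Storage container. The field and container names below are assumptions based on GraphRAG's configuration schema — check the settings reference for your version:

```yaml
cache:
  type: blob
  # or use a storage account URL with managed identity instead of a connection string
  connection_string: "${AZURE_STORAGE_CONNECTION_STRING}"
  container_name: "graphrag-cache"  # assumed container name
  base_dir: "cache"
```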
Azure Cosmos DB
Best for high-scale production:
- Global distribution
- High availability
- Advanced query capabilities
- Automatic indexing
Tradeoffs:
- Higher costs than Blob Storage
- More complex setup
- Overkill for most use cases
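A hedged Cosmos DB configuration sketch — account URL and container name are placeholders, and field names should be verified against the settings reference:

```yaml
cache:
  type: cosmosdb
  cosmosdb_account_url: "https://<account>.documents.azure.com:443/"  # placeholder
  container_name: "graphrag-cache"  # assumed container name
  base_dir: "cache"
```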
Cache partitioning
GraphRAG partitions the cache by model_instance_name to keep different workflow steps separate. Each model_instance_name creates a separate cache partition, allowing you to clear or preserve specific workflow caches independently.

Managing the cache
When cache is used
Cache hits occur when:
- Re-running indexing with same data and prompts
- Testing different downstream workflows
- Iterating on post-extraction processing
- Resuming interrupted indexing runs
When cache is bypassed
Cache misses occur when:
- Input data changes
- Prompts are modified
- Model configuration changes
- New documents are added
- model_instance_name changes
Clearing the cache
Clear cache when you need fresh results:
- File cache
- Blob cache
- Cosmos DB cache
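Hedged examples for each backend, assuming the default `base_dir` of `cache` and the Azure CLI; account, resource group, and container names are placeholders to adjust for your setup:

```shell
# File cache: remove the local cache directory (recreated on the next run)
rm -rf ./cache

# Blob cache (assumed account/container names):
# az storage blob delete-batch --account-name <account> --source graphrag-cache

# Cosmos DB cache (assumed account/database/container names):
# az cosmosdb sql container delete --account-name <account> \
#   --resource-group <rg> --database-name graphrag --name graphrag-cache
```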
Cache optimization strategies
Strategy 1: Persistent cache for development
Use file-based JSON cache during development:
- Fast local access
- Survive process restarts
- Easy to inspect and debug
Strategy 2: Shared cache for teams
Use Azure Blob Storage for team collaboration:
- Share cache across team members
- Reduce duplicate API calls
- Save collective API costs
Strategy 3: Separate caches per experiment
Use different cache directories for different experiments:
- Compare different approaches
- Preserve baseline results
- Easy rollback to previous configs
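One way to do this with the file-backed cache is a per-experiment `base_dir`; the file name and paths below are illustrative:

```yaml
# settings.experiment-a.yaml
cache:
  type: json
  base_dir: "cache/experiment-a"
```

Running each experiment against its own configuration keeps baseline responses cached while new variants populate their own partition.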
Strategy 4: Disable cache for production
Disable caching in production if you need guaranteed fresh results:
- No stale data
- Predictable behavior
- No cache management overhead
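Disabling is a one-line change in `settings.yaml` (same hedged schema caveat as above):

```yaml
cache:
  type: none
```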
Cost considerations
Cache effectiveness varies by workflow:

Entity extraction (extract_graph)
High cache value:
- Most expensive operation
- Processes every text chunk through LLM
- Multiple gleaning passes increase costs
- Cache saves 80-95% of extraction costs on re-runs
Description summarization (summarize_descriptions)
Medium cache value:
- Moderate costs
- Summarizes entity descriptions
- Fewer calls than extraction
- Cache saves 70-90% on re-runs
Community reports (community_reports)
Medium cache value:
- Report generation costs
- Generates one report per community
- Can be expensive for large graphs
- Cache saves 80-95% on re-runs
Claim extraction (extract_claims)
High cache value (if enabled):
- Similar to entity extraction
- Similar to entity extraction
- Disabled by default
- Cache essential when tuning claim prompts
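As a rough illustration of the savings ranges above — the dollar amount and hit rate here are assumptions, not measurements:

```python
def rerun_cost(full_cost, cache_hit_rate):
    """API cost of a re-run when a fraction of LLM calls hit the cache."""
    return full_cost * (1.0 - cache_hit_rate)

# Assumed numbers: a $100 extraction pass, re-run with a 90% cache hit rate
print(round(rerun_cost(100.0, 0.90), 2))  # -> 10.0
```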
Example: Development workflow
Optimal caching strategy for development:

Iterate on prompts
- Modify prompts in the prompts/ directory
- Clear the specific cache partition
- Re-run indexing - only re-extracts, reuses the downstream cache

Tune downstream workflows
- Modify chunking, clustering, or other settings
- Keep the extraction cache intact
- Re-run indexing - reuses the expensive extraction cache
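A hedged sketch of the prompt-iteration loop, assuming the default file cache and an extraction partition directory named `extract_graph` (your partition names follow your configured model_instance_name values):

```shell
# 1. Edit prompts in the prompts/ directory, then clear only the extraction partition
rm -rf ./cache/extract_graph

# 2. Re-run indexing: extraction re-runs, downstream cached responses are reused
# graphrag index --root .
```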
Troubleshooting
Cache not being used
Prompt changes
Even small prompt modifications invalidate cache. Ensure prompts are identical for cache hits.
Model configuration changes
Changing model, temperature, or other parameters bypasses the cache.

Input data changes
Modified source documents generate different prompts, missing cache.
Cache path issues
Verify cache directory exists and has write permissions.
Cache growing too large
Clear old experiments
Remove cache directories for completed experiments:
Use selective caching
Only cache expensive operations:
Compress old cache
Archive and compress inactive cache:
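For example, with the default file cache (paths and the archive name are placeholders):

```shell
# Archive an inactive cache directory, then delete the originals to free disk space
if [ -d ./cache ]; then
  tar czf cache-archive.tar.gz ./cache && rm -rf ./cache
fi
```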
Best practices
Next steps
- Storage - Learn about other storage configurations
- Settings reference - Complete configuration options
- LLM models - Configure language models
- Prompt tuning - Optimize prompts with caching