GraphRAG’s caching system stores language model responses to avoid redundant API calls. This significantly improves performance and reduces costs when re-indexing or iterating on your pipeline.

Why caching matters

LLM API calls are typically:
  • Expensive (per-token pricing)
  • Slow (network latency + generation time)
  • Rate-limited (requests per minute)
Caching helps by:
  • Reducing API costs by 90%+ on re-runs
  • Speeding up development iterations
  • Avoiding rate limit issues
  • Enabling incremental workflow development
During prompt tuning and development, caching can save hours of processing time and hundreds of dollars in API costs.

How caching works

GraphRAG caches LLM responses based on:
  1. Model instance name - Different workflows use different cache partitions
  2. Input prompt - Exact prompt text including all parameters
  3. Model configuration - Model name and generation parameters
When a request is made:
  1. GraphRAG checks the cache for a matching entry
  2. If found, returns the cached response immediately
  3. If not found, calls the LLM and stores the response
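The check-call-store flow above can be sketched as a small keyed store, hashing the prompt and generation parameters per model instance (a minimal sketch, not GraphRAG's actual implementation; the real key derivation and on-disk format may differ):

```python
import hashlib
import json

class ResponseCache:
    """Minimal sketch of an LLM response cache keyed on instance, prompt, and config."""

    def __init__(self):
        self._store = {}  # model instance name -> {cache key: response}

    def _key(self, prompt: str, config: dict) -> str:
        # Hash the exact prompt text plus the generation parameters
        payload = json.dumps({"prompt": prompt, "config": config}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_call(self, instance: str, prompt: str, config: dict, call_llm):
        partition = self._store.setdefault(instance, {})
        key = self._key(prompt, config)
        if key in partition:          # 1. check the cache
            return partition[key]     # 2. hit: return immediately
        response = call_llm(prompt)   # 3. miss: call the LLM...
        partition[key] = response     #    ...and store the response
        return response

cache = ResponseCache()
calls = []
fake_llm = lambda p: calls.append(p) or f"response to {p!r}"
cfg = {"model": "gpt-4", "temperature": 0}
first = cache.get_or_call("extract_graph", "Extract entities...", cfg, fake_llm)
second = cache.get_or_call("extract_graph", "Extract entities...", cfg, fake_llm)
# second is served from the cache: fake_llm ran only once
```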

Cache types

JSON file cache

Stores responses in JSON files, persisted across runs:
cache:
  type: json
  storage:
    type: file
    base_dir: "cache"
Cache directory structure:
project/
└── cache/
    ├── extract_graph/        # Entity extraction cache
    │   └── responses.json
    ├── summarize_descriptions/  # Summarization cache
    │   └── responses.json
    ├── community_reporting/  # Community report cache
    │   └── responses.json
    └── extract_claims/      # Claim extraction cache
        └── responses.json
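To see what has been cached for a partition, you can pretty-print its responses file (this sketch fabricates a tiny responses.json so it runs anywhere; point the path at your real cache file, and note its exact JSON schema is an assumption to verify against your own data):

```shell
# Stand-in file so the sketch is self-contained; skip these lines for a real project
mkdir -p cache/extract_graph
echo '{"example_key": "example_response"}' > cache/extract_graph/responses.json
# Pretty-print the first entries of the extraction partition's cache file
python3 -m json.tool cache/extract_graph/responses.json | head -n 20
```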

Memory cache

Stores responses in memory, lost when process ends:
cache:
  type: memory
Memory cache is only useful for:
  • Testing cache behavior
  • Single-run pipelines where persistence isn’t needed
All cached data is lost when the process terminates.

No cache

Disable caching entirely:
cache:
  type: none
Only disable caching if you need guaranteed fresh responses or are running in a stateless environment.

Cache storage backends

File storage (default)

Best for local development:
cache:
  type: json
  storage:
    type: file
    base_dir: "cache"
Advantages:
  • Simple and fast
  • Easy to inspect and debug
  • No external dependencies
  • Works offline
Disadvantages:
  • Not shareable across machines
  • Requires disk space

Azure Blob Storage

Best for team collaboration:
cache:
  type: json
  storage:
    type: blob
    connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    container_name: graphrag-cache
    account_url: https://myaccount.blob.core.windows.net/
Advantages:
  • Share cache across team members
  • Persist cache in cloud
  • Automatic backup and versioning
  • Scale to large datasets
Disadvantages:
  • Requires Azure subscription
  • Network latency for cache lookups
  • Storage costs

Azure Cosmos DB

Best for high-scale production:
cache:
  type: json
  storage:
    type: cosmosdb
    connection_string: ${COSMOS_CONNECTION_STRING}
    container_name: graphrag-cache
    database_name: graphrag
    account_url: https://myaccount.documents.azure.com:443/
Advantages:
  • Global distribution
  • High availability
  • Advanced query capabilities
  • Automatic indexing
Disadvantages:
  • Higher costs than Blob Storage
  • More complex setup
  • Overkill for most use cases

Cache partitioning

GraphRAG partitions the cache by model_instance_name to keep different workflow steps separate:
extract_graph:
  completion_model_id: default_completion_model
  model_instance_name: extract_graph  # Creates extract_graph/ partition
  prompt: "prompts/extract_graph.txt"

summarize_descriptions:
  completion_model_id: default_completion_model
  model_instance_name: summarize_descriptions  # Creates summarize_descriptions/ partition
  prompt: "prompts/summarize_descriptions.txt"

community_reports:
  completion_model_id: default_completion_model
  model_instance_name: community_reporting  # Creates community_reporting/ partition
  graph_prompt: "prompts/community_report_graph.txt"
Each model_instance_name creates a separate cache partition, allowing you to clear or preserve specific workflow caches independently.
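As a rough mental model, each partition is simply a subdirectory of the cache base directory named after the model instance (a sketch of the layout shown earlier, not GraphRAG's actual code):

```python
from pathlib import Path

def partition_path(base_dir: str, model_instance_name: str) -> Path:
    # extract_graph -> cache/extract_graph, community_reporting -> cache/community_reporting, ...
    return Path(base_dir) / model_instance_name

p = partition_path("cache", "extract_graph")
```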

Managing the cache

When cache is used

Cache hits occur when:
  • Re-running indexing with same data and prompts
  • Testing different downstream workflows
  • Iterating on post-extraction processing
  • Resuming interrupted indexing runs

When cache is bypassed

Cache misses occur when:
  • Input data changes
  • Prompts are modified
  • Model configuration changes
  • New documents are added
  • model_instance_name changes

Clearing the cache

Clear cache when you need fresh results:
rm -rf cache/
Or selectively clear specific workflows:
rm -rf cache/extract_graph/
Clearing cache means re-running all LLM calls, which can be expensive and time-consuming. Only clear when necessary.
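If a partition was expensive to build, consider archiving it before clearing so a regretted clear is recoverable (a simple sketch using standard tools):

```shell
# Stand-in partition so the sketch runs anywhere; skip these two lines in a real project
mkdir -p cache/extract_graph
touch cache/extract_graph/responses.json
# Archive the partition, then clear it; restore later with: tar -xzf extract_graph-cache-backup.tar.gz
tar -czf extract_graph-cache-backup.tar.gz cache/extract_graph/
rm -rf cache/extract_graph/
```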

Cache optimization strategies

Strategy 1: Persistent cache for development

Use file-based JSON cache during development:
cache:
  type: json
  storage:
    type: file
    base_dir: "cache"
Benefits:
  • Fast local access
  • Survive process restarts
  • Easy to inspect and debug

Strategy 2: Shared cache for teams

Use Azure Blob Storage for team collaboration:
cache:
  type: json
  storage:
    type: blob
    connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    container_name: team-graphrag-cache
    account_url: https://teamstorage.blob.core.windows.net/
Benefits:
  • Share cache across team members
  • Reduce duplicate API calls
  • Save collective API costs

Strategy 3: Separate caches per experiment

Use different cache directories for different experiments:
cache:
  type: json
  storage:
    type: file
    base_dir: "cache/experiment-1"
cache:
  type: json
  storage:
    type: file
    base_dir: "cache/experiment-2"
Benefits:
  • Compare different approaches
  • Preserve baseline results
  • Easy rollback to previous configs
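Since settings support ${VAR} substitution (as the blob storage examples above show with connection strings), you can also switch experiments without editing the config file, assuming substitution applies to base_dir as well. EXPERIMENT_NAME here is a hypothetical variable name:

```yaml
cache:
  type: json
  storage:
    type: file
    base_dir: "cache/${EXPERIMENT_NAME}"
```

Then select the experiment at run time, e.g. `EXPERIMENT_NAME=experiment-1 graphrag index ...` (the exact CLI invocation varies by GraphRAG version).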

Strategy 4: Disable cache for production

Disable caching in production if you need guaranteed fresh results:
cache:
  type: none
Benefits:
  • No stale data
  • Predictable behavior
  • No cache management overhead

Cost considerations

Cache effectiveness varies by workflow:
Entity extraction (extract_graph) - high cache value, the most expensive operation
  • Processes every text chunk through the LLM
  • Multiple gleaning passes increase costs
  • Cache saves 80-95% of extraction costs on re-runs
Description summarization (summarize_descriptions) - medium cache value, moderate costs
  • Summarizes entity descriptions
  • Fewer calls than extraction
  • Cache saves 70-90% on re-runs
Community reports (community_reporting) - medium cache value, report generation costs
  • Generates one report per community
  • Can be expensive for large graphs
  • Cache saves 80-95% on re-runs
Claim extraction (extract_claims) - high cache value, if enabled
  • Similar cost profile to entity extraction
  • Disabled by default
  • Cache essential when tuning claim prompts
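To get a feel for the numbers, here is a back-of-envelope estimate for extraction; the chunk count, token counts, and pricing are all hypothetical, so substitute your own figures:

```python
chunks = 10_000             # text chunks in the dataset (hypothetical)
gleaning_passes = 1         # extra extraction passes per chunk (hypothetical)
tokens_per_call = 4_000     # prompt + completion tokens per call (hypothetical)
price_per_1k_tokens = 0.01  # USD per 1K tokens (hypothetical)

calls = chunks * (1 + gleaning_passes)
first_run_cost = calls * tokens_per_call / 1_000 * price_per_1k_tokens

cache_hit_rate = 0.90       # re-run with unchanged data and prompts
rerun_cost = first_run_cost * (1 - cache_hit_rate)

print(f"first run: ${first_run_cost:,.0f}, cached re-run: ${rerun_cost:,.0f}")
```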

Example: Development workflow

Optimal caching strategy for development:
Step 1: Initial run with cache

cache:
  type: json
  storage:
    type: file
    base_dir: "cache"
Run full indexing - cache fills with LLM responses
Step 2: Iterate on prompts

Modify prompts in the prompts/ directory, then clear the specific cache partition:
rm -rf cache/extract_graph/
Re-run indexing - only re-extracts, reuses downstream cache
Step 3: Tune downstream workflows

Modify chunking, clustering, or other settings, keeping the extraction cache intact:
# No cache clearing needed
Re-run indexing - reuses expensive extraction cache
Step 4: Final production run

Optionally disable cache for clean run:
cache:
  type: none
Or keep the cache for faster production updates.

Troubleshooting

Cache not being used

Cache misses usually trace back to one of these causes:
  • Prompt changes - even small prompt modifications invalidate the cache; prompts must be exactly identical for a hit
  • Model configuration changes - changing the model, temperature, or other generation parameters bypasses the cache
  • Input changes - modified source documents generate different prompts, missing the cache
  • Permissions - verify the cache directory exists and has write permissions
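The first cause is easy to underestimate: because the cache keys on exact prompt text, even a one-character edit produces a different key for every chunk. A quick illustration (SHA-256 keying here is a stand-in for whatever GraphRAG actually uses):

```python
import hashlib

def key(prompt: str) -> str:
    # Cache key derived from the exact prompt text
    return hashlib.sha256(prompt.encode()).hexdigest()

original = "Extract all entities from the text below."
tweaked = "Extract all entities from the text below. "  # one trailing space added

k1, k2 = key(original), key(tweaked)
# k1 != k2 -> every chunk misses the cache after the edit
```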

Cache growing too large

Remove cache directories for completed experiments:
rm -rf cache/old-experiment/
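Before deleting anything, it helps to see which partitions are actually eating the space (a sketch with standard tools; the stand-in partitions below just let it run anywhere):

```shell
# Stand-in partitions so the sketch is self-contained; skip for a real project
mkdir -p cache/extract_graph cache/community_reporting
touch cache/extract_graph/responses.json
# Largest cache partitions first
du -sh cache/*/ | sort -rh
```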
Only cache expensive operations:
# Cache extraction but not summarization
extract_graph:
  model_instance_name: extract_graph  # cached

summarize_descriptions:
  model_instance_name: null  # not cached
Archive and compress inactive cache:
tar -czf cache-backup-2024-01.tar.gz cache/
rm -rf cache/

Best practices

1. Always enable caching during development - use a JSON file cache to speed up iterations
2. Back up the cache before major changes - copy the cache directory before clearing it or modifying prompts
3. Use a blob cache for team projects - share the cache across the team to reduce collective API costs
4. Clear the cache selectively - only clear partitions that need a refresh; preserve the others
5. Monitor cache effectiveness - check logs to verify cache hit rates

Next steps

  • Storage - learn about other storage configurations
  • Settings reference - complete configuration options
  • LLM models - configure language models
  • Prompt tuning - optimize prompts with caching
