This guide covers proven strategies for getting the best results from GraphRAG, including configuration optimization, cost management, and workflow recommendations.

Before you start

GraphRAG indexing can be expensive. Always start small, test thoroughly, and understand costs before scaling to production datasets.

Initial testing strategy

1. Start with a small dataset

Begin with 5-10 representative documents:
# Copy a small sample to test
mkdir ./test-project/input
cp ~/documents/sample*.txt ./test-project/input/
2. Use affordable models for testing

During development, use cost-effective models:
settings.yaml
llm:
  model: gpt-3.5-turbo  # or gpt-4o-mini

embedding:
  model: text-embedding-3-small
3. Enable caching

Always enable caching to avoid redundant API calls:
settings.yaml
cache:
  type: file
  base_dir: ./cache
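The value of the cache is easiest to see in miniature. The sketch below illustrates the general idea of file-based response caching (hash the request, reuse the stored response on a repeat call); it is not GraphRAG's actual cache format, and `cached_call` is a hypothetical helper:

```python
import hashlib
import json
from pathlib import Path

def cached_call(cache_dir: str, prompt: str, call_fn):
    """Reuse a stored response for an identical prompt; otherwise call
    the API once and save the result for next time."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = cache / f"{key}.json"
    if path.exists():               # cache hit: no API call, no cost
        return json.loads(path.read_text())["response"]
    response = call_fn(prompt)      # cache miss: pay for the call
    path.write_text(json.dumps({"response": response}))
    return response
```

Re-running an index with an unchanged configuration then skips every previously answered extraction prompt.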
4. Run dry-run first

Validate configuration before indexing:
graphrag index --root ./test-project --dry-run --verbose

Prompt tuning

Always run prompt tuning before indexing your full dataset. Generic prompts rarely yield optimal results for domain-specific data.

When to tune prompts

  • New domain: Medical, legal, scientific, business data
  • Specialized terminology: Industry-specific jargon or concepts
  • Non-English content: Different language or mixed languages
  • Specific entity types: You know what entities matter for your use case

Tuning workflow

1. Prepare representative data

Select documents that represent your full dataset:
# Random sampling
ls ~/all-documents/*.txt | shuf -n 20 | xargs -I {} cp {} ./project/input/
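The shuf pipeline above is Unix-only. A cross-platform equivalent can be sketched in Python (the seeded shuffle is an added convenience for reproducible samples; directory paths are placeholders):

```python
import random
import shutil
from pathlib import Path

def sample_documents(src_dir: str, dst_dir: str, n: int = 20, seed: int = 0) -> list:
    """Copy a random sample of n .txt files from src_dir into dst_dir."""
    files = sorted(Path(src_dir).glob("*.txt"))
    random.Random(seed).shuffle(files)   # seeded so samples are repeatable
    picked = files[:n]
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for f in picked:
        shutil.copy(f, dst / f.name)
    return [f.name for f in picked]
```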
2. Run prompt tuning

graphrag prompt-tune \
  --root ./project \
  --domain "medical research" \
  --selection-method auto \
  --n-subset-max 300 \
  --language "English"
For large datasets, use --selection-method auto with k-means clustering.
3. Review generated prompts

Check ./project/prompts/ for:
  • Entity types discovered
  • Example extractions
  • Domain-specific language
4. Customize if needed

Edit prompts to:
  • Add missing entity types
  • Adjust extraction instructions
  • Improve examples
5. Test on sample data

Run indexing on a small sample to validate prompt quality:
graphrag index --root ./project --verbose

Prompt tuning parameters

Selection methods:

Random (default):
  • Fast and simple
  • Good for uniform datasets
  • Use with --limit 15-20
Top:
  • Uses first N documents
  • Good when documents are pre-sorted
  • Use with --limit 15-20
Auto (recommended for large datasets):
  • Uses k-means clustering
  • Selects representative documents
  • Use with --n-subset-max 300 and --k 15
Be specific but not overly narrow.
✓ Good:
  • “medical research papers”
  • “corporate financial reports”
  • “legal contracts and agreements”
✗ Too broad:
  • “science”
  • “business”
✗ Too narrow:
  • “phase 3 clinical trials for oncology drugs”
Specify the primary language of your content:
graphrag prompt-tune --language "Spanish"
graphrag prompt-tune --language "French"
graphrag prompt-tune --language "Japanese"
For multilingual datasets, choose the dominant language.

Configuration optimization

Model selection

Choose models based on your requirements:
Goal: Fast iteration, low cost
llm:
  model: gpt-4o-mini
  temperature: 0.0

embedding:
  model: text-embedding-3-small
Cost: ~$0.05-0.15 per 1000 documents

Chunking configuration

Optimize chunking for your document structure:
settings.yaml
chunking:
  size: 300        # Tokens per chunk
  overlap: 100     # Overlap between chunks
  encoding_model: cl100k_base
Guidelines (chunk size and overlap in tokens):
  • Short articles: size 200, overlap 50 (preserves complete thoughts)
  • Long reports: size 300-400, overlap 100 (balances context and granularity)
  • Technical docs: size 400-500, overlap 100-150 (keeps technical concepts together)
  • Transcripts: size 300, overlap 100 (follows natural conversation flow)
  • Legal documents: size 500, overlap 150 (maintains clause integrity)
Larger chunks mean fewer LLM calls (lower cost) but may reduce extraction granularity. Start with 300 and adjust based on results.
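The trade-off between chunk size and chunk count can be estimated directly: because consecutive chunks share `overlap` tokens, each new chunk only advances by (size minus overlap) tokens. A rough estimator (a sketch; real chunkers also respect sentence and token boundaries):

```python
import math

def estimate_chunk_count(total_tokens: int, size: int = 300, overlap: int = 100) -> int:
    """Approximate sliding-window chunk count: the first chunk covers
    `size` tokens, each later chunk advances by (size - overlap)."""
    if total_tokens <= size:
        return 1
    step = size - overlap
    return 1 + math.ceil((total_tokens - size) / step)
```

For a 1,000-token document, size 300 with overlap 100 yields 5 chunks, while size 400 yields 3, which is why larger chunks mean fewer LLM calls.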

Entity extraction settings

settings.yaml
entity_extraction:
  max_gleanings: 1  # Additional extraction passes
  
  # Optional: specify entity types
  entity_types:
    - PERSON
    - ORGANIZATION
    - LOCATION
    - EVENT
    - TECHNOLOGY
max_gleanings trade-offs:
  • 0: Fastest, cheapest, lower recall
  • 1: Recommended balance (default)
  • 2+: Highest quality, expensive, diminishing returns
Each gleaning pass doubles the cost of entity extraction. Only increase for critical use cases.
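Since each gleaning is one more full pass over every chunk, extraction cost scales roughly linearly with the pass count. A back-of-envelope model (a sketch only; real costs also vary with prompt and output length):

```python
def extraction_cost(base_cost: float, max_gleanings: int) -> float:
    """Rough model: one initial extraction pass plus one full pass
    per gleaning, so cost grows with (1 + max_gleanings)."""
    return base_cost * (1 + max_gleanings)
```

With a $10 base extraction cost, `max_gleanings: 1` lands around $20, matching the "doubles the cost" rule of thumb above.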

Community detection

settings.yaml
community_reports:
  max_report_length: 1500  # Tokens per community report
Guidelines:
  • 500-1000: Brief summaries, lower cost
  • 1500: Recommended default, balanced detail
  • 2000-3000: Comprehensive reports, higher cost

Rate limiting

Set appropriate rate limits to avoid throttling:
settings.yaml
llm:
  requests_per_minute: 60
  tokens_per_minute: 80000

embedding:
  requests_per_minute: 60
  tokens_per_minute: 150000
Free tier:
  • 3 RPM, 40,000 TPM (GPT-4)
  • 5 RPM, 100,000 TPM (GPT-3.5)
Tier 1 ($5+ spent):
  • 500 RPM, 80,000 TPM (GPT-4o)
  • 3,500 RPM, 200,000 TPM (GPT-3.5)
Set to 90% of your limit to be safe.
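The 90% headroom rule can be applied mechanically before editing settings.yaml. A hypothetical helper (`safe_limits` is not part of GraphRAG):

```python
def safe_limits(provider_limits: dict, headroom: float = 0.9) -> dict:
    """Scale provider rate limits down (default 90%) to leave room
    for retries and other clients sharing the same quota."""
    return {name: int(limit * headroom) for name, limit in provider_limits.items()}
```

For example, Tier 1 GPT-4o limits (500 RPM, 80,000 TPM) become 450 RPM and 72,000 TPM.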

Cost management

Estimate costs before indexing

Run a test with a small sample and extrapolate:
# Index 10 documents
graphrag index --root ./test --verbose

# Check logs for token usage
grep "tokens" ./test/output/logs/app.log

# Extrapolate: (tokens_used / 10) * total_documents * model_price
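The extrapolation formula in the comment above can be written out. The price argument is an assumption to replace with your model's current per-million-token rate:

```python
def estimate_index_cost(sample_tokens: int, sample_docs: int,
                        total_docs: int, price_per_million: float) -> float:
    """Extrapolate indexing cost from a small sample run:
    (tokens per document) * (total documents) * (price per token)."""
    tokens_per_doc = sample_tokens / sample_docs
    total_tokens = tokens_per_doc * total_docs
    return total_tokens / 1_000_000 * price_per_million
```

If 10 sample documents consumed 50,000 tokens, indexing 1,000 documents at a hypothetical $0.60 per million tokens comes to about $3.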

Cost reduction strategies

Enable caching

Prevents redundant LLM calls during re-indexing
cache:
  type: file
  base_dir: ./cache

Larger chunks

Fewer chunks = fewer LLM calls
chunking:
  size: 400

Reduce gleanings

Each pass costs more
entity_extraction:
  max_gleanings: 0

Use cheaper models

For development and testing
llm:
  model: gpt-4o-mini

Cost tracking

Monitor spending:
  • OpenAI: Check usage at platform.openai.com/usage
  • Azure: Monitor costs in Azure Portal → Cost Management
  • Local logs: Track token counts in GraphRAG logs

Query optimization

Choose the right search method

GraphRAG supports global search for broad, dataset-wide questions and local search for questions about specific entities; match the method to the scope of your query.

Community level selection

graphrag query "your question" --community-level 2
Guidelines:
  • Level 0: Entire dataset (very broad, expensive)
  • Level 1: Major themes (broad summaries)
  • Level 2: Recommended default (balanced granularity)
  • Level 3+: Fine-grained details (more specific)
Start with level 2. Increase for more specific queries, decrease for very broad questions.

Response type optimization

Guide the format of responses:
# Concise answers
graphrag query "question" --response-type "Single Sentence"

# Structured output
graphrag query "question" --response-type "List of 3-7 Points"

# Detailed responses
graphrag query "question" --response-type "Multiple Paragraphs"

# Custom format
graphrag query "question" --response-type "Executive summary with key metrics"

Data preparation

Document formatting

GraphRAG supports:
  • Plain text (.txt)
  • Markdown (.md)
  • CSV (.csv)
  • Other formats via custom loaders
Recommendation: Convert documents to plain text or markdown for best results.
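For CSV sources, one simple conversion is to flatten each row into labeled lines of plain text, which tends to index better than raw comma-separated cells. A sketch (the column names are whatever your file uses):

```python
import csv
import io

def csv_to_text(csv_content: str) -> str:
    """Render each CSV row as 'header: value' lines, with a blank
    line between rows."""
    reader = csv.DictReader(io.StringIO(csv_content))
    blocks = []
    for row in reader:
        blocks.append("\n".join(f"{key}: {value}" for key, value in row.items()))
    return "\n\n".join(blocks)
```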
Well-structured documents yield better results.

Good structure:
# Document Title

## Section 1

Content with clear paragraphs...

## Section 2

More structured content...
Poor structure:
  • No headings or sections
  • Mixed formatting
  • Excessive special characters
  • Malformed text from PDF extraction
Include relevant metadata in documents:
---
title: Research Paper Title
author: John Smith
date: 2024-01-15
category: Medical Research
---

# Main content...
GraphRAG can extract entities from metadata.
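Adding front matter like the example above can be scripted when preparing documents. A minimal sketch (the metadata keys shown are examples, not required fields):

```python
def add_front_matter(body: str, metadata: dict) -> str:
    """Prepend a YAML-style front matter block to a document body."""
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + body
```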

Data cleaning

Clean your data before indexing:
import re

def clean_document(text: str) -> str:
    # Collapse runs of spaces and tabs, but keep paragraph breaks intact
    text = re.sub(r'[ \t]+', ' ', text)
    text = re.sub(r'\n{3,}', '\n\n', text)
    
    # Remove page numbers
    text = re.sub(r'Page \d+', '', text)
    
    # Fix common OCR errors
    text = text.replace('l1', 'll')  # Example
    
    # Remove headers/footers
    # ... custom logic
    
    return text.strip()

Storage and scalability

Local vs. cloud storage

Local storage is best for:
  • Development
  • Small datasets (<10K documents)
  • Testing
storage:
  type: file
  base_dir: ./output

vector_store:
  type: lancedb
  db_uri: ./lancedb

Large dataset handling

For datasets with >10,000 documents:
1. Partition your data

Split into logical groups:
input/
  medical/
  legal/
  financial/
Index separately or together based on use case.
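Partitioning can be as simple as routing each file by keywords in its text. A hypothetical sketch; the categories and keywords are placeholders for your own rules:

```python
def route_document(filename: str, text: str, rules: dict, default: str = "other") -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in rules.items():
        if any(kw in lowered for kw in keywords):
            return category
    return default

# Example routing rules (placeholders)
rules = {
    "medical": ["patient", "clinical"],
    "legal": ["contract", "clause"],
}
```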
2. Optimize chunking

Use larger chunks to reduce total chunk count:
chunking:
  size: 400
  overlap: 100
3. Use cloud storage

Azure Blob + Azure AI Search for scalability.
4. Implement incremental updates

Use graphrag update for new documents:
graphrag update --root ./project

Workflow best practices

Development workflow

1. Initial setup

graphrag init --root ./project
# Configure settings.yaml and .env
2. Small sample test

# 5-10 documents
graphrag index --root ./project --verbose
3. Prompt tuning

graphrag prompt-tune --root ./project --domain "your domain"
4. Validation

# Test queries
graphrag query "test question" --method global
graphrag query "test question" --method local
5. Iterate

  • Adjust configuration
  • Refine prompts
  • Test again
6. Scale up

# Full dataset
graphrag index --root ./project --verbose

Version control

Track your GraphRAG configuration:
# .gitignore
.env
cache/
output/
lancedb/
*.parquet
*.log

# Commit these
settings.yaml
prompts/
input/  # Or use DVC for large files

Monitoring and debugging

Enable verbose logging during development:
graphrag index --root ./project --verbose
Check logs for:
  • Token usage
  • API errors
  • Extraction quality
  • Processing time
tail -f ./project/output/logs/app.log
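Token totals can be tallied straight from the logs. A sketch, assuming log lines mention counts in a form like "tokens: 1234"; the real log format may differ, so adjust the regex to what you actually see:

```python
import re

def total_tokens(log_text: str) -> int:
    """Sum every 'tokens: N' occurrence in a log as a rough usage tally."""
    return sum(int(n) for n in re.findall(r"tokens[:=]\s*(\d+)", log_text, re.IGNORECASE))
```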

Common pitfalls

Problem: Generic prompts produce poor extractions.
Solution: Always run graphrag prompt-tune for domain-specific data.

Problem: Expensive mistakes on the full dataset.
Solution: Test with 5-10 documents first, validate results, then scale.

Problem: API throttling and failed indexing runs.
Solution: Configure rate limits to 90% of your quota.

Problem: Redundant API calls cost money.
Solution: Always enable caching for development.

Problem: Poor query results.
Solution: Match the search method to the query type (see query optimization).

Problem: Garbage in, garbage out.
Solution: Clean and structure documents before indexing.

Performance benchmarks

Typical indexing performance:
  • 100 documents (GPT-4o): ~2,000 chunks, 15-30 min, $1-3
  • 1,000 documents (GPT-4o): ~20,000 chunks, 2-4 hours, $10-30
  • 10,000 documents (GPT-4o): ~200,000 chunks, 20-40 hours, $100-300
These are rough estimates. Actual costs depend on document length, chunk size, gleanings, and model pricing.

Next steps

CLI usage

Master the command-line interface

Configuration

Deep dive into settings

Prompt tuning

Optimize prompts for your domain

Migration guide

Upgrade between versions
