GraphRAG uses language models for various tasks including entity extraction, summarization, and query responses. This page covers how to configure and customize your language models.

Model support

GraphRAG uses LiteLLM to support 100+ language models from various providers. This includes:
  • OpenAI (GPT-4, GPT-4 Turbo, GPT-4o, o1)
  • Azure OpenAI
  • Anthropic (Claude)
  • Google (Gemini)
  • AWS Bedrock
  • Local models via Ollama or LiteLLM Proxy
GraphRAG has been most thoroughly tested with OpenAI’s GPT-4 series models. Other models are supported but may require additional prompt tuning.

Basic configuration

OpenAI models

The default configuration uses OpenAI models:
completion_models:
  default_completion_model:
    model_provider: openai
    model: gpt-4.1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}

embedding_models:
  default_embedding_model:
    model_provider: openai
    model: text-embedding-3-large
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
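The `${GRAPHRAG_API_KEY}` references above are resolved from environment variables so keys never appear in the config file. GraphRAG's own loader performs this substitution; as a rough illustration of the `${VAR}` convention, Python's `os.path.expandvars` mimics it:

```python
import os

# Demo only: set a placeholder key in the environment, then expand a
# config value that references it with ${VAR} syntax.
os.environ["GRAPHRAG_API_KEY"] = "sk-demo-key"

config_value = "${GRAPHRAG_API_KEY}"
print(os.path.expandvars(config_value))  # prints "sk-demo-key"
```

In practice the variable would come from a `.env` file or your shell, not be set in code.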

Azure OpenAI

For Azure-hosted models:
completion_models:
  default_completion_model:
    model_provider: azure
    model: gpt-4
    auth_method: api_key
    api_key: ${AZURE_OPENAI_API_KEY}
    api_base: https://your-resource.openai.azure.com/
    api_version: "2024-02-15-preview"
    azure_deployment_name: gpt-4-deployment
If your Azure deployment name matches the model name, you can omit azure_deployment_name.

Azure Managed Identity

For production environments using managed identity:
completion_models:
  default_completion_model:
    model_provider: azure
    model: gpt-4
    auth_method: azure_managed_identity
    api_base: https://your-resource.openai.azure.com/
    api_version: "2024-02-15-preview"

Other providers

Configure other providers using LiteLLM’s format:
completion_models:
  gemini_model:
    model_provider: gemini
    model: gemini-2.5-flash-lite
    auth_method: api_key
    api_key: ${GEMINI_API_KEY}

embedding_models:
  gemini_embedding:
    model_provider: gemini
    model: gemini-embedding-001
    auth_method: api_key
    api_key: ${GEMINI_API_KEY}
See LiteLLM’s documentation for provider-specific configuration. LiteLLM identifies models with strings of the form provider/model: model_provider is the prefix before the slash and model is the suffix after it.
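The mapping from a LiteLLM model string to GraphRAG's two config fields can be sketched with a small (hypothetical) helper:

```python
def split_litellm_model(model_string: str) -> tuple[str, str]:
    """Split a LiteLLM "provider/model" string into the values used for
    GraphRAG's model_provider and model fields."""
    provider, _, model = model_string.partition("/")
    return provider, model

print(split_litellm_model("gemini/gemini-2.5-flash-lite"))
# prints ('gemini', 'gemini-2.5-flash-lite')
```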

Model configuration options

Required parameters

  • model_provider (string, required): The model provider (e.g., openai, azure, anthropic, gemini)
  • model (string, required): The specific model name (e.g., gpt-4.1, claude-3-5-sonnet-20241022)
  • auth_method (string, required): Authentication method: api_key or azure_managed_identity

Optional parameters

  • type (string, default: litellm): LLM provider type: litellm or mock (for testing)
  • api_key (string): API key for authentication (use environment variables)
  • api_base (string): Base URL for API requests (required for Azure and custom endpoints)
  • api_version (string): API version (Azure only)
  • azure_deployment_name (string): Azure deployment name if different from model name
  • call_args (object): Default arguments sent with every request (e.g., temperature, max_tokens, n)

Call arguments

Set default parameters for all model calls:
completion_models:
  default_completion_model:
    model_provider: openai
    model: gpt-4.1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
    call_args:
      temperature: 0.7
      max_tokens: 2000
      top_p: 0.95

Retry configuration

Configure automatic retry behavior for failed requests:
completion_models:
  default_completion_model:
    model_provider: openai
    model: gpt-4.1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
    retry:
      type: exponential_backoff
      max_retries: 7
      base_delay: 2.0
      max_delay: 60.0
      jitter: true
  • retry.type (string, default: exponential_backoff): Retry strategy: exponential_backoff or immediate
  • retry.max_retries (integer, default: 7): Maximum number of retry attempts
  • retry.base_delay (float, default: 2.0): Base delay in seconds for exponential backoff
  • retry.max_delay (float): Maximum delay between retries (no limit if not specified)
  • retry.jitter (boolean, default: true): Add random jitter to retry delays
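GraphRAG's exact delay formula is internal, but a typical exponential-backoff-with-jitter computation using these parameters looks like this sketch:

```python
import random

def backoff_delay(attempt: int, base_delay: float = 2.0,
                  max_delay: float = 60.0, jitter: bool = True) -> float:
    """Illustrative delay before retry number `attempt` (0-based):
    exponential growth from base_delay, capped at max_delay, with
    optional full jitter to spread out concurrent retries."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    if jitter:
        delay = random.uniform(0, delay)
    return delay

print(backoff_delay(0, jitter=False))  # prints 2.0
print(backoff_delay(5, jitter=False))  # prints 60.0 (64 s capped at max_delay)
```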

Rate limiting

Control request rate to avoid hitting API limits:
completion_models:
  default_completion_model:
    model_provider: openai
    model: gpt-4.1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
    rate_limit:
      type: sliding_window
      period_in_seconds: 60
      requests_per_period: 100
      tokens_per_period: 150000
  • rate_limit.type (string, default: sliding_window): Rate limiting strategy (currently only sliding_window supported)
  • rate_limit.period_in_seconds (integer, default: 60): Time window for rate limiting in seconds
  • rate_limit.requests_per_period (integer): Maximum requests per time window
  • rate_limit.tokens_per_period (integer): Maximum tokens per time window
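Conceptually, a sliding-window limiter admits a request only while fewer than requests_per_period requests have started within the trailing window. GraphRAG's implementation is internal; a minimal sketch of the idea:

```python
from collections import deque

class SlidingWindowLimiter:
    """Illustrative request-count limiter over a trailing time window."""

    def __init__(self, requests_per_period: int, period_in_seconds: float = 60.0):
        self.limit = requests_per_period
        self.period = period_in_seconds
        self.timestamps = deque()  # start times of admitted requests

    def try_acquire(self, now: float) -> bool:
        # Evict request timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.period:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

A real implementation would also track tokens_per_period and block (rather than reject) until capacity frees up.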

Metrics configuration

Track model usage and performance:
completion_models:
  default_completion_model:
    model_provider: openai
    model: gpt-4.1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
    metrics:
      type: default
      store: memory
      writer: log  # or 'file'
      log_level: 20  # INFO
      base_dir: ./metrics  # for file writer
  • metrics.writer (string, default: log): Where to write metrics: log (console) or file
  • metrics.base_dir (string): Directory for metrics files (when using file writer)

Multiple model configuration

Define different models for different tasks:
completion_models:
  extraction_model:
    model_provider: openai
    model: gpt-4o
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
  
  query_model:
    model_provider: openai
    model: o1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
  
  budget_model:
    model_provider: openai
    model: gpt-4o-mini
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}

embedding_models:
  default_embedding_model:
    model_provider: openai
    model: text-embedding-3-large
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}

# Reference models in workflows
extract_graph:
  completion_model_id: extraction_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 1

summarize_descriptions:
  completion_model_id: budget_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

community_reports:
  completion_model_id: extraction_model
  graph_prompt: "prompts/community_report_graph.txt"
  max_length: 2000

global_search:
  completion_model_id: query_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
Use different models to optimize cost vs. quality tradeoffs. For example:
  • High-end models (GPT-4o, o1) for critical extraction and queries
  • Mid-tier models (GPT-4o-mini) for summarization
  • Budget models for less critical tasks

Model selection considerations

GraphRAG has been most thoroughly tested with OpenAI’s GPT-4 family:
  • gpt-4 - Original GPT-4 model
  • gpt-4-turbo - Faster GPT-4 with larger context
  • gpt-4o - Optimized multimodal model
  • gpt-4o-mini - Smaller, faster, more affordable

o-series models (reasoning)

The o-series models include built-in reasoning:
  • o1 - Advanced reasoning model
  • o1-mini - Smaller reasoning model
o-series models have different parameters:
  • Use max_completion_tokens instead of max_tokens
  • Reasoning tokens count toward usage but are separate from output
  • Slower and more expensive than standard models
  • May require prompt adjustments (less explicit chain-of-thought)
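Under these constraints, an o-series configuration might pass its token limit through call_args as max_completion_tokens. This is a sketch; verify the exact arguments your provider accepts:

```yaml
completion_models:
  reasoning_model:
    model_provider: openai
    model: o1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
    call_args:
      max_completion_tokens: 4000  # o-series: replaces max_tokens
```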

Structured output requirements

Your chosen model must support structured outputs with JSON schema validation. Most modern models support this, but verify before using custom models.

Using custom models

Via proxy servers

Use Ollama or LiteLLM Proxy to connect unsupported models:
completion_models:
  local_model:
    model_provider: openai  # Use OpenAI-compatible endpoint
    model: llama3
    auth_method: api_key
    api_key: not-needed
    api_base: http://localhost:11434/v1  # Ollama endpoint
Custom models may produce malformed JSON responses. Your proxy may need to:
  • Validate and fix JSON formatting
  • Handle schema validation errors
  • Ensure structured output compliance
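A proxy-side cleanup step often amounts to stripping the non-JSON wrapping a local model adds before parsing. The following is an illustration only (a hypothetical helper, not part of GraphRAG); real repair logic may need to be much more robust:

```python
import json

def parse_model_json(raw: str) -> dict:
    """Best-effort parse of a model response that should be JSON.

    Strips common wrappers: a markdown code fence and any prose
    surrounding the outermost {...} object.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop a ```json ... ``` fence around the payload.
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    # Fall back to the outermost {...} span if prose surrounds it.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return json.loads(text)

print(parse_model_json('```json\n{"a": 1}\n```'))  # prints {'a': 1}
```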

Via model protocol (library usage)

For programmatic use, implement the model protocol:
from graphrag_llm.completion import LLMCompletion, register_completion

class MyCustomCompletionModel(LLMCompletion):
    # Implement required methods
    async def complete(self, prompt: str, **kwargs):
        # Your implementation
        pass

# Register your model
register_completion("my-custom-model", MyCustomCompletionModel)
Then reference it in config:
completion_models:
  custom_model:
    type: my-custom-model
    # Additional config passed to your implementation
Custom model implementations are only supported when using GraphRAG as a Python library, not via the CLI.

Best practices

1. Start with defaults: Begin with GPT-4o or GPT-4o-mini for reliable results.
2. Configure retry and rate limiting: Set appropriate retry logic and rate limits to handle API issues.
3. Use environment variables: Never hardcode API keys; use .env files and ${VAR} substitution.
4. Monitor costs: Enable metrics to track token usage and optimize model selection.
5. Test before production: Validate model performance on sample data before full indexing.

Next steps

  • Settings reference: Complete configuration options
  • Storage: Configure storage and caching
  • Prompt tuning: Optimize prompts for your models
  • Start indexing: Begin processing documents
