The GraphRAG Python API provides programmatic access to indexing, querying, and prompt tuning, so you can integrate GraphRAG directly into your Python applications.

Warning

This API is under development and may undergo changes in future releases. Backwards compatibility is not guaranteed at this time.

Installation

The Python API is included with the GraphRAG package:
pip install graphrag

Core modules

The API is organized into three main modules:

Indexing API

Build knowledge graph indexes from your documents:
from graphrag.api import build_index
The indexing API enables you to create and update knowledge graph indexes programmatically. See the index API reference for details.

Query API

Perform searches over your indexed knowledge graph:
from graphrag.api import (
    global_search,
    local_search,
    drift_search,
    basic_search
)
The query API provides four search methods (global, local, DRIFT, and basic), each with a standard and a streaming variant. See the query API reference for details.
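
The streaming variants are mentioned above but not shown. The following is a minimal sketch of how a streamed response might be consumed, assuming a streaming search behaves as an async generator that yields text chunks; this page does not confirm that shape, so check the query API reference for your installed version. `fake_streaming_search` and `collect_stream` are hypothetical names used only for illustration and are not part of GraphRAG.

```python
import asyncio


async def fake_streaming_search(query):
    """Hypothetical stand-in for a streaming search; yields text chunks."""
    for chunk in ["The main ", "themes are ", "A and B."]:
        yield chunk


async def collect_stream(stream):
    """Print each chunk as it arrives and return the assembled text."""
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


if __name__ == "__main__":
    text = asyncio.run(collect_stream(fake_streaming_search("What are the main themes?")))
```

Printing chunks as they arrive gives the user early feedback, while the joined string is still available for logging or post-processing.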

Prompt tuning API

Generate domain-specific prompts from your data:
from graphrag.api import generate_indexing_prompts, DocSelectionType
The prompt tuning API automatically generates optimized prompts tailored to your specific domain and documents. See the prompt tune API reference for details.

Quick start

Here’s a basic workflow using the Python API:
import asyncio
import pandas as pd
from graphrag.config.models.graph_rag_config import GraphRagConfig
from graphrag.api import build_index, global_search

async def main():
    # Load configuration
    config = GraphRagConfig.from_file("settings.yaml")
    
    # Build the index
    results = await build_index(config=config)
    
    # Load indexed data
    entities = pd.read_parquet("output/entities.parquet")
    communities = pd.read_parquet("output/communities.parquet")
    reports = pd.read_parquet("output/community_reports.parquet")
    
    # Perform a search
    response, context = await global_search(
        config=config,
        entities=entities,
        communities=communities,
        community_reports=reports,
        community_level=2,
        dynamic_community_selection=False,
        response_type="multiple paragraphs",
        query="What are the main themes?"
    )
    
    print(response)

if __name__ == "__main__":
    asyncio.run(main())
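
The quick start stores the return value of build_index in results but never inspects it. A minimal sketch of a post-run check follows, assuming each result object exposes a `workflow` name and an `errors` collection. That shape is an assumption not stated on this page, so verify it against the index API reference. `StubRunResult`, `summarize_results`, and the workflow names in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StubRunResult:
    """Hypothetical stand-in for one item returned by build_index."""
    workflow: str
    errors: list = field(default_factory=list)


def summarize_results(results):
    """Map each workflow name to 'success' or a short error summary."""
    summary = {}
    for result in results:
        # Assumed attribute: an empty or missing `errors` means the workflow succeeded.
        errors = getattr(result, "errors", None)
        summary[result.workflow] = f"error: {errors}" if errors else "success"
    return summary


if __name__ == "__main__":
    demo = [StubRunResult("workflow_a"), StubRunResult("workflow_b", ["boom"])]
    for name, status in summarize_results(demo).items():
        print(f"{name}: {status}")
```

Checking the summary before loading the Parquet files avoids querying an index that was only partially built.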

Configuration

All API functions require a GraphRagConfig object. You can create this from a YAML file:
from graphrag.config.models.graph_rag_config import GraphRagConfig

config = GraphRagConfig.from_file("settings.yaml")
Or construct it programmatically:
from graphrag.config.models.graph_rag_config import GraphRagConfig

config = GraphRagConfig(
    # Configure your settings here
)

Data formats

The API exchanges data as pandas DataFrames. Indexed data is stored as Parquet files (under output/ in the quick start above):
  • entities.parquet - Entity information
  • communities.parquet - Community structure
  • community_reports.parquet - Community summaries
  • text_units.parquet - Text chunks
  • relationships.parquet - Entity relationships
  • covariates.parquet - Claims and covariates (optional)
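
Before loading these files, it can help to confirm that they exist. The sketch below uses only the file names listed above and treats covariates as optional. The helper `find_missing_tables` and the idea that all five other tables are required for your use case are assumptions for illustration; a given search method may need only a subset.

```python
from pathlib import Path

# Table names taken from the list above; covariates are optional.
REQUIRED_TABLES = [
    "entities",
    "communities",
    "community_reports",
    "text_units",
    "relationships",
]
OPTIONAL_TABLES = ["covariates"]


def find_missing_tables(output_dir):
    """Return the required table names with no .parquet file in output_dir."""
    root = Path(output_dir)
    return [
        name
        for name in REQUIRED_TABLES
        if not (root / f"{name}.parquet").is_file()
    ]


if __name__ == "__main__":
    missing = find_missing_tables("output")
    if missing:
        print("Missing tables:", ", ".join(missing))
    else:
        print("All required tables are present.")
```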

Error handling

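API functions may raise exceptions. Wrap calls in a try-except block so that a failed call does not crash your application. The search functions are awaited, so the snippet below must run inside an async function: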
try:
    response, context = await global_search(
        config=config,
        entities=entities,
        communities=communities,
        community_reports=reports,
        community_level=2,
        dynamic_community_selection=False,
        response_type="multiple paragraphs",
        query="What are the main themes?"
    )
except Exception as e:
    print(f"Search failed: {e}")
