Several users have asked if they can bring their own existing graph and have it summarized for query with GraphRAG. This page describes a simple method that aligns with the existing GraphRAG workflows.

Overview

To cover the basic use cases for GraphRAG query, you should have two or three tables derived from your data:
1. Entities table: the list of entities (nodes) in your graph
2. Relationships table: the list of relationships (edges) in your graph
3. Text units table (optional): the source text chunks the graph was extracted from; required for some query methods
The approach is to run a custom GraphRAG workflow pipeline that assumes text chunking, entity extraction, and relationship extraction have already occurred.

Required tables

Entities

For graph summarization purposes, you need the following fields from the full entities schema:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | str | Yes | Unique identifier for the entity |
| title | str | Yes | Name of the entity |
| description | str | Yes | Textual description of the entity |
| text_unit_ids | str[] | Optional | List of source text chunks (if available) |
import pandas as pd
from uuid import uuid4

# Create your entities DataFrame
entities = pd.DataFrame([
    {
        "id": str(uuid4()),
        "title": "Microsoft",
        "description": "A multinational technology corporation",
        "text_unit_ids": ["unit1", "unit2"]
    },
    {
        "id": str(uuid4()),
        "title": "Azure",
        "description": "Cloud computing platform by Microsoft",
        "text_unit_ids": ["unit1", "unit3"]
    }
])

# Write to Parquet
entities.to_parquet("output/entities.parquet")

Relationships

For graph summarization purposes, you need the following fields from the full relationships schema:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | str | Yes | Unique identifier for the relationship |
| source | str | Yes | Name of the source entity |
| target | str | Yes | Name of the target entity |
| description | str | Yes | Description of the relationship |
| weight | float | Yes | Edge weight (important for Leiden communities!) |
| text_unit_ids | str[] | Optional | List of source text chunks (if available) |
The weight field is critical: Leiden community detection uses edge weights when clustering. Make sure to provide meaningful weights (e.g., 0.0 to 1.0 based on relationship strength).
import pandas as pd
from uuid import uuid4

# Create your relationships DataFrame
relationships = pd.DataFrame([
    {
        "id": str(uuid4()),
        "source": "Microsoft",
        "target": "Azure",
        "description": "Microsoft develops and operates Azure",
        "weight": 0.95,
        "text_unit_ids": ["unit1"]
    },
    {
        "id": str(uuid4()),
        "source": "Microsoft",
        "target": "OpenAI",
        "description": "Microsoft has invested in and partnered with OpenAI",
        "weight": 0.85,
        "text_unit_ids": ["unit2"]
    }
])

# Write to Parquet
relationships.to_parquet("output/relationships.parquet")
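Because source and target reference entities by title, a quick consistency check before indexing can catch edges that point at entities missing from the entities table. A minimal sketch using small stand-in DataFrames (note that, as in the samples above, "OpenAI" appears as a target with no matching entity row):

```python
import pandas as pd

entities = pd.DataFrame({"title": ["Microsoft", "Azure"]})
relationships = pd.DataFrame({
    "source": ["Microsoft", "Microsoft"],
    "target": ["Azure", "OpenAI"],
})

# Relationship endpoints that don't resolve to any entity title
titles = set(entities["title"])
endpoints = set(relationships["source"]) | set(relationships["target"])
dangling = sorted(endpoints - titles)
print(dangling)  # → ['OpenAI']
```

Any names flagged here should either get a row in the entities table or have their edges dropped before indexing.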

Text units (optional)

Text units are chunks of your documents sized to fit into the context window of your model. Some search methods use these. See the full text_units schema for all fields.
import pandas as pd

# Create your text units DataFrame
text_units = pd.DataFrame([
    {
        "id": "unit1",
        "text": "Microsoft Corporation develops Azure cloud platform...",
        "n_tokens": 1200,
        "document_id": "doc1",
        "entity_ids": ["ent1", "ent2"],
        "relationship_ids": ["rel1"]
    }
])

# Write to Parquet
text_units.to_parquet("output/text_units.parquet")
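If you only have raw documents and need to reconstruct text units yourself, a simple splitter can produce rows in the shape above. This is a hypothetical word-based chunker for illustration only; GraphRAG's own pipeline chunks by model tokens, so treat n_tokens here as a rough approximation:

```python
import pandas as pd

def chunk_text(text: str, doc_id: str, size: int = 300) -> list[dict]:
    """Split a document into fixed-size word chunks shaped like text-unit rows."""
    words = text.split()
    rows = []
    for i in range(0, len(words), size):
        rows.append({
            "id": f"{doc_id}-{i // size}",
            "text": " ".join(words[i:i + size]),
            "n_tokens": len(words[i:i + size]),  # word count, not true model tokens
            "document_id": doc_id,
        })
    return rows

rows = chunk_text("one two three four five", "doc1", size=2)
text_units = pd.DataFrame(rows)  # 3 chunks: "one two", "three four", "five"
```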

Workflow configuration

GraphRAG lets you run only the workflow steps you need. For basic graph summarization and query, configure the following in your settings.yaml:
For Global Search (community-based summarization):
settings.yaml
workflows:
  - create_communities
  - create_community_reports
This will:
  1. Run Leiden community detection on your graph
  2. Generate LLM-based community reports
This is the minimal configuration for GraphRAG Global Search.

Setup steps

Here’s how to put it all together:
Step 1: Prepare your data

Create Parquet files for entities and relationships (and optionally text_units) following the schemas above.
import pandas as pd
from pathlib import Path

# Create output directory
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

# Save your DataFrames
entities_df.to_parquet(output_dir / "entities.parquet")
relationships_df.to_parquet(output_dir / "relationships.parquet")
# text_units_df.to_parquet(output_dir / "text_units.parquet")  # if available
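Before moving on, it can help to verify each DataFrame against the required fields listed earlier. A small sketch (the helper name is my own, not part of GraphRAG):

```python
import pandas as pd

# Columns GraphRAG expects in each table, per the schemas above
REQUIRED_COLUMNS = {
    "entities": {"id", "title", "description"},
    "relationships": {"id", "source", "target", "description", "weight"},
}

def missing_columns(df: pd.DataFrame, table: str) -> list[str]:
    """Return the required columns absent from df for the given table."""
    return sorted(REQUIRED_COLUMNS[table] - set(df.columns))

entities = pd.DataFrame({"id": ["e1"], "title": ["Microsoft"]})
print(missing_columns(entities, "entities"))  # → ['description']
```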
Step 2: Configure workflows

Update your settings.yaml to only run the workflows you need:
settings.yaml
workflows:
  - create_communities
  - create_community_reports
  # - generate_text_embeddings  # if needed for local/drift search

storage:
  type: file
  base_dir: "output"  # Where your parquet files are
Step 3: Run indexing

Run the GraphRAG indexer:
graphrag index --root <your_project_root>
This will:
  • Skip document loading and graph extraction (already done)
  • Perform community detection on your existing graph
  • Generate community reports
  • (Optionally) generate embeddings
Step 4: Query your graph

Once indexing completes, you can query using GraphRAG:
graphrag query --root <your_project_root> --method global "What are the main themes in this dataset?"

Complete example

Here’s a complete end-to-end example:
convert_graph.py
import pandas as pd
import networkx as nx
from pathlib import Path
from uuid import uuid4

def convert_networkx_to_graphrag(G: nx.Graph, output_dir: str = "output"):
    """Convert a NetworkX graph to GraphRAG format."""
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    
    # Extract entities from nodes
    entities = []
    for node in G.nodes():
        entities.append({
            "id": str(uuid4()),
            "title": str(node),
            "description": G.nodes[node].get("description", f"Entity: {node}"),
            "text_unit_ids": [],  # Empty if no text units available
        })
    
    entities_df = pd.DataFrame(entities)
    entities_df.to_parquet(output_path / "entities.parquet")
    print(f"Wrote {len(entities_df)} entities")
    
    # Extract relationships from edges
    relationships = []
    for source, target in G.edges():
        edge_data = G[source][target]
        relationships.append({
            "id": str(uuid4()),
            "source": str(source),
            "target": str(target),
            "description": edge_data.get("description", f"Relationship between {source} and {target}"),
            "weight": edge_data.get("weight", 1.0),
            "text_unit_ids": [],
        })
    
    relationships_df = pd.DataFrame(relationships)
    relationships_df.to_parquet(output_path / "relationships.parquet")
    print(f"Wrote {len(relationships_df)} relationships")
    
    print(f"\nGraph data written to {output_path}/")
    print("Next steps:")
    print("1. Update settings.yaml with workflows: [create_communities, create_community_reports]")
    print("2. Run: graphrag index --root .")

# Example usage
if __name__ == "__main__":
    # Create a sample graph
    G = nx.karate_club_graph()
    
    # Add descriptions to nodes
    for node in G.nodes():
        G.nodes[node]["description"] = f"Person {node} in the karate club"
    
    # Add weights to edges
    for source, target in G.edges():
        G[source][target]["weight"] = 0.8
        G[source][target]["description"] = f"Person {source} knows person {target}"
    
    # Convert to GraphRAG format
    convert_networkx_to_graphrag(G)

Configuration file

Here’s a complete settings.yaml for bring-your-own-graph scenarios:
settings.yaml
# Minimal configuration for existing graphs

# Only run community detection and reporting
workflows:
  - create_communities
  - create_community_reports
  # Uncomment if you need local/drift search:
  # - generate_text_embeddings

# Storage configuration
storage:
  type: file
  base_dir: "output"

# Community detection settings
cluster_graph:
  max_cluster_size: 10  # Adjust based on your graph size
  use_lcc: true  # Use largest connected component
  seed: 42  # For reproducible results

# LLM settings for community reports
llm:
  api_key: ${OPENAI_API_KEY}
  model: gpt-4-turbo-preview
  max_tokens: 4000

# Embedding settings (if using generate_text_embeddings)
embeddings:
  llm:
    api_key: ${OPENAI_API_KEY}
    model: text-embedding-3-small

Limitations and considerations

If your graph doesn’t have entity or relationship descriptions:
  • Use create_community_reports_text instead of create_community_reports
  • Ensure you have text_units with valid entity/relationship links
  • Consider adding synthetic descriptions based on entity names/types
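For the synthetic-description option, a minimal sketch that fills empty descriptions from a title/type template (column names follow the entities schema; the type column is assumed to exist in your data):

```python
import pandas as pd

entities = pd.DataFrame([
    {"id": "e1", "title": "Microsoft", "type": "organization", "description": None},
    {"id": "e2", "title": "Azure", "type": "product", "description": "Cloud platform"},
])

# Fill only the missing descriptions with a "title (type)" template
fallback = entities["title"] + " (" + entities["type"] + ")"
entities["description"] = entities["description"].fillna(fallback)
print(entities["description"].tolist())  # → ['Microsoft (organization)', 'Cloud platform']
```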
Edge weights are critical for Leiden community detection:
  • Provide meaningful weights (0.0 to 1.0 recommended)
  • Higher weight = stronger connection
  • If unknown, use 1.0 for all edges
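If your weights start out as raw counts (e.g., co-occurrence frequencies), one simple way to bring them into the recommended 0.0 to 1.0 range is to divide by the maximum, which keeps every edge strictly positive:

```python
import pandas as pd

relationships = pd.DataFrame({
    "source": ["A", "A", "B"],
    "target": ["B", "C", "C"],
    "weight": [3.0, 12.0, 7.0],  # raw co-occurrence counts
})

# Scale counts into (0, 1] by dividing by the largest count
relationships["weight"] = relationships["weight"] / relationships["weight"].max()
print(relationships["weight"].tolist())  # → [0.25, 1.0, 0.5833...]
```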
Text units are optional for Global Search but required for:
  • Local Search
  • DRIFT Search
  • Text-based community reports
If you don’t have original source text, you can skip these query methods.
For large graphs:
  • Adjust max_cluster_size in cluster_graph settings
  • Consider using use_lcc: true to focus on the main component
  • Community detection may take significant time
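Since use_lcc: true restricts community detection to the main component, it can help to first check how much of your graph that component actually covers; if coverage is low, a large share of nodes would be excluded. A quick check with networkx:

```python
import networkx as nx

# A toy graph with two components: {A, B, C} and {D, E}
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("D", "E")])

# Fraction of nodes in the largest connected component
components = sorted(nx.connected_components(G), key=len, reverse=True)
coverage = len(components[0]) / G.number_of_nodes()
print(f"LCC covers {coverage:.0%} of nodes")  # → LCC covers 60% of nodes
```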

Next steps

  • Outputs: understand the output table schemas
  • Querying: learn how to query your graph
  • Global search: use community-based search on your graph
  • Configuration: configure community detection parameters
