GraphRAG can consume significant LLM resources. Start with the tutorial dataset and inexpensive models before scaling up to larger datasets.
Prerequisites
Before you begin, ensure you have:
- Python 3.10, 3.11, or 3.12 installed
- An OpenAI API key or Azure OpenAI credentials
- Basic familiarity with command line operations
Installation
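GraphRAG is distributed as a Python package and can be installed from PyPI:

```shell
pip install graphrag
```

Installing into a fresh virtual environment is recommended so the pinned dependencies don't conflict with other projects.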
Initialize your workspace
Run initialization
Initialize your GraphRAG workspace. When prompted, specify your preferred chat and embedding models. This command creates:
- `.env` - Environment variables file
- `settings.yaml` - Pipeline configuration
- `input/` - Directory for your source documents
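The initialization step above is run with the `graphrag` CLI. A sketch, assuming a workspace directory named `./ragtest` (use any path you like):

```shell
# Create the workspace directory and scaffold the config files into it
mkdir -p ./ragtest
graphrag init --root ./ragtest
```

Add your OpenAI or Azure OpenAI credentials to the generated `.env` file before moving on.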
Add sample data
Download sample text
Download a sample text file to process. We’ll use “A Christmas Carol” by Charles Dickens:
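One way to fetch it is with curl; the Project Gutenberg URL below is illustrative, and any plain-text file dropped into the `input/` directory will work:

```shell
# Fetch a plain-text copy of the book into the workspace's input directory
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
```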
Run the indexing pipeline
Start indexing
Execute the indexing pipeline to process your documents. This process will:
- Extract entities and relationships from your text
- Build a knowledge graph structure
- Generate community summaries
- Create embeddings for semantic search
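The steps above are all driven by a single command, assuming the `./ragtest` workspace from the initialization step:

```shell
graphrag index --root ./ragtest
```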
The indexing process typically takes several minutes depending on document size and API rate limits. Progress will be displayed in your terminal.
Review output
After completion, check the `./output` directory for generated parquet files. Key output files include:
- `entities.parquet` - Extracted entities
- `relationships.parquet` - Entity relationships
- `communities.parquet` - Detected communities
- `community_reports.parquet` - Community summaries
- `text_units.parquet` - Chunked text segments
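You can inspect these tables with pandas. A minimal sketch, assuming the default `./ragtest` workspace layout (adjust the path to match your setup); `summarize_outputs` is a hypothetical helper, not part of GraphRAG:

```python
from pathlib import Path

import pandas as pd

TABLES = ["entities", "relationships", "communities", "community_reports", "text_units"]

def summarize_outputs(out_dir="./ragtest/output"):
    """Return the row count for each expected parquet table, or None if missing."""
    counts = {}
    for name in TABLES:
        path = Path(out_dir) / f"{name}.parquet"
        counts[name] = len(pd.read_parquet(path)) if path.exists() else None
    return counts

if __name__ == "__main__":
    for table, rows in summarize_outputs().items():
        print(f"{table}: {rows if rows is not None else 'not found'}")
```

A quick `entities.head()` on the loaded frame is a good sanity check that extraction picked up the characters and places you expect.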
Query your data
Now that your data is indexed, you can query it using two different search methods.
Global search
Global search answers high-level questions by analyzing community reports across the entire dataset.
Local search
Local search answers specific questions by combining knowledge graph data with relevant text chunks.
Understanding the results
Each query returns three pieces of information:
- Response - the AI-generated answer to your query, synthesized from the indexed knowledge graph
- Context - the records retrieved from the index to ground the answer
- Token usage - the number of tokens consumed by the request
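Both search methods are available through the `graphrag query` command. A sketch, assuming the `./ragtest` workspace; the sample questions are illustrative:

```shell
# High-level thematic question → global search over community reports
graphrag query --root ./ragtest --method global --query "What are the top themes in this story?"

# Entity-specific question → local search over the graph neighborhood
graphrag query --root ./ragtest --method local --query "Who is Scrooge, and what are his main relationships?"
```

As a rule of thumb, use global search for "what is this corpus about?" questions and local search for questions centered on a particular entity.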
Next steps
Custom prompts
Learn how to customize prompts for better domain-specific results
Azure deployment
Deploy GraphRAG with Azure OpenAI and Azure Storage
Configuration
Explore advanced configuration options
Query engine
Deep dive into search methods and parameters
Troubleshooting
Rate limit errors
If you encounter rate limit errors:
- Reduce the number of concurrent requests in settings.yaml
- Add rate limiting configuration
- Use a higher-tier API plan
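Throttling is configured per model in `settings.yaml`. A sketch with illustrative values; the key names follow recent GraphRAG releases, so check your generated file for the exact model id and supported keys:

```yaml
models:
  default_chat_model:
    concurrent_requests: 5      # fewer parallel LLM calls
    tokens_per_minute: 50000    # throttle token throughput
    requests_per_minute: 300    # throttle request rate
```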
Out of memory errors
For large datasets:
- Reduce chunk size in the chunking configuration
- Process documents in smaller batches
- Increase system memory allocation
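Chunk size is set in the chunks section of `settings.yaml`. A sketch with illustrative values; verify the key names against your generated file:

```yaml
chunks:
  size: 500     # tokens per text unit (smaller chunks lower peak memory)
  overlap: 50   # token overlap between adjacent chunks
```

Smaller chunks also mean more LLM calls during indexing, so expect a longer run in exchange for the lower memory footprint.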
Empty or poor results
To improve query results:
- Run prompt tuning to adapt prompts to your domain
- Verify your input data is properly formatted
- Adjust community detection parameters
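Prompt tuning from the first suggestion above has a built-in CLI command. A sketch, assuming the `./ragtest` workspace; available flags vary by GraphRAG version:

```shell
graphrag prompt-tune --root ./ragtest
```

This samples your input documents and generates domain-adapted extraction prompts, which typically improves entity and relationship quality over the generic defaults.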