GraphRAG can consume significant LLM resources. Start with the tutorial dataset and inexpensive models before scaling up to larger datasets.
Prerequisites
Before you begin, ensure you have:
- Python 3.10, 3.11, or 3.12 installed
- An OpenAI API key or Azure OpenAI credentials
- Basic familiarity with command line operations
Installation
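GraphRAG is distributed as a Python package and can be installed from PyPI:

```shell
pip install graphrag
```

Installing into a fresh virtual environment is recommended so the pinned dependencies don't conflict with other projects.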
Initialize your workspace
Run initialization
Initialize your GraphRAG workspace. When prompted, specify your preferred chat and embedding models. This command creates:
- `.env` - Environment variables file
- `settings.yaml` - Pipeline configuration
- `input/` - Directory for your source documents
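The initialization step above is run with the `graphrag` CLI. A sketch, assuming a workspace directory named `./ragtest` (use any path you like):

```shell
# Create the workspace directory and scaffold the config files into it
mkdir -p ./ragtest
graphrag init --root ./ragtest
```

Add your OpenAI or Azure OpenAI credentials to the generated `.env` file before moving on.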
Add sample data
Download sample text
Download a sample text file to process. We’ll use “A Christmas Carol” by Charles Dickens:
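One way to fetch it is with curl; the Project Gutenberg URL below is illustrative, and any plain-text file dropped into the `input/` directory will work:

```shell
# Fetch a plain-text copy of the book into the workspace's input directory
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
```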
Run the indexing pipeline
Start indexing
Execute the indexing pipeline to process your documents. This process will:
- Extract entities and relationships from your text
- Build a knowledge graph structure
- Generate community summaries
- Create embeddings for semantic search
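The steps above are all driven by a single command, assuming the `./ragtest` workspace from the initialization step:

```shell
graphrag index --root ./ragtest
```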
The indexing process typically takes several minutes depending on document size and API rate limits. Progress will be displayed in your terminal.
Review output
After completion, check the `./output` directory for generated parquet files. Key output files include:
- `entities.parquet` - Extracted entities
- `relationships.parquet` - Entity relationships
- `communities.parquet` - Detected communities
- `community_reports.parquet` - Community summaries
- `text_units.parquet` - Chunked text segments
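You can inspect these tables with pandas. A minimal sketch, assuming the default `./ragtest` workspace layout (adjust the path to match your setup); `summarize_outputs` is a hypothetical helper, not part of GraphRAG:

```python
from pathlib import Path

import pandas as pd

TABLES = ["entities", "relationships", "communities", "community_reports", "text_units"]

def summarize_outputs(out_dir="./ragtest/output"):
    """Return the row count for each expected parquet table, or None if missing."""
    counts = {}
    for name in TABLES:
        path = Path(out_dir) / f"{name}.parquet"
        counts[name] = len(pd.read_parquet(path)) if path.exists() else None
    return counts

if __name__ == "__main__":
    for table, rows in summarize_outputs().items():
        print(f"{table}: {rows if rows is not None else 'not found'}")
```

A quick `entities.head()` on the loaded frame is a good sanity check that extraction picked up the characters and places you expect.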
Query your data
Now that your data is indexed, you can query it using two different search methods.
Global search
Global search answers high-level questions by analyzing community reports across the entire dataset.
Local search
Local search answers specific questions by combining knowledge graph data with relevant text chunks.
Understanding the results
Each query returns three pieces of information:
- Response - the AI-generated answer to your query, synthesized from the indexed knowledge graph
- Context - the records retrieved from the index to ground the answer
- Token usage - the number of tokens consumed by the request
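Both search methods are available through the `graphrag query` command. A sketch, assuming the `./ragtest` workspace; the sample questions are illustrative:

```shell
# High-level thematic question → global search over community reports
graphrag query --root ./ragtest --method global --query "What are the top themes in this story?"

# Entity-specific question → local search over the graph neighborhood
graphrag query --root ./ragtest --method local --query "Who is Scrooge, and what are his main relationships?"
```

As a rule of thumb, use global search for "what is this corpus about?" questions and local search for questions centered on a particular entity.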
Next steps
Custom prompts
Learn how to customize prompts for better domain-specific results
Azure deployment
Deploy GraphRAG with Azure OpenAI and Azure Storage
Configuration
Explore advanced configuration options
Query engine
Deep dive into search methods and parameters
Troubleshooting
Rate limit errors
If you encounter rate limit errors:
- Reduce the number of concurrent requests in settings.yaml
- Add rate limiting configuration
- Use a higher-tier API plan
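Throttling is configured per model in `settings.yaml`. A sketch with illustrative values; the key names follow recent GraphRAG releases, so check your generated file for the exact model id and supported keys:

```yaml
models:
  default_chat_model:
    concurrent_requests: 5      # fewer parallel LLM calls
    tokens_per_minute: 50000    # throttle token throughput
    requests_per_minute: 300    # throttle request rate
```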
Out of memory errors
For large datasets:
- Reduce chunk size in the chunking configuration
- Process documents in smaller batches
- Increase system memory allocation
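Chunk size is set in the chunks section of `settings.yaml`. A sketch with illustrative values; verify the key names against your generated file:

```yaml
chunks:
  size: 500     # tokens per text unit (smaller chunks lower peak memory)
  overlap: 50   # token overlap between adjacent chunks
```

Smaller chunks also mean more LLM calls during indexing, so expect a longer run in exchange for the lower memory footprint.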
Empty or poor results
To improve query results:
- Run prompt tuning to adapt prompts to your domain
- Verify your input data is properly formatted
- Adjust community detection parameters
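Prompt tuning from the first suggestion above has a built-in CLI command. A sketch, assuming the `./ragtest` workspace; available flags vary by GraphRAG version:

```shell
graphrag prompt-tune --root ./ragtest
```

This samples your input documents and generates domain-adapted extraction prompts, which typically improves entity and relationship quality over the generic defaults.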