Knowledge model
To support the GraphRAG system, the outputs of the indexing engine (in default configuration mode) are aligned to a knowledge model called the GraphRAG Knowledge Model. This model is an abstraction over the underlying data storage technology and provides a common interface for the GraphRAG system to interact with.
- Document - An input document into the system (CSV rows or .txt files)
- TextUnit - A chunk of text to analyze
- Entity - An entity extracted from a TextUnit (people, places, events)
- Relationship - A relationship between two entities
- Covariate - Extracted claim information with time-bound statements
- Community - Hierarchical clustering of entities and relationships
- Community Report - Generated summaries of each community
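As a rough sketch of how these objects relate (field names here are illustrative, not the library's exact schema), the core model could be expressed as dataclasses:

```python
from dataclasses import dataclass, field

@dataclass
class TextUnit:
    id: str
    text: str
    document_id: str  # the source Document this chunk came from

@dataclass
class Entity:
    id: str
    name: str
    type: str  # e.g. "person", "place", "event"
    # TextUnits in which this entity was found
    text_unit_ids: list[str] = field(default_factory=list)

@dataclass
class Relationship:
    source_id: str  # Entity id
    target_id: str  # Entity id
    description: str

@dataclass
class Community:
    id: str
    level: int  # depth within the hierarchical clustering
    entity_ids: list[str] = field(default_factory=list)
```

Each Entity points back at the TextUnits it was extracted from, and Communities group entities at successive levels of the hierarchy.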
Workflows
The indexing pipeline is composed of individual workflows that execute in sequence. Individual workflows are described in detail on the dataflow page.
LLM caching
The GraphRAG library was designed with LLM interactions in mind. A common setback when working with LLM APIs is errors caused by network latency, throttling, and similar transient failures.
How it works
When completion requests are made using the same input set (prompt and tuning parameters), the system returns a cached result if one exists. This allows the indexer to be:
- Resilient to network issues
- Idempotent across runs
- Efficient for end-users
Providers and factories
Several subsystems within GraphRAG use a factory pattern to register and retrieve provider implementations. This enables deep customization, letting you supply your own implementations of models, storage, and more.
Available factories
The following subsystems use a factory pattern that allows you to register your own implementations:
Language model
Implement your own chat and embed methods to use a model provider beyond the built-in LiteLLM wrapper.
Input reader
Implement your own input document reader to support file types other than text, CSV, and JSON.
Cache
Create your own cache storage location in addition to file, blob, and CosmosDB providers.
Logger
Create your own log writing location in addition to built-in file and blob storage.
Storage
Create your own storage provider (database, etc.) beyond file, blob, and CosmosDB.
Vector store
Implement your own vector store beyond LanceDB, Azure AI Search, and CosmosDB.
Pipeline + workflows
Implement your own workflow steps with a custom run_workflow function, or register an entire pipeline.
Pipeline factory implementation
Here’s how the pipeline factory works under the hood:
Default workflow configurations
- Standard
- Fast
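The factory mechanism described above can be sketched as a registry that maps names to workflow functions and pipeline names to ordered lists of steps (names and signatures here are illustrative, not GraphRAG's exact API):

```python
from typing import Callable

# A workflow is a named step that transforms pipeline state;
# a pipeline is an ordered list of registered step names.
WorkflowFn = Callable[[dict], dict]

class PipelineFactory:
    workflows: dict[str, WorkflowFn] = {}
    pipelines: dict[str, list[str]] = {}

    @classmethod
    def register(cls, name: str, fn: WorkflowFn) -> None:
        cls.workflows[name] = fn

    @classmethod
    def register_pipeline(cls, name: str, steps: list[str]) -> None:
        cls.pipelines[name] = steps

    @classmethod
    def run(cls, pipeline: str, state: dict) -> dict:
        # Execute each registered workflow in sequence,
        # threading the state dict through every step.
        for step in cls.pipelines[pipeline]:
            state = cls.workflows[step](state)
        return state

# Register two toy steps and a "standard" pipeline built from them.
PipelineFactory.register("chunk", lambda s: {**s, "chunks": [s["doc"][:5]]})
PipelineFactory.register("extract", lambda s: {**s, "entities": len(s["chunks"])})
PipelineFactory.register_pipeline("standard", ["chunk", "extract"])

result = PipelineFactory.run("standard", {"doc": "hello world"})
```

Swapping in your own step is just another `register` call; a named configuration such as "standard" or "fast" is simply a different ordered list of registered steps.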
Next steps
Data flow
Learn how data flows through the indexing pipeline
Custom graphs
Bring your own existing graph data