What is a knowledge graph?
A knowledge graph is a structured representation of information where:
- Nodes represent entities (people, places, organizations, events, concepts)
- Edges represent relationships between entities
- Attributes provide descriptive information about nodes and edges

An LLM-generated knowledge graph showing entities (circles) sized by degree, with colors representing community membership
Graph extraction process
GraphRAG builds knowledge graphs through a multi-step extraction and refinement pipeline.
Entity extraction
Entities are extracted from each text unit using LLM-based analysis. The extraction process identifies the following attributes for each entity:
- Title: The canonical name of the entity
- Type: The category of entity (configurable)
- Description: Contextual information about the entity
- Text unit references: Links back to source text
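As a rough illustration, the entity attributes above could be modeled as a small record. The class name `Entity`, the field names, and the sample values are assumptions made for this sketch; they are not taken from GraphRAG's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record mirroring the attributes listed above."""

    title: str        # canonical name of the entity
    type: str         # category of entity, e.g. "PERSON" (configurable)
    description: str  # contextual information about the entity
    # Links back to the source text units (IDs here are made up).
    text_unit_ids: list[str] = field(default_factory=list)


# Example: one entity mentioned in two illustrative text units.
alice = Entity(
    title="ALICE SMITH",
    type="PERSON",
    description="Engineer who leads the data platform team.",
    text_unit_ids=["tu-001", "tu-007"],
)
```

Keeping the text unit IDs on the record is what allows any extracted fact to be traced back to its source text.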
Relationship extraction
Relationships connect entities and capture the semantic connections in your data.
Relationship structure
Each relationship contains:
- Source entity: The starting point of the relationship
- Target entity: The ending point of the relationship
- Description: The nature and context of the relationship
- Weight: Strength or importance of the relationship (derived from frequency and context)
- Text unit IDs: Source references for the relationship
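A minimal sketch of a relationship record with these fields might look as follows. The class name `Relationship`, the default weight of 1.0, and the sample values are illustrative assumptions, not GraphRAG's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """Hypothetical relationship record mirroring the fields listed above."""

    source: str          # starting point of the relationship
    target: str          # ending point of the relationship
    description: str     # nature and context of the relationship
    weight: float = 1.0  # strength or importance (default is an assumption)
    # Source references for the relationship (IDs here are made up).
    text_unit_ids: list[str] = field(default_factory=list)


works_at = Relationship(
    source="ALICE SMITH",
    target="ACME CORP",
    description="Alice Smith works at Acme Corp.",
    weight=2.0,
    text_unit_ids=["tu-001", "tu-007"],
)
```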
Relationship merging
When the same relationship appears in multiple text units:
- Collection: All descriptions are gathered into a list
- Deduplication: Identical descriptions are removed
- Summarization: The LLM creates a single concise description capturing all distinct information
- Weight calculation: Frequency and context determine relationship strength
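The merging steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the function `merge_relationships` and the dictionary keys are hypothetical, the summarizer is passed in as a plain callable standing in for the LLM call, and the weight is simplified to the number of mentions (the context-based part of the weighting is not modeled).

```python
from collections import defaultdict
from typing import Callable


def merge_relationships(
    extracted: list[dict],
    summarize: Callable[[list[str]], str],
) -> list[dict]:
    """Merge relationship mentions that share the same (source, target) pair."""
    grouped: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rel in extracted:
        grouped[(rel["source"], rel["target"])].append(rel)

    merged = []
    for (source, target), mentions in grouped.items():
        # Collection + deduplication: keep the first occurrence of each description.
        descriptions = list(dict.fromkeys(m["description"] for m in mentions))
        merged.append({
            "source": source,
            "target": target,
            # Summarization: delegated to the caller (an LLM call in practice).
            "description": summarize(descriptions),
            # ASSUMPTION: weight is simply the number of mentions.
            "weight": float(len(mentions)),
            "text_unit_ids": sorted({m["text_unit_id"] for m in mentions}),
        })
    return merged


mentions = [
    {"source": "ALICE", "target": "ACME", "description": "Alice works at Acme.", "text_unit_id": "tu-1"},
    {"source": "ALICE", "target": "ACME", "description": "Alice works at Acme.", "text_unit_id": "tu-2"},
    {"source": "ALICE", "target": "ACME", "description": "Alice founded Acme.", "text_unit_id": "tu-3"},
]
# Stand-in summarizer: joins the distinct descriptions instead of calling an LLM.
result = merge_relationships(mentions, summarize=" ".join)
```

Passing the summarizer as a parameter keeps the grouping and deduplication logic testable without a model.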
Directional vs undirected
- Extraction: Relationships are extracted with explicit direction (source → target)
- Community detection: The graph is treated as undirected for clustering
- Querying: Both directions are considered when traversing relationships
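To make the distinction concrete, here is a small sketch using only the standard library. It assumes relationships are stored as `(source, target)` tuples; the helper names `to_undirected` and `neighbors_both_directions` are hypothetical and chosen for this example.

```python
from collections import defaultdict

# Directed edges as extracted: (source, target).
directed_edges = [
    ("ALICE", "ACME"),
    ("ACME", "BOB"),
    ("BOB", "ALICE"),
    ("ACME", "ALICE"),
]


def to_undirected(edges):
    """Collapse directed edges into an undirected adjacency map (for clustering)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def neighbors_both_directions(edges, entity):
    """Return entities linked to `entity` by an edge in either direction (for querying)."""
    outgoing = {t for s, t in edges if s == entity}
    incoming = {s for s, t in edges if t == entity}
    return outgoing | incoming


undirected = to_undirected(directed_edges)
bob_neighbors = neighbors_both_directions(directed_edges, "BOB")
```

Note that the two opposite edges between ALICE and ACME collapse into a single undirected connection.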
Entity and relationship summarization
After extraction, entities and relationships often have multiple descriptions from different text units. The summarization phase consolidates these:
Collect descriptions
Entities and relationships with the same identity gather all their descriptions from different text units into lists.
LLM summarization
The summarization model receives all descriptions and generates a single concise summary that captures all distinct information.
Summarization is crucial for managing token counts in downstream queries and ensuring each entity/relationship has coherent, non-redundant descriptions.
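The two steps can be sketched as follows. Both helper functions and the prompt wording are hypothetical, written only to illustrate the flow; GraphRAG's real summarization prompt is not reproduced here, and the model call itself is left out.

```python
from collections import defaultdict


def collect_descriptions(records):
    """Group descriptions by entity title, keeping each distinct description once."""
    collected = defaultdict(list)
    for record in records:
        if record["description"] not in collected[record["title"]]:
            collected[record["title"]].append(record["description"])
    return dict(collected)


def build_summary_prompt(title, descriptions):
    """Build an illustrative prompt asking a model for one merged description."""
    bullet_list = "\n".join(f"- {d}" for d in descriptions)
    return (
        f"Summarize the following descriptions of {title} into one concise "
        f"description that keeps every distinct fact:\n{bullet_list}"
    )


records = [
    {"title": "ACME", "description": "Acme is a software company."},
    {"title": "ACME", "description": "Acme was founded by Alice."},
    {"title": "ACME", "description": "Acme is a software company."},
]
collected = collect_descriptions(records)
prompt = build_summary_prompt("ACME", collected["ACME"])
```

The resulting prompt would then be sent to whichever summarization model is configured.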
Claim extraction (covariates)
Beyond entities and relationships, GraphRAG can extract claims: factual statements about entities that may be time-bound.
What are claims?
Claims (called “covariates” in the data model) are assertions about entities with specific properties:
- Subject: The entity the claim is about
- Object: What is being claimed
- Type: Category of claim
- Status: Validity or confidence level
- Start/End date: Time bounds when applicable
- Description: Full context of the claim
- Source references: Links to supporting text units
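As a sketch of how the time bounds might be used, the record below models a claim and checks whether it holds on a given date. The class `Claim`, its field names, the method `is_active_on`, and all sample values are assumptions for illustration, not GraphRAG's covariate schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Claim:
    """Hypothetical claim (covariate) record mirroring the properties above."""

    subject: str      # entity the claim is about
    object_: str      # what is being claimed
    claim_type: str   # category of claim
    status: str       # validity or confidence level
    description: str  # full context of the claim
    start_date: Optional[date] = None  # time bounds, when applicable
    end_date: Optional[date] = None
    text_unit_ids: list[str] = field(default_factory=list)

    def is_active_on(self, day: date) -> bool:
        """True if `day` falls within the time bounds; unset bounds are open-ended."""
        after_start = self.start_date is None or day >= self.start_date
        before_end = self.end_date is None or day <= self.end_date
        return after_start and before_end


claim = Claim(
    subject="ACME CORP",
    object_="REGULATOR X",
    claim_type="INVESTIGATION",  # made-up category
    status="SUSPECTED",          # made-up status value
    description="Acme Corp was reportedly under investigation by Regulator X.",
    start_date=date(2020, 1, 1),
    end_date=date(2021, 12, 31),
    text_unit_ids=["tu-042"],
)
```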
Graph properties and metrics
Once extracted, the knowledge graph has several important properties:
Entity ranking
Entities are ranked by importance using graph metrics:
- Degree: Number of relationships connected to the entity
- Centrality: Position in the network (highly connected entities have higher centrality)
- Community membership: Which communities the entity belongs to at different hierarchy levels
Entity rank is used for:
- Prioritization in local search results
- Size of nodes in graph visualizations
- Context window allocation during retrieval
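Degree is the simplest of these metrics to compute. The sketch below counts relationships per entity and sorts entities by that count; the function names and the alphabetical tie-breaking rule are assumptions of this example.

```python
from collections import Counter


def degree_by_entity(edges):
    """Count how many relationships touch each entity (undirected degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def rank_entities(edges):
    """Return entity titles sorted by degree, highest first; ties broken by name."""
    degree = degree_by_entity(edges)
    return sorted(degree, key=lambda entity: (-degree[entity], entity))


edges = [("ALICE", "ACME"), ("BOB", "ACME"), ("CAROL", "ACME"), ("ALICE", "BOB")]
ranking = rank_entities(edges)
# ACME has degree 3, ALICE and BOB have degree 2, CAROL has degree 1.
```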
Graph structure
The complete graph structure includes:
Connectivity
Entities connected by relationships form a network where information can be traversed through multi-hop paths.
Clustering
Community detection reveals groups of densely connected entities, representing coherent topics or themes.
Hierarchy
Multiple levels of communities create a hierarchical organization from global themes to local clusters.
Provenance
Every entity and relationship maintains links to source text units and documents for verification.
From graph to retrieval
The knowledge graph enables sophisticated retrieval strategies:
- Entity-based entry points: Queries identify relevant entities through semantic similarity
- Graph traversal: Related entities and relationships are retrieved by following edges
- Community context: Entities’ community memberships provide broader thematic context
- Multi-hop reasoning: Connections can be followed multiple steps to gather comprehensive information
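Graph traversal and multi-hop reasoning can be illustrated with a bounded breadth-first search. This sketch assumes the entry-point entity has already been chosen (the semantic similarity step is omitted) and that the graph is available as an undirected adjacency map; `entities_within_hops` is a hypothetical helper, not a GraphRAG function.

```python
from collections import deque


def entities_within_hops(adjacency, start, max_hops):
    """Breadth-first traversal returning {entity: hop distance} up to max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


adjacency = {
    "ALICE": {"ACME"},
    "ACME": {"ALICE", "BOB"},
    "BOB": {"ACME", "CAROL"},
    "CAROL": {"BOB"},
}
reachable = entities_within_hops(adjacency, "ALICE", max_hops=2)
```

With a limit of two hops from ALICE, the traversal reaches ACME and BOB but stops before CAROL, which is three hops away.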
The next concept page on community detection explores how hierarchical clustering organizes the knowledge graph into meaningful structures.
Best practices
Entity type design
- Start with general types (PERSON, ORGANIZATION, LOCATION)
- Use prompt tuning to identify domain-specific types
- Keep types consistent and well-defined
- Avoid too many types (5-10 is usually sufficient)
Extraction quality
- Use appropriate text unit sizes (1200 tokens default)
- Configure max_gleanings (1-2) for iterative refinement
- Tune prompts for your domain using the prompt tuning guide
- Review sample extractions before processing large datasets
Graph validation
- Check entity and relationship counts after extraction
- Review high-degree entities for quality
- Validate that relationships make semantic sense
- Ensure text unit references are preserved
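Some of these checks can be automated. The sketch below assumes entities and relationships are available as lists of dictionaries with the keys shown; the function `validate_graph`, its report keys, and the degree threshold are hypothetical. Whether a relationship makes semantic sense still needs manual review.

```python
def validate_graph(entities, relationships, high_degree_threshold=3):
    """Run basic sanity checks on an extracted graph and return a report dict."""
    degree = {entity["title"]: 0 for entity in entities}
    dangling = set()
    for rel in relationships:
        for endpoint in (rel["source"], rel["target"]):
            if endpoint in degree:
                degree[endpoint] += 1
            else:
                # Relationship refers to an entity that was never extracted.
                dangling.add(endpoint)
    return {
        "entity_count": len(entities),
        "relationship_count": len(relationships),
        # Candidates for manual quality review.
        "high_degree": sorted(t for t, d in degree.items() if d >= high_degree_threshold),
        "dangling_endpoints": sorted(dangling),
        # Entities that lost their links back to the source text.
        "missing_text_units": sorted(
            e["title"] for e in entities if not e.get("text_unit_ids")
        ),
    }


entities = [
    {"title": "ACME", "text_unit_ids": ["tu-1"]},
    {"title": "ALICE", "text_unit_ids": ["tu-1"]},
    {"title": "BOB", "text_unit_ids": []},
]
relationships = [
    {"source": "ALICE", "target": "ACME"},
    {"source": "BOB", "target": "ACME"},
    {"source": "ACME", "target": "CAROL"},
]
report = validate_graph(entities, relationships)
```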
Next steps
Indexing pipeline
See how graph extraction fits into the full indexing workflow
Community detection
Learn how hierarchical clustering organizes the graph