Overview
Community detection groups related entities into clusters based on their relationship patterns. Communities help answer global questions that require understanding broad themes rather than specific facts. For example:
- “What are the main topics in this knowledge base?”
- “Tell me about the AI research ecosystem”
- “Summarize the key companies in fintech”
How Community Detection Works
Communities are detected after the entity graph has been built.
lib/arcana/graph/community_detector/leiden.ex:48
Leiden Algorithm
What is Leiden?
The Leiden algorithm is an improvement over the popular Louvain algorithm for community detection. It:
- Optimizes modularity - Maximizes connections within communities, minimizes connections between them
- Guarantees connectivity - All entities in a community are reachable from each other
- Produces hierarchies - Multiple levels from fine-grained to coarse clusters
- Scales efficiently - Handles large graphs (10,000+ nodes) quickly
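For reference, the two objectives most commonly paired with Leiden can be written as below. This is the standard formulation from the Leiden literature, not something taken from this library's source, and the symbols are introduced here for illustration only: $e_c$ is the number (or total weight) of edges inside community $c$, $n_c$ its number of nodes, $K_c$ the sum of the degrees of its nodes, $m$ the total number of edges in the graph, and $\gamma$ the resolution parameter.

```latex
\begin{aligned}
H_{\text{CPM}} &= \sum_{c} \left[ e_c - \gamma \binom{n_c}{2} \right] \\
H_{\text{modularity}} &= \frac{1}{2m} \sum_{c} \left[ e_c - \gamma \, \frac{K_c^{2}}{2m} \right]
\end{aligned}
```

In both expressions a larger $\gamma$ penalizes big communities more strongly, so raising the resolution tends to produce smaller clusters.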
Installation
Community detection requires the leidenfold library (Rust NIF):
- macOS (Apple Silicon)
- Linux (x86_64, ARM64)
lib/arcana/graph/community_detector/leiden.ex:11
Configuration
Options
:resolution (float, default: 1.0)
- Controls community granularity
- Higher values → smaller, more focused communities
- Lower values → larger, broader communities
- Typical range: 0.5 - 2.0
:objective (atom, default: :cpm)
- Quality function to optimize
- Options:
  - :cpm - Constant Potts Model (recommended)
  - :modularity - Classic modularity measure
  - :rber - Reichardt-Bornholdt with Erdős-Rényi null model
  - :rbc - Reichardt-Bornholdt with configuration null model
  - :significance - Statistical significance
  - :surprise - Surprise measure
:iterations (integer, default: 2)
- Number of optimization passes
- More iterations = better quality, longer runtime
- Typical range: 1-5
:min_size (integer, default: 1)
- Minimum entities per community
- Set to 2 to exclude singleton communities
- Set to 3+ for more substantial clusters
:max_level (integer, default: 1)
- Maximum hierarchy levels
- Level 0 = finest granularity
- Higher levels = coarser aggregations
- Typical range: 1-5
:seed (integer, default: 0)
- Random seed for reproducibility
- 0 = random seed each run
- Set specific value for deterministic results
lib/arcana/graph/community_detector/leiden.ex:30
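The options above can be collected into a keyword list. The following is a minimal sketch, assuming you want to merge user-supplied options with the documented defaults and reject typos before calling the detector. `LeidenOptionsSketch` and `build/1` are hypothetical names used for illustration and are not part of Arcana; only the option keys, defaults, and objective atoms come from the list above.

```elixir
defmodule LeidenOptionsSketch do
  @moduledoc """
  Hypothetical helper (not part of Arcana): merges user options with the
  defaults documented in this section and rejects unknown keys.
  """

  # Defaults copied from the option list in this section.
  @defaults [
    resolution: 1.0,
    objective: :cpm,
    iterations: 2,
    min_size: 1,
    max_level: 1,
    seed: 0
  ]

  # Objective atoms copied from the :objective entry.
  @objectives [:cpm, :modularity, :rber, :rbc, :significance, :surprise]

  def build(opts \\ []) do
    # Keyword.validate!/2 (Elixir 1.13+) raises ArgumentError on unknown
    # keys and fills in the defaults for any key that is missing.
    opts = Keyword.validate!(opts, @defaults)

    if opts[:objective] not in @objectives do
      raise ArgumentError, "unknown objective: #{inspect(opts[:objective])}"
    end

    opts
  end
end

# Example: smaller, more focused communities without singletons.
opts = LeidenOptionsSketch.build(resolution: 1.5, min_size: 2)
```

The resulting keyword list could then be handed to whichever detector entry point your Arcana version exposes; check the source referenced above for the exact call.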
Community Summarization
LLM Summarizer (Default)
Generates natural-language summaries of communities using LLMs.
Configuration:
lib/arcana/graph/community_summarizer.ex:84
Summary Format
Good summaries are 2-5 sentences long and should:
- Identify the theme - What domain/topic does this community represent?
- Name key entities - Who/what are the most important members?
- Describe relationships - How are entities connected?
- Provide context - Why is this community significant?
Summary Regeneration
Communities are marked “dirty” when they are modified, which signals that they need to be re-summarized.
lib/arcana/graph/community_summarizer.ex:140
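The idea behind the dirty flag can be sketched as follows. This is an illustration under stated assumptions, not Arcana's implementation: the map fields `:summary` and `:dirty` and the module name `RegenerationSketch` are hypothetical, and the real check in the source file referenced above may consider additional criteria.

```elixir
defmodule RegenerationSketch do
  @moduledoc """
  Hypothetical illustration of a "dirty" check. The field names are
  assumptions, not Arcana's schema.
  """

  # A community needs a new summary if it has none yet, or if it was
  # modified after its last summary was written.
  def needs_regeneration?(%{summary: nil}), do: true
  def needs_regeneration?(%{dirty: true}), do: true
  def needs_regeneration?(_community), do: false

  # Keep only the communities that would require an LLM call.
  def pending(communities) do
    Enum.filter(communities, &needs_regeneration?/1)
  end
end

communities = [
  %{id: 1, summary: nil, dirty: false},
  %{id: 2, summary: "AI research labs", dirty: true},
  %{id: 3, summary: "Fintech companies", dirty: false}
]

pending = RegenerationSketch.pending(communities)
```

Filtering before summarizing keeps LLM calls proportional to what changed rather than to the total number of communities.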
Custom Detectors
Implement the Arcana.Graph.CommunityDetector behaviour:
lib/arcana/graph/community_detector.ex:76
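As a rough sketch of what a custom detector might look like, the example below groups entities by connected components instead of running Leiden. The callback name, arity, and return shape are assumptions: `DetectorBehaviourSketch` is a local stand-in so the snippet runs on its own, and the field names `:id`, `:source_id`, `:target_id`, `:level`, and `:entity_ids` are hypothetical. Consult the behaviour definition in the file referenced above for the real callbacks before adapting it.

```elixir
defmodule DetectorBehaviourSketch do
  # Stand-in for Arcana.Graph.CommunityDetector. The real callback name,
  # arity, and return shape may differ.
  @callback detect(entities :: [map()], relationships :: [map()], opts :: keyword()) ::
              {:ok, [map()]} | {:error, term()}
end

defmodule ConnectedComponentsDetector do
  @behaviour DetectorBehaviourSketch

  @impl true
  def detect(entities, relationships, opts) do
    min_size = Keyword.get(opts, :min_size, 1)
    ids = Enum.map(entities, & &1.id)

    # Undirected adjacency map: id => list of neighbouring ids.
    adjacency =
      Enum.reduce(relationships, Map.new(ids, &{&1, []}), fn rel, acc ->
        %{source_id: s, target_id: t} = rel

        acc
        |> Map.update(s, [t], &[t | &1])
        |> Map.update(t, [s], &[s | &1])
      end)

    # Start a traversal from every id that no earlier component reached.
    {components, _seen} =
      Enum.reduce(ids, {[], MapSet.new()}, fn id, {found, seen} ->
        if MapSet.member?(seen, id) do
          {found, seen}
        else
          members = walk([id], adjacency, MapSet.new())
          {[members | found], MapSet.union(seen, members)}
        end
      end)

    communities =
      components
      |> Enum.reverse()
      |> Enum.filter(fn members -> MapSet.size(members) >= min_size end)
      |> Enum.map(fn members ->
        %{level: 0, entity_ids: members |> MapSet.to_list() |> Enum.sort()}
      end)

    {:ok, communities}
  end

  # Depth-first traversal collecting every id reachable from the stack.
  defp walk([], _adjacency, visited), do: visited

  defp walk([id | rest], adjacency, visited) do
    if MapSet.member?(visited, id) do
      walk(rest, adjacency, visited)
    else
      neighbours = Map.get(adjacency, id, [])
      walk(neighbours ++ rest, adjacency, MapSet.put(visited, id))
    end
  end
end
```

Connected components are a much coarser grouping than Leiden, but they make a convenient baseline and a simple way to exercise the plumbing around a detector.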
Custom Summarizers
Implement the Arcana.Graph.CommunitySummarizer behaviour:
lib/arcana/graph/community_summarizer.ex:74
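A custom summarizer does not have to call an LLM. The sketch below builds a summary from a fixed template, which can be useful in tests or offline environments. As with any illustration here, the callback shape is an assumption: `SummarizerBehaviourSketch` is a local stand-in for the real behaviour, and the `:name` field and `:max_names` option are hypothetical.

```elixir
defmodule SummarizerBehaviourSketch do
  # Stand-in for Arcana.Graph.CommunitySummarizer. The real callback
  # name, arity, and return shape may differ.
  @callback summarize(entities :: [map()], relationships :: [map()], opts :: keyword()) ::
              {:ok, String.t()} | {:error, term()}
end

defmodule TemplateSummarizer do
  @behaviour SummarizerBehaviourSketch

  @impl true
  def summarize([], _relationships, _opts), do: {:error, :empty_community}

  def summarize(entities, relationships, opts) do
    # Name only the first few entities to keep the summary short.
    limit = Keyword.get(opts, :max_names, 3)

    names =
      entities
      |> Enum.map(& &1.name)
      |> Enum.take(limit)
      |> Enum.join(", ")

    summary =
      "Community of #{length(entities)} entities, including #{names}, " <>
        "linked by #{length(relationships)} relationship(s)."

    {:ok, summary}
  end
end
```

A template like this covers only the "name key entities" part of a good summary; identifying the theme and explaining significance is where an LLM-backed summarizer earns its cost.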
Real Examples from Source
Example 1: Leiden Detection
From lib/arcana/graph/community_detector/leiden.ex:50:
Example 2: Edge Conversion
From lib/arcana/graph/community_detector/leiden.ex:123:
Example 3: Hierarchy Formatting
From lib/arcana/graph/community_detector/leiden.ex:138:
Example 4: Needs Regeneration Check
From lib/arcana/graph/community_summarizer.ex:142:
Using Communities in Search
Communities enable global queries that need broad context.
lib/arcana/graph/graph_query.ex:166
Performance Considerations
Leiden Detection:
- Small graphs (< 100 nodes): ~10-50ms
- Medium graphs (100-1000 nodes): ~50-500ms
- Large graphs (1000-10000 nodes): ~500-5000ms
- Scales approximately O(n log n)
Memory Usage:
- ~1-5MB per 1000 nodes
- Edge-weighted graphs use more memory
Optimization Tips:
- Run detection asynchronously during ingest
- Cache community assignments
- Regenerate summaries only when needs_regeneration? is true
- Use a higher min_size to reduce the number of communities
- Limit max_level to reduce hierarchy depth
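One way to cache community assignments is to key them on a fingerprint of the edge set, so detection reruns only when the graph actually changes. The sketch below assumes edges are plain tuples and keeps the cache in an ordinary map; `CommunityCacheSketch`, `fingerprint/1`, and `fetch/3` are hypothetical names, and `detect_fun` stands in for whatever detector call your application makes.

```elixir
defmodule CommunityCacheSketch do
  @moduledoc """
  Hypothetical cache (not part of Arcana) that reruns detection only when
  the edge set changes.
  """

  # Sorting first makes the fingerprint independent of edge order.
  # :erlang.phash2/1 is a fast non-cryptographic hash, so collisions are
  # possible; use a stronger hash if that matters for your data.
  def fingerprint(edges), do: edges |> Enum.sort() |> :erlang.phash2()

  # Returns {communities, cache}, calling detect_fun only on a cache miss.
  def fetch(cache, edges, detect_fun) do
    key = fingerprint(edges)

    case cache do
      %{^key => communities} ->
        {communities, cache}

      _ ->
        communities = detect_fun.(edges)
        {communities, Map.put(cache, key, communities)}
    end
  end
end
```

In a running application the map would typically live in a long-lived process or an ETS table, and the detection call itself could be wrapped in a `Task` so that ingest is not blocked while it runs.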
Next Steps
- Search - Use communities in graph search
- Relationships - Communities are built from relationships
- GraphRAG Overview - Understand the full pipeline