What is GraphRAG?

GraphRAG (Graph-enhanced Retrieval Augmented Generation) extends traditional RAG by building a knowledge graph from your documents. Instead of relying solely on vector similarity, GraphRAG:
  • Extracts entities (people, organizations, locations, concepts) from text
  • Identifies relationships between entities using LLMs or patterns
  • Detects communities of related entities using the Leiden algorithm
  • Combines graph and vector search using Reciprocal Rank Fusion (RRF)
Because it uses the relationships between entities and not just text similarity, this approach can improve retrieval for questions whose answers depend on how facts connect.

Architecture

GraphRAG consists of several modular components:
```text
┌─────────────────────────────────────────────────────────────┐
│                      GraphRAG Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Entity Extraction                                       │
│     ├─ NER (Bumblebee distilbert-NER)        [Default]      │
│     └─ LLM-based extraction                  [Optional]     │
│                                                             │
│  2. Relationship Extraction                                 │
│     ├─ LLM-based extraction                  [Default]      │
│     ├─ Co-occurrence patterns                [Optional]     │
│     └─ Custom patterns                       [Optional]     │
│                                                             │
│  3. Community Detection                                     │
│     └─ Leiden algorithm (via Rust NIF)       [Optional]     │
│                                                             │
│  4. Community Summarization                                 │
│     └─ LLM-based summarization               [Optional]     │
│                                                             │
│  5. Fusion Search                                           │
│     └─ RRF: Vector + Graph → Ranked Results                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
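The co-occurrence option in step 2 can be pictured with a minimal sketch: any two entities named in the same chunk are linked, and the link is weighted by how many chunks they share. This illustrates the general technique under assumptions and is not Arcana's implementation; the module name CooccurrenceSketch and the input shape (one list of entity names per chunk) are hypothetical.

```elixir
defmodule CooccurrenceSketch do
  @moduledoc """
  Hypothetical co-occurrence relationship extraction; not Arcana's code.
  """

  @doc """
  Takes chunks given as lists of entity names and returns a map from each
  unordered pair {a, b} (a < b) to the number of chunks containing both.
  """
  def relationships(chunks) do
    Enum.reduce(chunks, %{}, fn entity_names, counts ->
      # dedupe and sort so each unordered pair is produced exactly once
      names = entity_names |> Enum.uniq() |> Enum.sort()
      pairs = for a <- names, b <- names, a < b, do: {a, b}

      Enum.reduce(pairs, counts, fn pair, acc ->
        Map.update(acc, pair, 1, &(&1 + 1))
      end)
    end)
  end
end
```

For example, chunks `[["OpenAI", "Microsoft"], ["OpenAI", "Sam Altman", "Microsoft"]]` link Microsoft and OpenAI with weight 2. Co-occurrence records only that two entities appear together, not how they are related.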

Data Flow

During Ingest:
```text
Document → Chunks → Entities → Relationships → Communities → Database
```
During Search:
```text
Query → [Vector Search] → Results
Query → [Entity Extraction] → Graph Traversal → Results
(both result lists) → RRF Fusion → Final Results
```

When to Use GraphRAG

Best Use Cases

  • Multi-hop questions: “Who works at companies founded by Y Combinator alumni?”
  • Relationship queries: “What is the connection between OpenAI and Microsoft?”
  • Entity-centric search: “Tell me everything about Sam Altman”
  • Domain knowledge: Technical documentation with many named concepts
  • Global understanding: Questions requiring broad context (use community summaries)

When Vector Search Alone is Better

  • Unstructured content: Pure creative writing, narratives without entities
  • Simple semantic search: “What are best practices for caching?”
  • Cost-sensitive: GraphRAG requires extra LLM calls and compute
  • Low-entity documents: Content with few named entities or relationships

Installation

GraphRAG is optional and requires separate installation:
```shell
# Install graph dependencies
mix arcana.graph.install

# Run migrations
mix ecto.migrate
```
For community detection, add leidenfold to your dependencies:
```elixir
# mix.exs
defp deps do
  [
    {:arcana, "~> 1.2"},
    {:leidenfold, "~> 0.2"}  # Optional: for community detection
  ]
end
```

Configuration

Enable GraphRAG globally or per-call:
```elixir
# config/config.exs
import Config

config :arcana,
  graph: [
    enabled: true,
    community_levels: 5,
    resolution: 1.0,
    # Optional: configure extractors
    entity_extractor: :ner,  # or {MyApp.CustomExtractor, opts}
    relationship_extractor: {Arcana.Graph.RelationshipExtractor.LLM, []},
    community_detector: {Arcana.Graph.CommunityDetector.Leiden, resolution: 1.0}
  ]
```
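The resolution option tunes how coarse or fine the detected communities are. As a rough illustration, assuming the detector optimizes resolution-scaled modularity (Leiden can also optimize other quality functions such as CPM, so check the leidenfold documentation), a partition of an undirected, unweighted graph scores Q = Σ_c [L_c / m − γ (d_c / 2m)²], where m is the number of edges, L_c the edges inside community c, d_c the summed degree of c, and γ the resolution. Raising γ penalizes large communities more, so the optimizer tends to return more, smaller ones. ModularitySketch is a hypothetical helper for computing Q, not part of Arcana or leidenfold.

```elixir
defmodule ModularitySketch do
  @moduledoc """
  Hypothetical helper: resolution-scaled modularity of a partition of an
  undirected, unweighted graph. Not part of Arcana or leidenfold.
  """

  @doc """
  `edges` is a list of {u, v} pairs, `community_of` maps each node to a
  community id, and `resolution` is the resolution parameter γ.
  """
  def modularity(edges, community_of, resolution \\ 1.0) do
    m = length(edges)

    # L_c: number of edges whose two endpoints share community c
    internal =
      Enum.reduce(edges, %{}, fn {u, v}, acc ->
        cu = Map.fetch!(community_of, u)
        if cu == Map.fetch!(community_of, v), do: Map.update(acc, cu, 1, &(&1 + 1)), else: acc
      end)

    # d_c: summed degree of community c (each edge adds 1 per endpoint)
    degree_sum =
      Enum.reduce(edges, %{}, fn {u, v}, acc ->
        acc
        |> Map.update(Map.fetch!(community_of, u), 1, &(&1 + 1))
        |> Map.update(Map.fetch!(community_of, v), 1, &(&1 + 1))
      end)

    degree_sum
    |> Enum.map(fn {c, d_c} ->
      Map.get(internal, c, 0) / m - resolution * :math.pow(d_c / (2 * m), 2)
    end)
    |> Enum.sum()
  end
end
```

For two triangles joined by a single edge (7 edges), splitting them into two communities gives Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357 at γ = 1.0, while a single community gives 0.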
Add the NER serving to your supervision tree:
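```elixir
# In your application's start/2 callback
children = [
  MyApp.Repo,
  Arcana.Embedder.Local,
  Arcana.Graph.NERServing  # For entity extraction
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
```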

Main Functions

Building Graphs

```elixir
# Build graph from chunks
{:ok, graph_data} = Arcana.Graph.build(chunks,
  entity_extractor: &MyApp.extract_entities/2,
  relationship_extractor: &MyApp.extract_relationships/3
)

# Convert to queryable format
graph = Arcana.Graph.to_query_graph(graph_data, chunks)
```
See lib/arcana/graph.ex:150 for implementation details.
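If you supply your own extractor functions, as with &MyApp.extract_entities/2 above, a simple starting point is a gazetteer: look up a fixed list of known names in the chunk text. This sketch assumes the function receives the chunk text plus an options keyword list and returns maps shaped like the %{name: ..., type: ...} entities used in the search examples. Those argument and return shapes are assumptions, not Arcana's documented extractor contract, and the module MyApp.EntitySketch and its :gazetteer option are hypothetical.

```elixir
defmodule MyApp.EntitySketch do
  @moduledoc """
  Hypothetical gazetteer (dictionary lookup) entity extractor.
  The (text, opts) arguments and the list-of-maps return value are
  assumptions, not Arcana's documented extractor contract.
  """

  @default_gazetteer %{
    "Microsoft" => :organization,
    "OpenAI" => :organization,
    "Sam Altman" => :person
  }

  def extract_entities(text, opts \\ []) do
    gazetteer = Keyword.get(opts, :gazetteer, @default_gazetteer)

    gazetteer
    |> Enum.filter(fn {name, _type} ->
      # keep names that occur in the text as whole words (case-sensitive)
      pattern = Regex.compile!("\\b" <> Regex.escape(name) <> "\\b")
      Regex.match?(pattern, text)
    end)
    |> Enum.map(fn {name, type} -> %{name: name, type: type} end)
    |> Enum.sort_by(& &1.name)
  end
end
```

A gazetteer is cheap and predictable but only finds names you already know; the NER and LLM extractors are what discover new entities.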

Searching Graphs

```elixir
# Graph-only search
entities = [%{name: "OpenAI", type: :organization}]
results = Arcana.Graph.search(graph, entities, depth: 2)

# Fusion search (combines vector + graph);
# vector_results are the results of your own vector search
results = Arcana.Graph.fusion_search(graph, entities, vector_results,
  depth: 1,
  limit: 10,
  # RRF rank-smoothing constant
  k: 60
)
```
See Graph Search for detailed documentation.
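Reciprocal Rank Fusion scores each result by summing 1 / (k + rank) over every ranked list it appears in, so items ranked well by both vector and graph search rise to the top, while k (60 in the call above) damps the advantage of the very first ranks. A minimal sketch of the technique, independent of Arcana's own fusion_search (the module name RRFSketch and the use of plain ids as results are assumptions):

```elixir
defmodule RRFSketch do
  @moduledoc "Hypothetical Reciprocal Rank Fusion sketch; not Arcana's code."

  @doc """
  Fuses ranked lists of ids (best first). Each id scores the sum of
  1 / (k + rank) over the lists it appears in, with 1-based ranks.
  Returns up to `limit` ids, highest score first (ties broken by id).
  """
  def fuse(ranked_lists, k \\ 60, limit \\ 10) do
    ranked_lists
    |> Enum.reduce(%{}, fn list, scores ->
      list
      |> Enum.with_index(1)
      |> Enum.reduce(scores, fn {id, rank}, acc ->
        Map.update(acc, id, 1 / (k + rank), &(&1 + 1 / (k + rank)))
      end)
    end)
    |> Enum.sort_by(fn {id, score} -> {-score, id} end)
    |> Enum.take(limit)
    |> Enum.map(fn {id, _score} -> id end)
  end
end
```

For example, `RRFSketch.fuse([["a", "b", "c"], ["b", "d"]])` returns `["b", "a", "d", "c"]`: b is found by both searches, so it outranks a even though a was the top vector hit.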

Community Summaries

```elixir
# Get all top-level summaries
summaries = Arcana.Graph.community_summaries(graph, level: 0)

# Get summaries for a specific entity
summaries = Arcana.Graph.community_summaries(graph, entity_id: "entity_123")
```
See Communities for detailed documentation.

Finding and Traversing

```elixir
# Find entities by name (fuzzy: false requests exact matching)
entities = Arcana.Graph.find_entities(graph, "OpenAI", fuzzy: false)

# Traverse relationships up to 2 hops from a starting entity
related = Arcana.Graph.traverse(graph, entity_id, depth: 2)
```
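Depth-limited traversal is essentially a breadth-first search that stops after depth hops, which is what makes multi-hop questions answerable. A minimal sketch over a plain adjacency map; the map representation and the module name TraverseSketch are assumptions, since Arcana's traverse works on its own graph structure. For undirected relationships, list each edge in both directions.

```elixir
defmodule TraverseSketch do
  @moduledoc "Hypothetical depth-limited breadth-first traversal; not Arcana's code."

  @doc """
  `adjacency` maps an entity id to the ids it is directly related to.
  Returns %{id => hops} for every id reachable from `start` within
  `depth` hops, excluding `start` itself.
  """
  def traverse(adjacency, start, depth) do
    adjacency
    |> bfs([start], %{start => 0}, 0, depth)
    |> Map.delete(start)
  end

  # stop when nothing new was reached or the hop limit is hit
  defp bfs(_adjacency, [], visited, _level, _depth), do: visited
  defp bfs(_adjacency, _frontier, visited, level, depth) when level >= depth, do: visited

  defp bfs(adjacency, frontier, visited, level, depth) do
    # expand the frontier by one hop, recording first-seen distances
    {next, visited} =
      frontier
      |> Enum.flat_map(&Map.get(adjacency, &1, []))
      |> Enum.reduce({[], visited}, fn id, {acc, seen} ->
        if Map.has_key?(seen, id) do
          {acc, seen}
        else
          {[id | acc], Map.put(seen, id, level + 1)}
        end
      end)

    bfs(adjacency, Enum.reverse(next), visited, level + 1, depth)
  end
end
```

Each extra hop can grow the result set quickly, which is one reason to keep depth small (the fusion example above uses depth: 1).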

Integration with Ingest

GraphRAG integrates with Arcana.ingest/2. To build the graph as part of an ingest call, pass graph: true:
```elixir
# Enable graph building during ingest
Arcana.ingest(text,
  repo: MyApp.Repo,
  collection: "docs",
  graph: true,  # Enable GraphRAG
  progress: fn current, total ->
    IO.puts("Processed chunk #{current}/#{total}")
  end
)
```
This will:
  1. Extract entities from each chunk using NER or LLM
  2. Extract relationships between entities
  3. Persist entities, relationships, and mentions to the database
  4. Optionally detect communities and generate summaries
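Steps 1 to 3 typically involve a merge: the same entity is usually extracted from many chunks, so it is stored once, with a mention recording each chunk it appears in. A minimal sketch of that bookkeeping, assuming per-chunk results shaped as {chunk_id, [%{name: ..., type: ...}]} and case-insensitive name matching; the module name MentionSketch, the data shapes, and the matching rule are all assumptions, and Arcana's actual schema and deduplication logic may differ.

```elixir
defmodule MentionSketch do
  @moduledoc "Hypothetical entity deduplication and mention tracking; not Arcana's code."

  @doc """
  Takes [{chunk_id, entities}] and returns {entities, mentions}:
  `entities` maps a normalized {downcased_name, type} key to the first
  entity seen with that key; `mentions` lists {key, chunk_id} pairs in
  order of first appearance, at most one per key and chunk.
  """
  def merge(extractions) do
    {entities, mentions} =
      Enum.reduce(extractions, {%{}, []}, fn {chunk_id, chunk_entities}, acc ->
        Enum.reduce(chunk_entities, acc, fn entity, {ents, ments} ->
          key = {String.downcase(entity.name), entity.type}
          mention = {key, chunk_id}
          ments = if mention in ments, do: ments, else: [mention | ments]
          # keep the first spelling seen as the canonical entity
          {Map.put_new(ents, key, entity), ments}
        end)
      end)

    {entities, Enum.reverse(mentions)}
  end
end
```

Keeping mentions separate from entities is what lets a graph hit point back to the chunks that support it.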

