Graph Search

Overview

Graph search in GraphRAG combines vector similarity with graph traversal to retrieve context that vector-only RAG can miss, such as chunks that are connected to the query's entities but are not textually similar to the query. The core technique is Reciprocal Rank Fusion (RRF), a method for merging ranked lists from multiple sources.

How Graph Search Works

Complete Pipeline

# 1. User asks a question
query = "Tell me about OpenAI's relationship with Microsoft"

# 2. Extract entities from query
{:ok, query_entities} = Arcana.Graph.EntityExtractor.NER.extract(query, [])
# => [%{name: "OpenAI", type: "organization"}, 
#     %{name: "Microsoft", type: "organization"}]

# 3. Run vector search (standard RAG)
{:ok, vector_results} = Arcana.search(query, repo: MyApp.Repo, collection: "docs", top_k: 10)
# => [chunk1, chunk2, chunk3, ...]  (ranked by cosine similarity)

# 4. Run graph search (GraphRAG)
graph_results = Arcana.Graph.search(graph, query_entities, depth: 2)
# => [chunk4, chunk2, chunk5, ...]  (ranked by graph traversal)

# 5. Combine with RRF fusion
final_results = Arcana.Graph.fusion_search(
  graph, 
  query_entities, 
  vector_results,
  depth: 2,
  limit: 10,
  k: 60
)
# => [chunk2, chunk1, chunk4, ...]  (RRF-fused ranking)

See implementation in lib/arcana/graph/fusion_search.ex:129

Graph Search (Without Vector)

Pure graph-based retrieval without vector search:

# Find entities in graph matching query entities
query_entities = [
  %{name: "OpenAI", type: "organization"},
  %{name: "GPT-4", type: "technology"}
]

results = Arcana.Graph.search(graph, query_entities, depth: 2)

How it works:

1. Find matching entities in the graph by name
2. Traverse relationships up to depth hops
3. Collect chunks connected to the discovered entities
4. Return the unique chunks that mention those entities

See lib/arcana/graph/fusion_search.ex:100

Example: 2-hop traversal

Query entity: "OpenAI"

Depth 0: [OpenAI]
          |
Depth 1: [Sam Altman, GPT-4, Microsoft]  (direct relationships)
          |
Depth 2: [Y Combinator, Azure, ChatGPT]  (2nd-degree relationships)

Retrieve all chunks mentioning any of these entities
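
The diagram above can be reproduced with a small in-memory sketch. The adjacency map is hypothetical toy data (which depth-1 entity leads to which depth-2 entity is an assumption, since the diagram does not say), and `HopDemo.levels/3` is an illustrative helper, not part of Arcana.

```elixir
defmodule HopDemo do
  # Returns the entities first reached at each depth as a list of lists:
  # index 0 holds the start entity, index 1 its direct neighbors, and so on.
  def levels(adjacency, start, max_depth) do
    do_levels(adjacency, MapSet.new([start]), [start], max_depth, [[start]])
  end

  defp do_levels(_adjacency, _visited, _frontier, 0, acc), do: Enum.reverse(acc)

  defp do_levels(adjacency, visited, frontier, depth, acc) do
    # Neighbors of the current frontier that have not been seen yet
    next =
      frontier
      |> Enum.flat_map(&Map.get(adjacency, &1, []))
      |> Enum.uniq()
      |> Enum.reject(&MapSet.member?(visited, &1))

    case next do
      [] ->
        Enum.reverse(acc)

      _ ->
        visited = Enum.into(next, visited)
        do_levels(adjacency, visited, next, depth - 1, [Enum.sort(next) | acc])
    end
  end
end

# Hypothetical toy graph mirroring the diagram
adjacency = %{
  "OpenAI" => ["Sam Altman", "GPT-4", "Microsoft"],
  "Sam Altman" => ["OpenAI", "Y Combinator"],
  "GPT-4" => ["OpenAI", "ChatGPT"],
  "Microsoft" => ["OpenAI", "Azure"]
}

HopDemo.levels(adjacency, "OpenAI", 2)
# => [["OpenAI"], ["GPT-4", "Microsoft", "Sam Altman"], ["Azure", "ChatGPT", "Y Combinator"]]
```

Entities already seen at a shallower depth (here, "OpenAI") are not repeated at deeper levels, which is why each entity appears exactly once.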

Fusion Search (Vector + Graph)

Combines vector search and graph search using RRF:

results = Arcana.Graph.fusion_search(
  graph,
  query_entities,
  vector_results,
  depth: 1,    # Graph traversal depth
  limit: 10,   # Maximum results
  k: 60        # RRF constant
)

See lib/arcana/graph/fusion_search.ex:142

Options

:depth (integer, default: 1)

How many relationship hops to traverse
Higher depth = more entities, broader context
Typical range: 1-3

:limit (integer, default: 10)

Maximum number of results to return
Final results after RRF fusion

:k (integer, default: 60)

RRF smoothing constant that dampens the advantage of top-ranked items
Higher k = more balanced fusion
Lower k = favor top-ranked items
Typical range: 10-100
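
To get a feel for :k, the sketch below compares how much a rank-1 hit contributes relative to a rank-10 hit for a few values of k. `KDemo` is a hypothetical helper written for this page; it only evaluates the 1 / (k + rank) term and does not call Arcana.

```elixir
defmodule KDemo do
  # Contribution of one list to a document's RRF score.
  def contribution(rank, k), do: 1.0 / (k + rank)

  # How many times more a rank-1 hit is worth than a rank-10 hit.
  def top_advantage(k), do: contribution(1, k) / contribution(10, k)
end

for k <- [10, 60, 100] do
  ratio = Float.round(KDemo.top_advantage(k), 2)
  IO.puts("k=#{k}: rank 1 is worth #{ratio}x rank 10")
end
```

The ratio is about 1.82 for k=10, 1.15 for k=60, and 1.09 for k=100, so raising k flattens the difference between ranks and lets agreement between the two lists matter more than position within either one.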

Reciprocal Rank Fusion (RRF)

What is RRF?

RRF is a rank aggregation method that combines multiple ranked lists into a single ranking. Formula:

score(document) = Σ 1 / (k + rank(document, list_i))

Where:

k is a constant (default: 60)
rank(document, list_i) is the position in list i (1-based)
Sum across all lists containing the document

Example Calculation

# Vector search results
vector_results = [chunk_A, chunk_B, chunk_C]

# Graph search results  
graph_results = [chunk_B, chunk_D, chunk_A]

# RRF scores (k=60), vector contribution + graph contribution:
# chunk_A: 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323
# chunk_B: 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 = 0.0325  ← Highest
# chunk_C: 1/(60+3) + 0        = 0.0159
# chunk_D: 0        + 1/(60+2) = 0.0161

# Final ranking: [chunk_B, chunk_A, chunk_D, chunk_C]

See implementation in lib/arcana/graph/fusion_search.ex:42
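
The arithmetic above can be checked with a few lines of standalone Elixir. This is a minimal sketch that uses atoms in place of chunk structs; `RRFDemo` is a hypothetical module for illustration and is separate from the library's own implementation.

```elixir
defmodule RRFDemo do
  # Returns [{id, score}] sorted by descending RRF score.
  def scores(lists, k \\ 60) do
    lists
    |> Enum.flat_map(fn list ->
      list
      |> Enum.with_index(1)                                   # 1-based ranks
      |> Enum.map(fn {id, rank} -> {id, 1.0 / (k + rank)} end)
    end)
    |> Enum.reduce(%{}, fn {id, score}, acc ->
      Map.update(acc, id, score, &(&1 + score))               # sum per document
    end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end

vector_results = [:chunk_A, :chunk_B, :chunk_C]
graph_results = [:chunk_B, :chunk_D, :chunk_A]

[vector_results, graph_results]
|> RRFDemo.scores()
|> Enum.each(fn {id, score} -> IO.puts("#{id}: #{Float.round(score, 4)}") end)
```

The printed order is chunk_B, chunk_A, chunk_D, chunk_C, matching the final ranking above.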

Why RRF Works

✅ Promotes agreement - Documents that appear in multiple lists score higher

✅ Robust to outliers - A bad ranking in one list doesn't eliminate a document

✅ No score normalization - Uses only rank positions, so it works with any ranking method

✅ Simple & effective - A single tunable constant (k), and often competitive with weighted score averaging

Graph Traversal

Find related entities by following relationships:

# Start from a specific entity
entity_id = "entity_123"

# Traverse up to 2 hops
related = Arcana.Graph.traverse(graph, entity_id, depth: 2)

# Returns all entities reachable within 2 hops (excluding the start entity)

See lib/arcana/graph/graph_query.ex:130

Traversal Algorithm

# Breadth-first search

Depth 0: visited = {start_entity}
         frontier = {start_entity}

Depth 1: neighbors = adjacency[start_entity]
         new_neighbors = neighbors - visited
         visited = visited ∪ new_neighbors
         frontier = new_neighbors

Depth 2: neighbors = ⋃ adjacency[n] for n in frontier
         new_neighbors = neighbors - visited
         visited = visited ∪ new_neighbors
         frontier = new_neighbors

Return: visited - {start_entity}

See lib/arcana/graph/graph_query.ex:193

Finding Entities

By Name

# Exact match (case-insensitive)
entities = Arcana.Graph.find_entities(graph, "OpenAI", fuzzy: false)
# => [%{id: "e1", name: "OpenAI", type: "organization"}]

# Fuzzy match (substring)
entities = Arcana.Graph.find_entities(graph, "Open", fuzzy: true)
# => [%{name: "OpenAI", ...}, %{name: "Open Source Initiative", ...}]

See lib/arcana/graph/graph_query.ex:76
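
The two matching modes can be sketched over a plain list of entity maps. `NameMatchDemo` is a hypothetical stand-in, not the library's code; it assumes that exact matching means case-insensitive equality and fuzzy matching means a case-insensitive substring test, as the comments above describe.

```elixir
defmodule NameMatchDemo do
  def find(entities, name, opts \\ []) do
    needle = String.downcase(name)

    # Pick the comparison once, then apply it to every entity name
    matcher =
      if Keyword.get(opts, :fuzzy, false) do
        &String.contains?(&1, needle)
      else
        &(&1 == needle)
      end

    Enum.filter(entities, fn entity -> matcher.(String.downcase(entity.name)) end)
  end
end

entities = [
  %{id: "e1", name: "OpenAI", type: "organization"},
  %{id: "e2", name: "Open Source Initiative", type: "organization"},
  %{id: "e3", name: "Microsoft", type: "organization"}
]

NameMatchDemo.find(entities, "openai")             # matches only e1
NameMatchDemo.find(entities, "Open", fuzzy: true)  # matches e1 and e2
```

Fuzzy matching widens recall but can pull in unrelated entities that share a prefix, which is worth keeping in mind before seeding a traversal with its results.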

By Embedding

Find entities similar to a query embedding:

{:ok, embedding} = Arcana.Embedder.embed("artificial intelligence")

entities = Arcana.Graph.GraphQuery.find_entities_by_embedding(
  graph,
  embedding,
  top_k: 5,
  min_similarity: 0.7
)
# => [%{name: "Machine Learning", ...}, %{name: "Neural Networks", ...}]

See lib/arcana/graph/graph_query.ex:102
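
Conceptually, embedding lookup scores every entity against the query vector, drops anything below the threshold, and keeps the best top_k. The sketch below assumes cosine similarity, entities stored as maps with an :embedding list, and made-up three-dimensional vectors; `EmbeddingMatchDemo` and its defaults are hypothetical.

```elixir
defmodule EmbeddingMatchDemo do
  # Cosine similarity of two equal-length, non-zero lists of floats.
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> v |> Enum.reduce(0.0, fn x, acc -> acc + x * x end) |> :math.sqrt() end
    dot / (norm.(a) * norm.(b))
  end

  def find(entities, query_embedding, opts \\ []) do
    top_k = Keyword.get(opts, :top_k, 5)
    min_similarity = Keyword.get(opts, :min_similarity, 0.0)

    entities
    |> Enum.map(fn entity -> {entity, cosine(entity.embedding, query_embedding)} end)
    |> Enum.filter(fn {_entity, sim} -> sim >= min_similarity end)
    |> Enum.sort_by(fn {_entity, sim} -> sim end, :desc)
    |> Enum.take(top_k)
    |> Enum.map(fn {entity, _sim} -> entity end)
  end
end

# Toy embeddings, chosen only to make the similarities easy to see
entities = [
  %{name: "Machine Learning", embedding: [0.9, 0.1, 0.0]},
  %{name: "Neural Networks", embedding: [0.8, 0.3, 0.1]},
  %{name: "Sourdough Baking", embedding: [0.0, 0.2, 0.9]}
]

EmbeddingMatchDemo.find(entities, [1.0, 0.0, 0.0], top_k: 5, min_similarity: 0.7)
# returns the "Machine Learning" and "Neural Networks" entities, in that order
```

Unlike name matching, this can surface entities whose names share no text with the query, at the cost of one similarity computation per entity.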

Getting Chunks for Entities

Retrieve all chunks mentioning specific entities:

entity_ids = ["entity_1", "entity_2", "entity_3"]

chunks = Arcana.Graph.GraphQuery.get_chunks_for_entities(graph, entity_ids)
# => [chunk1, chunk2, chunk5, ...]  (unique chunks)

See lib/arcana/graph/graph_query.ex:144
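
The "unique chunks" behavior matters because several entities often appear in the same chunk. A minimal sketch, assuming the graph keeps a map from entity id to the ids of chunks that mention it (the `entity_chunks` shape and `EntityChunksDemo` are assumptions for illustration):

```elixir
defmodule EntityChunksDemo do
  # entity_chunks maps an entity id to the ids of chunks that mention it.
  def chunks_for(entity_chunks, entity_ids) do
    entity_ids
    |> Enum.flat_map(&Map.get(entity_chunks, &1, []))  # unknown ids contribute nothing
    |> Enum.uniq()                                     # keep first occurrence only
  end
end

entity_chunks = %{
  "entity_1" => ["chunk1", "chunk2"],
  "entity_2" => ["chunk2", "chunk5"],
  "entity_3" => ["chunk1"]
}

EntityChunksDemo.chunks_for(entity_chunks, ["entity_1", "entity_2", "entity_3"])
# => ["chunk1", "chunk2", "chunk5"]
```

Deduplication is important for fusion: RRF assigns one rank per document per list, so a chunk listed twice in the graph results would be counted twice.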

Real Examples from Source

Example 1: RRF Implementation

From lib/arcana/graph/fusion_search.ex:57:

def reciprocal_rank_fusion(lists, opts \\ []) do
  k = Keyword.get(opts, :k, 60)

  # Calculate RRF scores for all documents
  scores =
    lists
    |> Enum.reduce(%{}, fn list, acc ->
      accumulate_rrf_scores(list, k, acc)
    end)

  # Sort by score descending
  scores
  |> Map.values()
  |> Enum.sort_by(fn {_item, score} -> score end, :desc)
  |> Enum.map(fn {item, _score} -> item end)
end

defp accumulate_rrf_scores(list, k, acc) do
  list
  |> Enum.with_index(1)  # 1-based ranking
  |> Enum.reduce(acc, fn {item, rank}, inner_acc ->
    score = 1.0 / (k + rank)
    update_item_score(inner_acc, item, score)
  end)
end

defp update_item_score(scores, item, score) do
  Map.update(scores, item.id, {item, score}, fn {existing_item, existing_score} ->
    {existing_item, existing_score + score}  # Accumulate scores
  end)
end

Example 2: Graph Search

From lib/arcana/graph/fusion_search.ex:100:

def graph_search(graph, entities, opts \\ []) do
  depth = Keyword.get(opts, :depth, 1)

  # 1. Find entities in graph matching extracted entities
  entity_ids =
    entities
    |> Enum.flat_map(fn extracted ->
      matches = GraphQuery.find_entities_by_name(graph, extracted.name, fuzzy: false)
      Enum.map(matches, & &1.id)
    end)
    |> Enum.uniq()

  if entity_ids == [] do
    []
  else
    # 2. Traverse to find related entities
    related_ids =
      entity_ids
      |> Enum.flat_map(fn id ->
        related = GraphQuery.traverse(graph, id, depth: depth)
        [id | Enum.map(related, & &1.id)]  # Include original entity
      end)
      |> Enum.uniq()

    # 3. Get chunks connected to all related entities
    GraphQuery.get_chunks_for_entities(graph, related_ids)
  end
end

Example 3: Fusion Search

From lib/arcana/graph/fusion_search.ex:142:

def search(graph, entities, vector_results, opts \\ []) do
  limit = Keyword.get(opts, :limit, 10)
  depth = Keyword.get(opts, :depth, 1)
  k = Keyword.get(opts, :k, 60)

  # Run graph search
  graph_results = graph_search(graph, entities, depth: depth)

  # Merge using RRF
  reciprocal_rank_fusion([vector_results, graph_results], k: k)
  |> Enum.take(limit)
end

Example 4: BFS Traversal

From lib/arcana/graph/graph_query.ex:193:

defp do_traverse(_graph, visited, _frontier, 0), do: visited

defp do_traverse(graph, visited, frontier, depth) do
  # Get neighbors of all nodes in frontier
  new_neighbors =
    frontier
    |> Enum.flat_map(fn id -> Map.get(graph.adjacency, id, []) end)
    |> MapSet.new()
    |> MapSet.difference(visited)  # Remove already visited

  if MapSet.size(new_neighbors) == 0 do
    visited  # No more neighbors, stop
  else
    new_visited = MapSet.union(visited, new_neighbors)
    do_traverse(graph, new_visited, new_neighbors, depth - 1)
  end
end
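
The excerpt above shows only the recursive helper. To run it end to end, a public wrapper has to seed the visited set and frontier with the start node and then remove that node from the result. The sketch below is one way to do that; the wrapper, the `%{adjacency: ...}` graph shape, and returning sorted ids (the real function returns entities) are assumptions made for the demo.

```elixir
defmodule TraverseDemo do
  # Seeds the search with the start node, then drops it from the result
  # so that only related ids are returned.
  def traverse(graph, start_id, opts \\ []) do
    depth = Keyword.get(opts, :depth, 1)
    start = MapSet.new([start_id])

    graph
    |> do_traverse(start, start, depth)
    |> MapSet.delete(start_id)
    |> MapSet.to_list()
    |> Enum.sort()
  end

  defp do_traverse(_graph, visited, _frontier, 0), do: visited

  defp do_traverse(graph, visited, frontier, depth) do
    new_neighbors =
      frontier
      |> Enum.flat_map(fn id -> Map.get(graph.adjacency, id, []) end)
      |> MapSet.new()
      |> MapSet.difference(visited)

    if MapSet.size(new_neighbors) == 0 do
      visited
    else
      do_traverse(graph, MapSet.union(visited, new_neighbors), new_neighbors, depth - 1)
    end
  end
end

graph = %{adjacency: %{"a" => ["b", "c"], "b" => ["a", "d"], "d" => ["e"]}}

TraverseDemo.traverse(graph, "a", depth: 1)  # => ["b", "c"]
TraverseDemo.traverse(graph, "a", depth: 2)  # => ["b", "c", "d"]
```

Because visited nodes are subtracted from each new frontier, cycles such as "a" → "b" → "a" terminate without extra bookkeeping.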

Integration with Arcana

GraphRAG integrates seamlessly with standard Arcana operations:

# During ingest (build graph)
Arcana.ingest(text,
  repo: MyApp.Repo,
  collection: "docs",
  graph: true  # Enable GraphRAG
)

# During search (use graph)
defmodule MyApp.Search do
  def search_with_graph(query, opts) do
    # 1. Vector search
    {:ok, vector_results} = Arcana.search(query, opts)
    
    # 2. Load graph from database
    graph = load_graph_from_db(opts[:collection])
    
    # 3. Extract query entities
    {:ok, entities} = Arcana.Graph.EntityExtractor.NER.extract(query, [])
    
    # 4. Fusion search
    if entities != [] do
      Arcana.Graph.fusion_search(graph, entities, vector_results,
        depth: 2,
        limit: 10
      )
    else
      vector_results  # Fall back to vector-only
    end
  end
  
  defp load_graph_from_db(collection) do
    # Load entities, relationships, chunks from database
    # Build graph structure
  end
end

Performance Considerations

Graph Search:

Small graphs (< 100 entities): ~1-10ms
Medium graphs (100-1000 entities): ~10-100ms
Large graphs (1000-10000 entities): ~100-1000ms
Depth impact: entities visited can grow as O(avg_degree^depth) in the worst case, so each extra hop multiplies the work

RRF Fusion:

Very fast: O(n log n) where n = total unique documents
Typical: ~1-5ms for 20-100 documents

Optimization Tips:

Cache graph structure - Build once, query many times
Index adjacency lists - Use maps for O(1) neighbor lookup
Limit depth - Depth 1-2 is usually sufficient
Early termination - Stop traversal when enough chunks found
Parallel execution - Run vector and graph search concurrently
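
The last tip can be sketched with `Task.async/1` and `Task.await/2` from the Elixir standard library. `ParallelSearchDemo` is hypothetical, and the two anonymous functions are stand-ins for the real vector and graph search calls.

```elixir
defmodule ParallelSearchDemo do
  # vector_fun and graph_fun are zero-arity functions wrapping the two searches.
  def run(vector_fun, graph_fun, timeout \\ 5_000) do
    vector_task = Task.async(vector_fun)
    graph_task = Task.async(graph_fun)

    # Both tasks run concurrently, so total latency is roughly the slower of the two.
    {Task.await(vector_task, timeout), Task.await(graph_task, timeout)}
  end
end

{vector_results, graph_results} =
  ParallelSearchDemo.run(
    fn -> [:chunk_a, :chunk_b] end,  # stand-in for the vector search
    fn -> [:chunk_b, :chunk_d] end   # stand-in for the graph search
  )
```

Since the fusion search shown earlier runs the graph search itself, using this pattern would mean fusing the two result lists directly with the RRF function rather than going through the combined call.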

When to Use Graph Search

Best Use Cases

✅ Multi-hop questions

“Who works at companies funded by Y Combinator?”
Requires traversing: Person → Company → Investor

✅ Entity-centric queries

“Everything about Sam Altman”
Traverse all relationships from one entity

✅ Relationship exploration

“How is OpenAI connected to Microsoft?”
Find shortest path between entities

✅ Domain with rich entities

Technical docs with many named components
Research papers with authors, institutions, citations

When Vector-Only is Better

❌ Abstract/semantic queries

“What are best practices for caching?”
No specific entities to anchor on

❌ Few or no entities in query

“How do I configure logging?”
No entities extracted, graph search returns nothing

❌ Unstructured content

Creative writing, narratives
Few named entities or relationships

Next Steps

Entity Extraction - Configure entity extractors
Relationships - Build the graph structure
Communities - Use community summaries for global queries
GraphRAG Overview - Understand the complete pipeline
