REMem’s retrieval strategies combine dense passage retrieval with graph-based exploration to find relevant context for answering questions. Different extraction methods use different retrieval strategies.

Retrieval Philosophy

Traditional RAG uses dense retrieval alone:
  1. Embed the query
  2. Find k-nearest passages by cosine similarity
  3. Return top-k
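The dense step alone can be sketched in a few lines; this is a generic illustration of cosine-similarity top-k retrieval, not REMem's actual implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def dense_top_k(query_vec, passage_vecs, k):
    """Rank passages by cosine similarity to the query and keep the top k.
    Returns (passage_index, score) pairs, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```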
REMem enhances this with graph navigation:
  1. Seed selection: Dense retrieval finds initial facts/gists
  2. Graph exploration: Navigate edges to related entities and passages
  3. Ranking fusion: Combine dense scores with graph signals
  4. Passage scoring: Personalized PageRank ranks final passages
This enables multi-hop reasoning that pure dense retrieval misses.
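The ranking-fusion step (3) can be sketched as a normalized score blend. The `alpha` mixing weight and max-normalization here are illustrative assumptions, not REMem's actual fusion rule:

```python
def fuse_scores(dense, graph, alpha=0.5):
    """Blend dense-retrieval and graph scores for each passage id.
    Each input is a dict passage_id -> score; scores are max-normalized
    before mixing so the two signals are on a comparable scale."""
    def norm(s):
        m = max(s.values(), default=0.0) or 1.0
        return {k: v / m for k, v in s.items()}
    d, g = norm(dense), norm(graph)
    keys = set(d) | set(g)
    return {k: alpha * d.get(k, 0.0) + (1 - alpha) * g.get(k, 0.0) for k in keys}
```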

Strategy Architecture

Each extraction method has a corresponding strategy (from rag_strategies/factory.py):
class RAGStrategyFactory:
    @staticmethod
    def create_strategy(extract_method: str, remem_instance):
        if extract_method == "openie":
            return DefaultRAGStrategy(remem_instance)
        elif extract_method in ["episodic_gist"]:
            return EpisodicGistStrategy(remem_instance)
        elif extract_method == "temporal":
            return TemporalStrategy(remem_instance)
        # ...
All strategies inherit from RAGStrategy (base_strategy.py) and implement:
  • index(): Build the graph
  • retrieve_each_query(): Retrieve for a single query
  • rag_for_qa(): Full RAG pipeline (retrieve + answer)

Default Strategy (OpenIE)

The default strategy for openie extraction combines fact retrieval with graph search.

Retrieval Pipeline

Step 1: Query-to-Fact Matching (remem.py:525)

Embed the query and find similar facts:
query_triple_scores = self.query_to_triple_scores(query)
# Returns scores for all facts based on embedding similarity
Step 2: Fact Reranking (remem.py:526)

Optionally rerank facts using a trained filter:
top_k_triple_indices, top_k_triples, rerank_log = self.rank_triples(query, query_triple_scores)
If no relevant facts are found after reranking:
if len(top_k_triples) == 0:
    logger.info("No triple found after reranking, return DPR results")
    sorted_chunk_ids, sorted_chunk_scores = self.dense_passage_retrieval(query)
Step 3: Graph Search (remem.py:531-538)

Navigate from facts to entities to passages:
sorted_chunk_ids, sorted_chunk_scores = self.graph_search_with_fact_entities(
    query=query,
    link_top_k=self.global_config.linking_top_k,
    query_triple_scores=query_triple_scores,
    top_k_triples=top_k_triples,
    top_k_triple_indices=top_k_triple_indices,
    passage_node_weight=self.global_config.passage_node_weight,
)

Graph Search Algorithm

The graph search uses Personalized PageRank to rank passages:
  1. Build seed set: Top-k facts + their entities
  2. Initialize PPR: Set seed weights based on query similarity
  3. Propagate: Random walk with damping through graph edges
  4. Extract passages: Collect passage nodes and their scores
  5. Normalize: Adjust passage scores by passage_node_weight
Key parameters:
  • linking_top_k=5: How many neighbors to explore per node
  • damping=0.5: PPR damping factor (how much weight stays at seed nodes)
  • passage_node_weight=0.05: Multiplicative factor for passage scores
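The steps above can be sketched as a power-iteration Personalized PageRank. This is a generic illustration (following this page's convention that `damping` is the fraction of weight that returns to the seed nodes each step), not REMem's actual graph code:

```python
def personalized_pagerank(adjacency, seed_weights, damping=0.5, iters=50):
    """Power-iteration Personalized PageRank.

    adjacency: dict node -> list of neighbor nodes (every node is a key)
    seed_weights: dict node -> teleport weight, summing to 1
    damping: fraction of weight that returns to the seeds at each step;
             the remaining (1 - damping) flows along outgoing edges
    """
    nodes = list(adjacency)
    scores = {n: seed_weights.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        # Every node keeps a damped share of its teleport weight...
        nxt = {n: damping * seed_weights.get(n, 0.0) for n in nodes}
        # ...and distributes the rest evenly among its neighbors.
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue
            share = (1 - damping) * scores[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        scores = nxt
    return scores
```

On a tiny fact → entity → passage chain seeded at the fact node, the score mass decays along the path, which is exactly what lets passages reachable from query-relevant facts outrank unrelated ones.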

Example Trace

Query: β€œWho proposed the test that Turing created?”
1. Query-to-fact matching:
   Top fact: (Alan Turing, proposed, Turing Test) [score: 0.92]

2. Graph exploration:
   Fact β†’ Entity "Alan Turing" β†’ Entity "Turing Test"
                ↓                          ↓
           Passage 1 [0.85]           Passage 2 [0.78]

3. Passage ranking:
   Passage 1: "Alan Turing proposed the Turing Test in 1950." [final: 0.89]
   Passage 2: "The Turing Test is a measure of machine intelligence." [final: 0.74]

Episodic Gist Strategy

For episodic_gist extraction, the strategy retrieves through gists and verbatim nodes.

Key Differences from Default

  1. Gist-based seeding: Initial retrieval uses gist summaries instead of facts
  2. Multi-level exploration: Navigate through verbatim β†’ gist β†’ fact β†’ entity
  3. Agent-based QA: Uses tool-augmented reasoning for answer generation

Retrieval Pipeline

The episodic gist strategy delegates to an agent-based approach:
# From episodic_gist_strategy.py:875-877
sorted_chunk_ids, sorted_chunk_scores, agent_result = self._rag_each_query(
    remem, query, return_chunk, gold_answer=current_gold_answer, question_metadata=question_metadata_item
)
The agent can use different retrieval tools:
  • semantic_retrieve: Dense search over gists or verbatim
  • lexical_retrieve: BM25 search
  • fact_retrieve: Search over structured facts

Agent Configuration

There are two modes for agent-based retrieval.

Fixed tools (config: agent_fixed_tools=True):
config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=True,
    agent_max_steps=2,  # 1=retrieve only, 2=retrieve+answer
    agent_fixed_retrieval_tool="semantic_retrieve",
)
The agent always uses the specified retrieval tool, then outputs an answer.

Flexible tools (config: agent_fixed_tools=False):
config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=False,
    agent_max_steps=5,  # Up to 5 reasoning steps
)
The agent chooses which tool to use at each step based on the question.
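The flexible mode can be sketched as a bounded tool-use loop. Everything here is illustrative: `choose_tool` and `answer_fn` stand in for LLM calls, and the `"answer"` sentinel is an assumption, not REMem's actual agent protocol:

```python
def run_agent(question, tools, choose_tool, answer_fn, max_steps=5):
    """Minimal flexible-tools loop (agent_fixed_tools=False).

    tools: dict tool_name -> callable(question) -> list of evidence strings
    choose_tool: picks a tool name, or "answer" to stop retrieving
    answer_fn: produces the final answer from accumulated evidence
    """
    evidence = []
    for _ in range(max_steps):
        tool_name = choose_tool(question, evidence)
        if tool_name == "answer":   # the model decides it has enough context
            break
        evidence.extend(tools[tool_name](question))
    return answer_fn(question, evidence)
```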

Return Chunk Type

You can retrieve different node types:
# Return verbatim (original text with metadata)
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=["What did the user ask about?"],
    return_chunk="verbatim",
)

# Return gists (compressed summaries)
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=["What did the user ask about?"],
    return_chunk="gists",
)
From episodic_gist_strategy.py:880-918:
if return_chunk == "verbatim":
    hash_ids_to_fetch = [remem.entry_keys["verbatim"][idx] for idx in limited_chunk_ids]
    chunk_rows = remem.episodic_embedding_stores["verbatim"].get_rows(hash_ids_to_fetch)
    top_k_chunks_content = [row["content"] for row in chunk_rows.values()]
    top_k_chunks_metadata = [row.get("metadata", None) for row in chunk_rows.values()]
elif return_chunk == "gists":
    hash_ids_to_fetch = [remem.entry_keys["gists"][idx] for idx in limited_chunk_ids]
    chunk_rows = remem.episodic_embedding_stores["gists"].get_rows(hash_ids_to_fetch)
    top_k_chunks_content = [row["content"] for row in chunk_rows.values()]
When to use each:
  • verbatim: When you need exact quotes, speaker roles, or timestamps
  • gists: When you need compressed context that is faster for the LLM to read

Parallel Processing

Episodic gist supports parallel query processing:
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=queries,
    parallel=True,
    max_workers=8,  # Process 8 queries at once
)
From episodic_gist_strategy.py:653-694:
if parallel:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {executor.submit(self._process_single_query, args): args[0] for args in args_list}
        for future in as_completed(future_to_idx):
            q_idx, query_solution, agent_result_dict, agent_answer = future.result()
            # ...

Temporal Strategy

For temporal extraction, the strategy emphasizes temporal reasoning:
  1. Temporal fact retrieval: Facts with time qualifiers are prioritized
  2. Chronological ordering: Results can be sorted by time
  3. Temporal graph edges: Navigate through time-connected events
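Chronological ordering (point 2) can be sketched as a stable sort on the time qualifier. The `(subject, relation, object, time)` tuple shape is an assumption for illustration; REMem's temporal fact format may differ:

```python
from datetime import date

def sort_facts_chronologically(facts):
    """Order (subject, relation, object, time) facts by their time qualifier.
    Facts whose qualifier is None sort last, after all dated facts."""
    return sorted(facts, key=lambda f: (f[3] is None, f[3] or date.min))
```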

Configuration Parameters

Control retrieval behavior with these config options:
config = BaseConfig(
    # Retrieval
    retrieval_top_k=200,  # How many passages to retrieve
    linking_top_k=5,  # How many neighbors to explore per node
    damping=0.5,  # PageRank damping factor
    
    # Ranking
    passage_node_weight=0.05,  # Weight for passage nodes in PPR
    
    # QA
    qa_top_k=5,  # How many passages to give to the LLM for answer generation
    qa_passage_prefix="Wikipedia Title: ",  # Prefix for passages in QA prompt
    
    # Agent (for episodic_gist)
    agent_fixed_tools=False,  # Use fixed tools or flexible tool selection?
    agent_max_steps=5,  # Max reasoning steps
    agent_fixed_retrieval_tool="semantic_retrieve",  # Which retrieval tool for fixed mode
)

Retrieval + QA Pipeline

The full RAG pipeline combines retrieval with answer generation:
solutions, responses, meta, retrieval_metrics, qa_metrics = rag.rag_for_qa(
    queries=["Who proposed the Turing Test?"],
    gold_docs=[["passage_123"]],  # For retrieval evaluation
    gold_answers=[["Alan Turing"]],  # For QA evaluation
    metrics=("qa_em", "qa_f1", "retrieval_recall"),
)
Pipeline steps:
  1. Retrieval (if not using pre-retrieved QuerySolution objects):
    query_solutions = self.remem.retrieve(queries=queries)
    
  2. Retrieval evaluation (if gold_docs provided):
    overall_retrieval_metrics = self.remem.evaluate_retrieval(gold_docs, query_solutions, retrieval_evaluators)
    
  3. Answer generation:
    query_solutions, all_response_message, all_metadata = self.remem.qa(query_solutions)
    
  4. QA evaluation (if gold_answers provided):
    overall_qa_metrics = self.remem.evaluate_qa(gold_answers, qa_evaluators, query_solutions, question_metadata)
    
  5. Save results:
    self.remem.save_rag_results(gold_answers, gold_docs, query_solutions, overall_qa_metrics, overall_retrieval_metrics)
    

Per-Sample Evaluation

For episodic gist, you can evaluate each sample as it’s processed:
query_solutions, _, _, _, qa_metrics = rag.rag_for_qa(
    queries=queries,
    gold_answers=gold_answers,
    evaluate_per_sample=True,  # Evaluate each query as it completes
    save_per_sample=True,  # Save each result individually
)
This enables real-time monitoring:
πŸ“Š Sample 0: qa_em: 1.0000, qa_f1: 1.0000 | Avg: qa_em: 1.0000, qa_f1: 1.0000 | Total: 1
πŸ“Š Sample 1: qa_em: 0.0000, qa_f1: 0.6667 | Avg: qa_em: 0.5000, qa_f1: 0.8333 | Total: 2

Dense Passage Retrieval Fallback

If graph search fails (no relevant facts found), REMem falls back to dense passage retrieval:
if len(top_k_triples) == 0:
    logger.info("No triple found after reranking, return DPR results")
    sorted_chunk_ids, sorted_chunk_scores = self.dense_passage_retrieval(query)
This ensures robustness even when extraction misses key information.

Advanced: Custom Retrieval Strategy

You can implement a custom retrieval strategy:
from remem.rag_strategies.base_strategy import RAGStrategy

class CustomStrategy(RAGStrategy):
    def index(self, docs):
        # Custom indexing logic
        pass
    
    def retrieve_each_query(self, query, return_chunk=None):
        # Custom retrieval logic
        # Return: (sorted_chunk_ids, sorted_chunk_scores, metadata)
        pass
    
    def rag_for_qa(self, queries, **kwargs):
        # Custom QA pipeline
        pass
Then use it:
from remem.rag_strategies.factory import RAGStrategyFactory

# Register your strategy
RAGStrategyFactory.register("custom", CustomStrategy)

# Use it
config = BaseConfig(extract_method="custom")
rag = ReMem(global_config=config)

Performance Tuning

For speed:
config = BaseConfig(
    retrieval_top_k=50,  # Reduce from 200
    qa_top_k=3,  # Reduce from 5
    linking_top_k=3,  # Reduce from 5
)
For accuracy:
config = BaseConfig(
    retrieval_top_k=500,  # Increase
    qa_top_k=10,  # Increase
    linking_top_k=10,  # Increase
    damping=0.3,  # Lower damping = more exploration
)
For multi-hop questions:
config = BaseConfig(
    linking_top_k=10,  # More graph exploration
    passage_node_weight=0.01,  # Lower weight = more entity exploration
)
