REMem is built as a layered system that transforms documents into a hybrid memory graph, then uses that graph to answer questions through retrieval-augmented generation (RAG).

System Overview

The architecture consists of distinct layers that work together:

Core Components

Orchestrator

The ReMem class (remem/remem.py) coordinates all operations:
  • Indexing: Converts documents into the memory graph
  • Retrieval: Finds relevant passages for queries
  • QA: Generates answers using retrieved context
  • Evaluation: Measures performance with various metrics
Basic usage:
from remem.remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    dataset="sample",
    extract_method="episodic_gist",
    llm_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
)

rag = ReMem(global_config=config)

Preprocessing

The preprocessing layer (graph/preprocessing/) handles:
  • Document chunking: Splits long documents into manageable pieces
  • Token-based chunking: Respects token limits for embedding models
  • Overlap management: Maintains context across chunk boundaries
Key configuration:
config = BaseConfig(
    preprocess_chunk_max_token_size=512,  # Max tokens per chunk
    preprocess_chunk_overlap_token_size=128,  # Overlap between chunks
    preprocess_encoder_name="gpt-4o",  # Tokenizer to use
)

Information Extraction

The extraction layer (information_extraction/) transforms text into structured memory units. The extraction method determines what gets stored in the graph:
  • openie: Entities and facts (subject-predicate-object triples)
  • episodic: Episodic facts from conversations or narratives
  • episodic_gist: Episodic facts + paraphrased gist summaries
  • temporal: Facts with temporal qualifiers for time-aware QA
See Extraction Methods for details.
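To make the openie output concrete, the sketch below shows one plausible shape for extracted triples and entities. The field layout and helper here are illustrative assumptions, not REMem's actual schema:

```python
# Hypothetical illustration of OpenIE-style output; the tuple layout and
# helper below are assumptions for this sketch, not REMem's real schema.
sentence = "Alan Turing proposed the Turing Test in 1950."

# Subject-predicate-object triples extracted from the sentence.
triples = [
    ("Alan Turing", "proposed", "the Turing Test"),
    ("the Turing Test", "was proposed in", "1950"),
]

# Named entities found in the sentence.
entities = ["Alan Turing", "Turing Test", "1950"]

def triple_to_fact_string(triple):
    """Render a triple as a flat fact string suitable for embedding."""
    return " ".join(triple)

fact_strings = [triple_to_fact_string(t) for t in triples]
```

Flattening triples into strings like this is what makes them embeddable alongside passages and gists.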

Embedding Storage

EmbeddingStore (embedding_store.py) manages vector embeddings for different node types:
  • Chunk embeddings: Dense vectors for document passages
  • Entity embeddings: Vectors for named entities
  • Fact embeddings: Vectors for relational facts
  • Gist embeddings: Vectors for paraphrased summaries (episodic_gist only)
Embeddings are cached on disk for incremental updates.
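The caching behavior can be sketched as a content-addressed store: each string gets a hash ID, and only unseen strings are embedded. This is a minimal in-memory sketch (REMem's EmbeddingStore additionally persists to disk); the class name, `embed_fn`, and the toy embedding are assumptions:

```python
import hashlib

# Minimal sketch of a content-addressed embedding cache, assuming an
# embed_fn callable; REMem's EmbeddingStore adds on-disk persistence.
class SimpleEmbeddingStore:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.embeddings = {}  # hash id -> vector

    @staticmethod
    def hash_id(text):
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def insert_strings(self, texts):
        """Embed only strings not already cached (incremental update)."""
        for text in texts:
            key = self.hash_id(text)
            if key not in self.embeddings:
                self.embeddings[key] = self.embed_fn(text)
        return [self.hash_id(t) for t in texts]

# Toy embedding: character counts (a stand-in for a real model like NV-Embed-v2).
def toy_embed(text):
    return [text.count(c) for c in "abcdefg"]

store = SimpleEmbeddingStore(toy_embed)
ids = store.insert_strings(["Alan Turing", "Grace Hopper", "Alan Turing"])
```

Because IDs are derived from content, re-indexing the same passage is a cache hit rather than a recomputation.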

Graph Memory

REMem builds a hybrid graph combining:
  • Nodes: Passages, entities, facts, gists, temporal anchors
  • Edges: Fact relationships, context links, synonymy connections
  • Weights: Edge weights encode relationship strength
The graph is stored using igraph and persisted as graph.pkl. See Memory Graph for the graph structure.
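The hybrid structure can be sketched with plain Python dicts (REMem itself uses igraph and persists the result as graph.pkl); the node and edge field names here are illustrative assumptions:

```python
# Minimal sketch of the hybrid memory graph using plain dicts; REMem
# stores the real thing in an igraph graph persisted as graph.pkl.
graph = {"nodes": {}, "edges": []}

def add_node(node_id, node_type, content):
    graph["nodes"][node_id] = {"type": node_type, "content": content}

def add_edge(src, dst, edge_type, weight=1.0):
    # Edge weights encode relationship strength.
    graph["edges"].append({"src": src, "dst": dst, "type": edge_type, "weight": weight})

# Passage, entity, and fact nodes.
add_node("chunk0", "passage", "Alan Turing proposed the Turing Test in 1950.")
add_node("ent0", "entity", "Alan Turing")
add_node("fact0", "fact", "(Alan Turing, proposed, Turing Test)")

# Edges: context link from passage to entity, plus a fact relationship.
add_edge("chunk0", "ent0", "context")
add_edge("ent0", "fact0", "fact", weight=0.9)
```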

Retrieval Strategies

The retrieval layer (rag_strategies/) implements different approaches:
  • DefaultRAGStrategy: For standard OpenIE extraction
  • EpisodicGistStrategy: For episodic_gist extraction
  • TemporalStrategy: For temporal extraction
Each strategy implements:
  • index(): How to build the graph
  • retrieve_each_query(): How to find relevant nodes
  • rag_for_qa(): How to generate answers
See Retrieval Strategies for details.
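The three-method interface above can be sketched as an abstract base class. The method names match the docs, but the signatures and the toy keyword-overlap strategy are assumptions for illustration:

```python
from abc import ABC, abstractmethod

# Sketch of the strategy interface described above; signatures are
# assumptions, and KeywordStrategy is a toy stand-in for dense retrieval.
class RAGStrategy(ABC):
    @abstractmethod
    def index(self, documents):
        """Build the memory graph from documents."""

    @abstractmethod
    def retrieve_each_query(self, query):
        """Return relevant node ids for a single query."""

    @abstractmethod
    def rag_for_qa(self, query, retrieved):
        """Generate an answer from retrieved context."""

class KeywordStrategy(RAGStrategy):
    """Toy strategy: rank documents by keyword overlap with the query."""
    def index(self, documents):
        self.docs = list(documents)

    def retrieve_each_query(self, query):
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), i)
                  for i, d in enumerate(self.docs)]
        return [i for score, i in sorted(scored, reverse=True) if score > 0]

    def rag_for_qa(self, query, retrieved):
        # Stand-in for LLM answer generation: return the top passage.
        return self.docs[retrieved[0]] if retrieved else ""

strategy = KeywordStrategy()
strategy.index(["Alan Turing proposed the Turing Test.",
                "Grace Hopper pioneered COBOL."])
hits = strategy.retrieve_each_query("Who proposed the Turing Test?")
```

Keeping the three hooks separate is what lets each extraction method pair with its own indexing and retrieval logic.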

Prompting

The PromptTemplateManager (prompts/) centralizes all LLM prompts:
  • Extraction prompts: For information extraction
  • QA prompts: For answer generation
  • Dataset-specific templates: Tailored to each benchmark
Templates are stored in prompts/templates/ as text files.
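Template lookup and rendering can be sketched with the standard library. The template text and placeholder names below are assumptions; REMem's PromptTemplateManager loads its templates from prompts/templates/:

```python
from string import Template

# Sketch of prompt-template rendering; the template text and field names
# are assumptions, not REMem's actual prompts.
TEMPLATES = {
    "qa": Template(
        "Answer using only the context.\n"
        "Context: $context\nQuestion: $question\nAnswer:"
    ),
}

def render(name, **fields):
    """Look up a template by name and fill in its placeholders."""
    return TEMPLATES[name].substitute(**fields)

prompt = render(
    "qa",
    context="Turing proposed the Turing Test in 1950.",
    question="Who proposed the Turing Test?",
)
```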

Processing Pipeline

The indexing pipeline transforms documents into the queryable memory graph:

1. Ingestion & Chunking

Documents are split into chunks based on token limits:
rag.index([
    "Alan Turing proposed the Turing Test in 1950.",
    "Grace Hopper pioneered COBOL and popularized the term 'debugging'.",
])
Internally:
  1. Documents are passed to the text preprocessor (remem/remem.py:442)
  2. Chunks are created with configurable overlap
  3. Each chunk gets a unique hash ID
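The three steps above can be sketched as a sliding token window with overlap and a hash ID per chunk. A whitespace split stands in for the configured gpt-4o tokenizer, and the small window sizes are for illustration only:

```python
import hashlib

# Sketch of token-window chunking with overlap; str.split() stands in
# for the real tokenizer, and the window sizes are illustrative.
def chunk_text(text, max_tokens=8, overlap=2):
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        content = " ".join(tokens[start:start + max_tokens])
        chunks.append({
            # Each chunk gets a unique, content-derived hash ID.
            "id": hashlib.md5(content.encode("utf-8")).hexdigest(),
            "content": content,
        })
        if start + max_tokens >= len(tokens):
            break
    return chunks

doc = "one two three four five six seven eight nine ten eleven twelve"
chunks = chunk_text(doc, max_tokens=8, overlap=2)
```

The last two tokens of each chunk reappear at the start of the next, which is how context is preserved across chunk boundaries.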

2. Embedding Storage

Chunks are embedded and stored:
# From remem.py:446-448
if len(self.chunk_embedding_store.embeddings) == 0:
    nodes_dict = self.chunk_embedding_store.insert_chunk_dicts(chunk_metadata, "openie")
    self.chunk_contents = [chunk["content"] for chunk in nodes_dict.values()]

3. Information Extraction

The extraction method determines what structure is extracted. For OpenIE (remem.py:354):
ie_results = self.openie.batch_openie(new_openie_rows)
new_ner_results_dict, new_triple_results_dict = ie_results[0], ie_results[1]
For Episodic Gist:
# Extracts gists first, then facts (information_extraction/episodic_gist_extraction_openai.py:34-40)
gist_outputs = self.batch_extraction(chunk_passages, template="episodic_gist_extraction", target="gists")
fact_outputs = self.batch_extraction(chunk_passages, template="episodic_fact_extraction", target="facts", gist_map=gist_map)

4. Memory Graph Build

The graph is constructed with different node and edge types depending on the extraction method. Entities as nodes (remem.py:401-402):
phrase_nodes, chunk_triple_entities = extract_phrase_nodes(chunk_triples)
self.phrase_embedding_store.insert_strings(phrase_nodes)
Facts as nodes (remem.py:404-406):
triple_to_encode = [str(fact) for fact in flattened_triples]
self.triple_embedding_store.insert_strings(triple_to_encode)
Edges connect related nodes (remem.py:418-427):
  • add_fact_edges(): Entity → Entity edges from triples
  • add_passage_edges(): Chunk → Entity edges
  • add_paraphrase_edges(): Chunk → Gist edges (episodic_gist)
  • add_synonymy_edges_between_phrases(): Entity ↔ Entity similarity edges
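Of these, the synonymy edges can be sketched directly: connect every entity pair whose embedding cosine similarity clears synonymy_edge_sim_threshold (0.8 by default), with the similarity as the edge weight. The vectors below are made up for illustration:

```python
import math

# Sketch of add_synonymy_edges_between_phrases: link entity pairs whose
# cosine similarity exceeds the threshold. Vectors are illustrative.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

entity_vecs = {
    "Alan Turing": [1.0, 0.9, 0.1],
    "A. M. Turing": [0.9, 1.0, 0.2],
    "Grace Hopper": [0.1, 0.2, 1.0],
}

def synonymy_edges(vecs, threshold=0.8):
    names = list(vecs)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(vecs[a], vecs[b])
            if sim >= threshold:
                edges.append((a, b, sim))  # edge weight = similarity
    return edges

edges = synonymy_edges(entity_vecs, threshold=0.8)
```

In practice the comparison is limited to each entity's synonymy_edge_topk nearest neighbors rather than all pairs.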

5. Retrieval and QA

Queries are processed through the retrieval pipeline:
  1. Initial retrieval: Dense/lexical search for gists and facts
  2. Graph exploration: Navigate edges to find related context
  3. Ranking: Combine signals to rank passages
  4. Answer generation: LLM generates answer from top-k passages
See Retrieval Strategies for the full process.
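The graph-exploration step can be sketched as a personalized PageRank walk seeded on the initially retrieved nodes; the damping config parameter suggests such a walk, but this pure-Python version is an illustration, not REMem's implementation:

```python
# Sketch of graph exploration as personalized PageRank: rank mass flows
# from seed nodes along edges, with (1 - damping) teleporting back to
# the seeds each step. Illustrative, not REMem's implementation.
def personalized_pagerank(adj, seeds, damping=0.5, iters=50):
    nodes = list(adj)
    reset = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(reset)
    for _ in range(iters):
        nxt = {n: (1 - damping) * reset[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seed distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * reset[m]
        rank = nxt
    return rank

# Toy graph: a query-linked entity points to facts, facts point to passages.
adj = {"e0": ["f0", "f1"], "f0": ["p0"], "f1": ["p1"], "p0": [], "p1": []}
rank = personalized_pagerank(adj, seeds={"e0"}, damping=0.5)
```

Passages reachable from the seed entities accumulate rank, giving one signal that can be combined with the dense scores during ranking.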

6. Evaluation

Results are evaluated with multiple metrics:
  • Retrieval: Recall@k, NDCG
  • QA: Exact Match, F1, BLEU
  • LLM-as-judge: For complex reasoning tasks
solutions, responses, meta, retrieval_metrics, qa_metrics = rag.rag_for_qa(
    queries=["Who proposed the Turing Test?"],
    gold_answers=[["Alan Turing"]],
    metrics=("qa_em", "qa_f1", "retrieval_recall")
)

Configuration System

All components are configured through BaseConfig (remem/utils/config_utils.py):
@dataclass
class BaseConfig:
    # LLM settings
    llm_name: str = "gpt-4o-mini"
    max_new_tokens: int = 2048
    temperature: float = 0
    
    # Extraction settings
    extract_method: Literal["openie", "episodic", "episodic_gist", "temporal"] = "openie"
    
    # Embedding settings
    embedding_model_name: str = "nvidia/NV-Embed-v2"
    embedding_batch_size: int = 16
    
    # Graph settings
    synonymy_edge_topk: int = 2047
    synonymy_edge_sim_threshold: float = 0.8
    
    # Retrieval settings
    linking_top_k: int = 5
    retrieval_top_k: int = 200
    damping: float = 0.5
    
    # QA settings
    qa_top_k: int = 5
The config flows through all components, ensuring consistent behavior across the system.

Incremental Updates

REMem supports incremental indexing (remem.py:317-328):
# Graph and embeddings are loaded from disk if they exist
if not self.global_config.force_index_from_scratch:
    if os.path.exists(self._graph_pickle_path):
        loaded_graph = ig.Graph.Read_Pickle(self._graph_pickle_path)
New documents are added to existing structures without rebuilding from scratch.
Set force_index_from_scratch=True to rebuild the entire graph and embeddings.
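The load-or-rebuild logic can be sketched with pickle on a plain dict standing in for igraph's Read_Pickle on graph.pkl (the helper name is an assumption for this sketch):

```python
import os
import pickle
import tempfile

# Sketch of load-or-rebuild, using pickle on a plain dict in place of
# igraph's Read_Pickle on graph.pkl; the helper name is an assumption.
def load_or_build_graph(path, force_index_from_scratch=False):
    if not force_index_from_scratch and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)      # resume from the persisted graph
    return {"nodes": {}, "edges": []}  # start fresh

tmp = os.path.join(tempfile.mkdtemp(), "graph.pkl")

# First run: nothing on disk, so a fresh graph is built and persisted.
graph = load_or_build_graph(tmp)
graph["nodes"]["chunk0"] = {"type": "passage"}
with open(tmp, "wb") as f:
    pickle.dump(graph, f)

# Second run: the persisted graph is loaded instead of rebuilt.
resumed = load_or_build_graph(tmp)
```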
