REMem is built as a layered system that transforms documents into a hybrid memory graph, then uses that graph to answer questions through retrieval-augmented generation (RAG).

System Overview

The architecture consists of distinct layers that work together:

Core Components

Orchestrator

The ReMem class (remem/remem.py) coordinates all operations:
  • Indexing: Converts documents into the memory graph
  • Retrieval: Finds relevant passages for queries
  • QA: Generates answers using retrieved context
  • Evaluation: Measures performance with various metrics
Basic usage:
from remem.remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    dataset="sample",
    extract_method="episodic_gist",
    llm_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
)

rag = ReMem(global_config=config)

Preprocessing

The preprocessing layer (graph/preprocessing/) handles:
  • Document chunking: Splits long documents into manageable pieces
  • Token-based chunking: Respects token limits for embedding models
  • Overlap management: Maintains context across chunk boundaries
Key configuration:
config = BaseConfig(
    preprocess_chunk_max_token_size=512,  # Max tokens per chunk
    preprocess_chunk_overlap_token_size=128,  # Overlap between chunks
    preprocess_encoder_name="gpt-4o",  # Tokenizer to use
)

Information Extraction

The extraction layer (information_extraction/) transforms text into structured memory units. The extraction method determines what gets stored in the graph:
  • openie: Entities and facts (subject-predicate-object triples)
  • episodic: Episodic facts from conversations or narratives
  • episodic_gist: Episodic facts + paraphrased gist summaries
  • temporal: Facts with temporal qualifiers for time-aware QA
See Extraction Methods for details.
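To make the openie output concrete, the sketch below shows one plausible shape for extracted triples and entities. The field layout and helper here are illustrative assumptions, not REMem's actual schema:

```python
# Hypothetical illustration of OpenIE-style output; the tuple layout and
# helper below are assumptions for this sketch, not REMem's real schema.
sentence = "Alan Turing proposed the Turing Test in 1950."

# Subject-predicate-object triples extracted from the sentence.
triples = [
    ("Alan Turing", "proposed", "the Turing Test"),
    ("the Turing Test", "was proposed in", "1950"),
]

# Named entities found in the sentence.
entities = ["Alan Turing", "Turing Test", "1950"]

def triple_to_fact_string(triple):
    """Render a triple as a flat fact string suitable for embedding."""
    return " ".join(triple)

fact_strings = [triple_to_fact_string(t) for t in triples]
```

Flattening triples into strings like this is what makes them embeddable alongside passages and gists.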

Embedding Storage

EmbeddingStore (embedding_store.py) manages vector embeddings for different node types:
  • Chunk embeddings: Dense vectors for document passages
  • Entity embeddings: Vectors for named entities
  • Fact embeddings: Vectors for relational facts
  • Gist embeddings: Vectors for paraphrased summaries (episodic_gist only)
Embeddings are cached on disk for incremental updates.
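The caching behavior can be sketched as a content-addressed store: each string gets a hash ID, and only unseen strings are embedded. This is a minimal in-memory sketch (REMem's EmbeddingStore additionally persists to disk); the class name, `embed_fn`, and the toy embedding are assumptions:

```python
import hashlib

# Minimal sketch of a content-addressed embedding cache, assuming an
# embed_fn callable; REMem's EmbeddingStore adds on-disk persistence.
class SimpleEmbeddingStore:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.embeddings = {}  # hash id -> vector

    @staticmethod
    def hash_id(text):
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def insert_strings(self, texts):
        """Embed only strings not already cached (incremental update)."""
        for text in texts:
            key = self.hash_id(text)
            if key not in self.embeddings:
                self.embeddings[key] = self.embed_fn(text)
        return [self.hash_id(t) for t in texts]

# Toy embedding: character counts (a stand-in for a real model like NV-Embed-v2).
def toy_embed(text):
    return [text.count(c) for c in "abcdefg"]

store = SimpleEmbeddingStore(toy_embed)
ids = store.insert_strings(["Alan Turing", "Grace Hopper", "Alan Turing"])
```

Because IDs are derived from content, re-indexing the same passage is a cache hit rather than a recomputation.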

Graph Memory

REMem builds a hybrid graph combining:
  • Nodes: Passages, entities, facts, gists, temporal anchors
  • Edges: Fact relationships, context links, synonymy connections
  • Weights: Edge weights encode relationship strength
The graph is stored using igraph and persisted as graph.pkl. See Memory Graph for the graph structure.
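The hybrid structure can be sketched with plain Python dicts (REMem itself uses igraph and persists the result as graph.pkl); the node and edge field names here are illustrative assumptions:

```python
# Minimal sketch of the hybrid memory graph using plain dicts; REMem
# stores the real thing in an igraph graph persisted as graph.pkl.
graph = {"nodes": {}, "edges": []}

def add_node(node_id, node_type, content):
    graph["nodes"][node_id] = {"type": node_type, "content": content}

def add_edge(src, dst, edge_type, weight=1.0):
    # Edge weights encode relationship strength.
    graph["edges"].append({"src": src, "dst": dst, "type": edge_type, "weight": weight})

# Passage, entity, and fact nodes.
add_node("chunk0", "passage", "Alan Turing proposed the Turing Test in 1950.")
add_node("ent0", "entity", "Alan Turing")
add_node("fact0", "fact", "(Alan Turing, proposed, Turing Test)")

# Edges: context link from passage to entity, plus a fact relationship.
add_edge("chunk0", "ent0", "context")
add_edge("ent0", "fact0", "fact", weight=0.9)
```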

Retrieval Strategies

The retrieval layer (rag_strategies/) implements different approaches:
  • DefaultRAGStrategy: For standard OpenIE extraction
  • EpisodicGistStrategy: For episodic_gist extraction
  • TemporalStrategy: For temporal extraction
Each strategy implements:
  • index(): How to build the graph
  • retrieve_each_query(): How to find relevant nodes
  • rag_for_qa(): How to generate answers
See Retrieval Strategies for details.
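The three-method interface above can be sketched as an abstract base class. The method names match the docs, but the signatures and the toy keyword-overlap strategy are assumptions for illustration:

```python
from abc import ABC, abstractmethod

# Sketch of the strategy interface described above; signatures are
# assumptions, and KeywordStrategy is a toy stand-in for dense retrieval.
class RAGStrategy(ABC):
    @abstractmethod
    def index(self, documents):
        """Build the memory graph from documents."""

    @abstractmethod
    def retrieve_each_query(self, query):
        """Return relevant node ids for a single query."""

    @abstractmethod
    def rag_for_qa(self, query, retrieved):
        """Generate an answer from retrieved context."""

class KeywordStrategy(RAGStrategy):
    """Toy strategy: rank documents by keyword overlap with the query."""
    def index(self, documents):
        self.docs = list(documents)

    def retrieve_each_query(self, query):
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), i)
                  for i, d in enumerate(self.docs)]
        return [i for score, i in sorted(scored, reverse=True) if score > 0]

    def rag_for_qa(self, query, retrieved):
        # Stand-in for LLM answer generation: return the top passage.
        return self.docs[retrieved[0]] if retrieved else ""

strategy = KeywordStrategy()
strategy.index(["Alan Turing proposed the Turing Test.",
                "Grace Hopper pioneered COBOL."])
hits = strategy.retrieve_each_query("Who proposed the Turing Test?")
```

Keeping the three hooks separate is what lets each extraction method pair with its own indexing and retrieval logic.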

Prompting

The PromptTemplateManager (prompts/) centralizes all LLM prompts:
  • Extraction prompts: For information extraction
  • QA prompts: For answer generation
  • Dataset-specific templates: Tailored to each benchmark
Templates are stored in prompts/templates/ as text files.
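Template lookup and rendering can be sketched with the standard library. The template text and placeholder names below are assumptions; REMem's PromptTemplateManager loads its templates from prompts/templates/:

```python
from string import Template

# Sketch of prompt-template rendering; the template text and field names
# are assumptions, not REMem's actual prompts.
TEMPLATES = {
    "qa": Template(
        "Answer using only the context.\n"
        "Context: $context\nQuestion: $question\nAnswer:"
    ),
}

def render(name, **fields):
    """Look up a template by name and fill in its placeholders."""
    return TEMPLATES[name].substitute(**fields)

prompt = render(
    "qa",
    context="Turing proposed the Turing Test in 1950.",
    question="Who proposed the Turing Test?",
)
```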

Processing Pipeline

The indexing pipeline transforms documents into the queryable memory graph:

1. Ingestion & Chunking

Documents are split into chunks based on token limits:
rag.index([
    "Alan Turing proposed the Turing Test in 1950.",
    "Grace Hopper pioneered COBOL and popularized the term 'debugging'.",
])
Internally:
  1. Documents are passed to the text preprocessor (remem/remem.py:442)
  2. Chunks are created with configurable overlap
  3. Each chunk gets a unique hash ID
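The three steps above can be sketched as a sliding token window with overlap and a hash ID per chunk. A whitespace split stands in for the configured gpt-4o tokenizer, and the small window sizes are for illustration only:

```python
import hashlib

# Sketch of token-window chunking with overlap; str.split() stands in
# for the real tokenizer, and the window sizes are illustrative.
def chunk_text(text, max_tokens=8, overlap=2):
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        content = " ".join(tokens[start:start + max_tokens])
        chunks.append({
            # Each chunk gets a unique, content-derived hash ID.
            "id": hashlib.md5(content.encode("utf-8")).hexdigest(),
            "content": content,
        })
        if start + max_tokens >= len(tokens):
            break
    return chunks

doc = "one two three four five six seven eight nine ten eleven twelve"
chunks = chunk_text(doc, max_tokens=8, overlap=2)
```

The last two tokens of each chunk reappear at the start of the next, which is how context is preserved across chunk boundaries.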

2. Embedding Storage

Chunks are embedded and stored:
# From remem.py:446-448
if len(self.chunk_embedding_store.embeddings) == 0:
    nodes_dict = self.chunk_embedding_store.insert_chunk_dicts(chunk_metadata, "openie")
    self.chunk_contents = [chunk["content"] for chunk in nodes_dict.values()]

3. Information Extraction

The extraction method determines what structure is extracted. For OpenIE (remem.py:354):
ie_results = self.openie.batch_openie(new_openie_rows)
new_ner_results_dict, new_triple_results_dict = ie_results[0], ie_results[1]
For Episodic Gist:
# Extracts gists first, then facts (information_extraction/episodic_gist_extraction_openai.py:34-40)
gist_outputs = self.batch_extraction(chunk_passages, template="episodic_gist_extraction", target="gists")
fact_outputs = self.batch_extraction(chunk_passages, template="episodic_fact_extraction", target="facts", gist_map=gist_map)

4. Memory Graph Build

The graph is constructed with different node and edge types depending on the extraction method. Entities as nodes (remem.py:401-402):
phrase_nodes, chunk_triple_entities = extract_phrase_nodes(chunk_triples)
self.phrase_embedding_store.insert_strings(phrase_nodes)
Facts as nodes (remem.py:404-406):
triple_to_encode = [str(fact) for fact in flattened_triples]
self.triple_embedding_store.insert_strings(triple_to_encode)
Edges connect related nodes (remem.py:418-427):
  • add_fact_edges(): Entity → Entity edges from triples
  • add_passage_edges(): Chunk → Entity edges
  • add_paraphrase_edges(): Chunk → Gist edges (episodic_gist)
  • add_synonymy_edges_between_phrases(): Entity ↔ Entity similarity edges
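Of these, the synonymy edges can be sketched directly: connect every entity pair whose embedding cosine similarity clears synonymy_edge_sim_threshold (0.8 by default), with the similarity as the edge weight. The vectors below are made up for illustration:

```python
import math

# Sketch of add_synonymy_edges_between_phrases: link entity pairs whose
# cosine similarity exceeds the threshold. Vectors are illustrative.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

entity_vecs = {
    "Alan Turing": [1.0, 0.9, 0.1],
    "A. M. Turing": [0.9, 1.0, 0.2],
    "Grace Hopper": [0.1, 0.2, 1.0],
}

def synonymy_edges(vecs, threshold=0.8):
    names = list(vecs)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(vecs[a], vecs[b])
            if sim >= threshold:
                edges.append((a, b, sim))  # edge weight = similarity
    return edges

edges = synonymy_edges(entity_vecs, threshold=0.8)
```

In practice the comparison is limited to each entity's synonymy_edge_topk nearest neighbors rather than all pairs.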

5. Retrieval and QA

Queries are processed through the retrieval pipeline:
  1. Initial retrieval: Dense/lexical search for gists and facts
  2. Graph exploration: Navigate edges to find related context
  3. Ranking: Combine signals to rank passages
  4. Answer generation: LLM generates answer from top-k passages
See Retrieval Strategies for the full process.
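The graph-exploration step can be sketched as a personalized PageRank walk seeded on the initially retrieved nodes; the damping config parameter suggests such a walk, but this pure-Python version is an illustration, not REMem's implementation:

```python
# Sketch of graph exploration as personalized PageRank: rank mass flows
# from seed nodes along edges, with (1 - damping) teleporting back to
# the seeds each step. Illustrative, not REMem's implementation.
def personalized_pagerank(adj, seeds, damping=0.5, iters=50):
    nodes = list(adj)
    reset = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(reset)
    for _ in range(iters):
        nxt = {n: (1 - damping) * reset[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seed distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * reset[m]
        rank = nxt
    return rank

# Toy graph: a query-linked entity points to facts, facts point to passages.
adj = {"e0": ["f0", "f1"], "f0": ["p0"], "f1": ["p1"], "p0": [], "p1": []}
rank = personalized_pagerank(adj, seeds={"e0"}, damping=0.5)
```

Passages reachable from the seed entities accumulate rank, giving one signal that can be combined with the dense scores during ranking.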

6. Evaluation

Results are evaluated with multiple metrics:
  • Retrieval: Recall@k, NDCG
  • QA: Exact Match, F1, BLEU
  • LLM-as-judge: For complex reasoning tasks
solutions, responses, meta, retrieval_metrics, qa_metrics = rag.rag_for_qa(
    queries=["Who proposed the Turing Test?"],
    gold_answers=[["Alan Turing"]],
    metrics=("qa_em", "qa_f1", "retrieval_recall")
)

Configuration System

All components are configured through BaseConfig (remem/utils/config_utils.py):
@dataclass
class BaseConfig:
    # LLM settings
    llm_name: str = "gpt-4o-mini"
    max_new_tokens: int = 2048
    temperature: float = 0
    
    # Extraction settings
    extract_method: Literal["openie", "episodic", "episodic_gist", "temporal"] = "openie"
    
    # Embedding settings
    embedding_model_name: str = "nvidia/NV-Embed-v2"
    embedding_batch_size: int = 16
    
    # Graph settings
    synonymy_edge_topk: int = 2047
    synonymy_edge_sim_threshold: float = 0.8
    
    # Retrieval settings
    linking_top_k: int = 5
    retrieval_top_k: int = 200
    damping: float = 0.5
    
    # QA settings
    qa_top_k: int = 5
The config flows through all components, ensuring consistent behavior across the system.

Incremental Updates

REMem supports incremental indexing (remem.py:317-328):
# Graph and embeddings are loaded from disk if they exist
if not self.global_config.force_index_from_scratch:
    if os.path.exists(self._graph_pickle_path):
        loaded_graph = ig.Graph.Read_Pickle(self._graph_pickle_path)
New documents are added to existing structures without rebuilding from scratch.
Set force_index_from_scratch=True to rebuild the entire graph and embeddings.
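The load-or-rebuild logic can be sketched with pickle on a plain dict standing in for igraph's Read_Pickle on graph.pkl (the helper name is an assumption for this sketch):

```python
import os
import pickle
import tempfile

# Sketch of load-or-rebuild, using pickle on a plain dict in place of
# igraph's Read_Pickle on graph.pkl; the helper name is an assumption.
def load_or_build_graph(path, force_index_from_scratch=False):
    if not force_index_from_scratch and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)      # resume from the persisted graph
    return {"nodes": {}, "edges": []}  # start fresh

tmp = os.path.join(tempfile.mkdtemp(), "graph.pkl")

# First run: nothing on disk, so a fresh graph is built and persisted.
graph = load_or_build_graph(tmp)
graph["nodes"]["chunk0"] = {"type": "passage"}
with open(tmp, "wb") as f:
    pickle.dump(graph, f)

# Second run: the persisted graph is loaded instead of rebuilt.
resumed = load_or_build_graph(tmp)
```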
