System Overview
The architecture consists of distinct layers that work together.
Core Components
Orchestrator
The ReMem class (remem/remem.py) coordinates all operations:
- Indexing: Converts documents into the memory graph
- Retrieval: Finds relevant passages for queries
- QA: Generates answers using retrieved context
- Evaluation: Measures performance with various metrics
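The four operations above can be sketched as a toy orchestrator. This is a minimal illustration of how indexing, retrieval, and QA compose, not the real `ReMem` API: the class name `MiniOrchestrator`, the lexical scoring, and the string-concatenation "answer" are all stand-ins.

```python
# Hypothetical sketch of how the orchestrator's operations compose.
# Names and bodies are illustrative, not the real ReMem implementation.
class MiniOrchestrator:
    def __init__(self):
        self.graph = {}  # chunk_id -> text (stands in for the memory graph)

    def index(self, docs):
        """Indexing: convert documents into the memory 'graph'."""
        for i, doc in enumerate(docs):
            self.graph[f"chunk-{i}"] = doc

    def retrieve(self, query, k=2):
        """Retrieval: naive lexical overlap standing in for dense search."""
        scored = [(sum(w in text.lower() for w in query.lower().split()), cid)
                  for cid, text in self.graph.items()]
        return [cid for s, cid in sorted(scored, reverse=True)[:k] if s > 0]

    def qa(self, query):
        """QA: in the real system an LLM answers from retrieved context."""
        hits = self.retrieve(query)
        return " ".join(self.graph[cid] for cid in hits)

orch = MiniOrchestrator()
orch.index(["Paris is the capital of France.", "The Nile is a river."])
print(orch.qa("capital of France"))  # → Paris is the capital of France.
```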
Preprocessing
The preprocessing layer (graph/preprocessing/) handles:
- Document chunking: Splits long documents into manageable pieces
- Token-based chunking: Respects token limits for embedding models
- Overlap management: Maintains context across chunk boundaries
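The chunking behavior described above can be sketched with a sliding window. The real preprocessor counts tokens with the embedding model's tokenizer; here whitespace splitting is a stand-in, and the parameter names are illustrative.

```python
# Minimal sketch of token-based chunking with overlap. Whitespace split
# stands in for a real tokenizer; parameter names are assumptions.
def chunk_tokens(text, max_tokens=8, overlap=2):
    tokens = text.split()
    chunks, start = [], 0
    step = max_tokens - overlap
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last chunk reached
        start += step  # slide the window, keeping `overlap` tokens of context

    return chunks

parts = chunk_tokens("one two three four five six seven eight nine ten",
                     max_tokens=4, overlap=1)
print(parts)
# → ['one two three four', 'four five six seven', 'seven eight nine ten']
```

Note how each chunk repeats the last `overlap` tokens of its predecessor, which is what preserves context across chunk boundaries.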
Information Extraction
The extraction layer (information_extraction/) transforms text into structured memory units. The extraction method determines what gets stored in the graph:
- openie: Entities and facts (subject-predicate-object triples)
- episodic: Episodic facts from conversations or narratives
- episodic_gist: Episodic facts + paraphrased gist summaries
- temporal: Facts with temporal qualifiers for time-aware QA
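To make the differences concrete, here is a sketch of the kind of memory unit each method might produce. The field names (`subject`, `gist`, `valid_from`, etc.) are assumptions for illustration, not the real schema.

```python
# Illustrative shapes of the memory units each extraction method
# produces; field names are assumptions, not the real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# openie: subject-predicate-object facts
fact = Triple("Marie Curie", "won", "the Nobel Prize")

# episodic_gist: an episodic fact plus a paraphrased gist
episode = {
    "fact": "Alice told Bob she moved to Berlin last spring.",
    "gist": "Alice moved to Berlin.",
}

# temporal: a fact with a temporal qualifier for time-aware QA
temporal_fact = {"triple": Triple("Alice", "lives in", "Berlin"),
                 "valid_from": "2023-04"}

print(fact.predicate, "|", episode["gist"])  # → won | Alice moved to Berlin.
```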
Embedding Storage
EmbeddingStore (embedding_store.py) manages vector embeddings for different node types:
- Chunk embeddings: Dense vectors for document passages
- Entity embeddings: Vectors for named entities
- Fact embeddings: Vectors for relational facts
- Gist embeddings: Vectors for paraphrased summaries (episodic_gist only)
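A store keyed by node type can be sketched as follows. The real `EmbeddingStore` computes dense vectors with an embedding model; the hash-derived "embedding" here is a deterministic toy stand-in, and the helper names are hypothetical.

```python
# Sketch of an embedding store keyed by node type. toy_embed() is a
# deterministic stand-in for a real embedding model.
import hashlib

def toy_embed(text, dim=4):
    # Pseudo-embedding derived from a hash (illustration only).
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

store = {"chunk": {}, "entity": {}, "fact": {}, "gist": {}}

def add(node_type, node_id, text):
    store[node_type][node_id] = toy_embed(text)

add("entity", "e1", "Marie Curie")
add("fact", "f1", "Marie Curie won the Nobel Prize")
print(len(store["entity"]["e1"]))  # → 4
```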
Graph Memory
REMem builds a hybrid graph combining:
- Nodes: Passages, entities, facts, gists, temporal anchors
- Edges: Fact relationships, context links, synonymy connections
- Weights: Edge weights encode relationship strength
The graph is built with igraph and persisted as graph.pkl.
See Memory Graph for the graph structure.
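The node/edge/weight structure and the pickle persistence can be sketched with the standard library alone (the real implementation uses igraph; the dict layout and edge tuples below are illustrative, not the actual `graph.pkl` schema).

```python
# Stdlib stand-in for the hybrid graph: typed nodes, weighted typed
# edges, persisted with pickle the way graph.pkl is. Layout is illustrative.
import os, pickle, tempfile

graph = {
    "nodes": {  # node_id -> node type
        "p1": "passage", "e1": "entity", "e2": "entity", "f1": "fact",
    },
    "edges": [  # (source, target, edge_type, weight)
        ("e1", "e2", "fact", 1.0),      # fact relationship from a triple
        ("p1", "e1", "context", 0.5),   # passage-to-entity context link
        ("e1", "e2", "synonymy", 0.9),  # entity similarity connection
    ],
}

path = os.path.join(tempfile.mkdtemp(), "graph.pkl")
with open(path, "wb") as fh:
    pickle.dump(graph, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)
print(len(restored["edges"]))  # → 3
```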
Retrieval Strategies
The retrieval layer (rag_strategies/) implements different approaches:
- DefaultRAGStrategy: For standard OpenIE extraction
- EpisodicGistStrategy: For episodic_gist extraction
- TemporalStrategy: For temporal extraction
Each strategy defines:
- index(): How to build the graph
- retrieve_each_query(): How to find relevant nodes
- rag_for_qa(): How to generate answers
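The strategy interface implied here can be sketched as an abstract base class. The three method names come from the text; the base class name and the stub bodies are assumptions.

```python
# Sketch of the strategy interface: each strategy overrides three hooks.
# Method names come from the docs; bodies are illustrative stubs.
from abc import ABC, abstractmethod

class RAGStrategy(ABC):
    @abstractmethod
    def index(self, docs): ...
    @abstractmethod
    def retrieve_each_query(self, query): ...
    @abstractmethod
    def rag_for_qa(self, query, context): ...

class ToyDefaultStrategy(RAGStrategy):
    def index(self, docs):
        return {f"chunk-{i}": d for i, d in enumerate(docs)}
    def retrieve_each_query(self, query):
        return ["chunk-0"]  # stand-in for dense + graph retrieval
    def rag_for_qa(self, query, context):
        return f"answer({query!r}, using {len(context)} passages)"

strategy = ToyDefaultStrategy()
graph = strategy.index(["doc A", "doc B"])
hits = strategy.retrieve_each_query("q")
print(strategy.rag_for_qa("q", hits))  # → answer('q', using 1 passages)
```

Swapping in `EpisodicGistStrategy` or `TemporalStrategy` then only changes these three hooks, not the orchestrator.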
Prompting
The PromptTemplateManager (prompts/) centralizes all LLM prompts:
- Extraction prompts: For information extraction
- QA prompts: For answer generation
- Dataset-specific templates: Tailored to each benchmark
Templates are stored in prompts/templates/ as text files.
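Loading prompts from a directory of text files can be sketched as follows; the file layout, placeholder syntax (`string.Template` here), and the `PromptManager` class are assumptions for illustration, not the real `PromptTemplateManager` API.

```python
# Sketch of a manager that loads prompt templates from text files,
# mirroring a prompts/templates/ layout. All names are illustrative.
import os, string, tempfile

templates_dir = tempfile.mkdtemp()
with open(os.path.join(templates_dir, "qa.txt"), "w") as fh:
    fh.write("Answer using the context.\nContext: $context\nQ: $question\nA:")

class PromptManager:
    def __init__(self, directory):
        self.templates = {}
        for name in os.listdir(directory):
            with open(os.path.join(directory, name)) as fh:
                self.templates[name.removesuffix(".txt")] = \
                    string.Template(fh.read())

    def render(self, name, **kwargs):
        return self.templates[name].substitute(**kwargs)

pm = PromptManager(templates_dir)
print(pm.render("qa", context="Paris is in France.",
                question="Where is Paris?"))
```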
Processing Pipeline
The indexing pipeline transforms documents into the queryable memory graph:
1. Ingestion & Chunking
Documents are split into chunks based on token limits:
- Documents are passed to the text preprocessor (remem/remem.py:442)
- Chunks are created with configurable overlap
- Each chunk gets a unique hash ID
2. Embedding Storage
Chunks are embedded and stored.
3. Information Extraction
The extraction method determines what structure is extracted; for OpenIE, see remem.py:354.
4. Memory Graph Build
The graph is constructed with different node and edge types depending on the extraction method. Entities are added as nodes (remem.py:401-402), then edges are added:
- add_fact_edges(): Entity → Entity edges from triples
- add_passage_edges(): Chunk → Entity edges
- add_paraphrase_edges(): Chunk → Gist edges (episodic_gist)
- add_synonymy_edges_between_phrases(): Entity ↔ Entity similarity edges
5. Retrieval and QA
Queries are processed through the retrieval pipeline:
- Initial retrieval: Dense/lexical search for gists and facts
- Graph exploration: Navigate edges to find related context
- Ranking: Combine signals to rank passages
- Answer generation: LLM generates answer from top-k passages
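The ranking step, combining multiple signals into one passage score, can be sketched as a weighted sum. The function name, the two signal names, and the `alpha` weight are all illustrative; the real ranker may combine signals differently.

```python
# Sketch of the ranking step: combine a dense-similarity signal with a
# graph-proximity signal into one score. Names and weights are illustrative.
def rank_passages(dense_scores, graph_scores, alpha=0.7, k=2):
    combined = {
        pid: alpha * dense_scores.get(pid, 0.0)
             + (1 - alpha) * graph_scores.get(pid, 0.0)
        for pid in set(dense_scores) | set(graph_scores)
    }
    # Highest combined score first; top-k passages feed answer generation.
    return sorted(combined, key=combined.get, reverse=True)[:k]

dense = {"p1": 0.9, "p2": 0.4, "p3": 0.1}
graph = {"p2": 0.8, "p3": 0.9}
print(rank_passages(dense, graph))  # → ['p1', 'p2']
```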
6. Evaluation
Results are evaluated with multiple metrics:
- Retrieval: Recall@k, NDCG
- QA: Exact Match, F1, BLEU
- LLM-as-judge: For complex reasoning tasks
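The two QA metrics can be sketched with their standard definitions; the real evaluator may apply additional answer normalization (punctuation or article stripping) before comparing.

```python
# Standard definitions of the QA metrics; the real evaluator may
# normalize answers more aggressively before comparison.
def exact_match(pred, gold):
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                    # → 1.0
print(round(token_f1("in Paris France", "Paris"), 2))   # → 0.5
```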
Configuration System
All components are configured through BaseConfig (remem/utils/config_utils.py).
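A config object of this shape might look like the dataclass below. Only `force_index_from_scratch` and the extraction method values appear in this document; every other field name and default is a hypothetical placeholder, not the real `BaseConfig` schema.

```python
# Hypothetical sketch of a BaseConfig-style dataclass. Apart from
# force_index_from_scratch and the extraction methods named in the docs,
# all fields and defaults are assumptions.
from dataclasses import dataclass

@dataclass
class BaseConfig:
    extraction_method: str = "openie"  # openie | episodic | episodic_gist | temporal
    chunk_token_limit: int = 512       # assumed chunking parameter
    chunk_overlap: int = 64            # assumed overlap parameter
    force_index_from_scratch: bool = False  # rebuild graph and embeddings

cfg = BaseConfig(extraction_method="episodic_gist")
print(cfg.extraction_method, cfg.force_index_from_scratch)
# → episodic_gist False
```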
Incremental Updates
REMem supports incremental indexing (remem.py:317-328). Set force_index_from_scratch=True to rebuild the entire graph and embeddings.
Next Steps
- Learn about the Memory Graph structure
- Understand Extraction Methods
- Explore Retrieval Strategies