REMem supports four extraction methods that determine what structure is extracted from documents and stored in the memory graph. The method is selected via `extract_method` in `BaseConfig`.
## Overview
| Method | Structure Extracted | Best For | Graph Nodes |
|---|---|---|---|
| `openie` | Entities + triples | Wikipedia, factual text | Entity, Fact, Passage |
| `episodic` | Episodic facts | Conversations, narratives | Verbatim, Fact, Entity |
| `episodic_gist` | Facts + gist summaries | Long conversations, contextual QA | Verbatim, Gist, Fact, Entity |
| `temporal` | Facts + temporal qualifiers | Time-sensitive questions | Verbatim, Fact, Entity |
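The table above can be mirrored as a small lookup, handy for validating a configured method before indexing. This is an illustrative sketch: the node-type names come from the table in this document, not from any REMem API.

```python
# Illustrative mapping of extract_method values to the node types each
# method produces. The names mirror the table above; they are NOT
# imported from REMem.
GRAPH_NODES = {
    "openie": {"Entity", "Fact", "Passage"},
    "episodic": {"Verbatim", "Fact", "Entity"},
    "episodic_gist": {"Verbatim", "Gist", "Fact", "Entity"},
    "temporal": {"Verbatim", "Fact", "Entity"},
}

def validate_method(name: str) -> str:
    """Fail early on a misspelled extract_method value."""
    if name not in GRAPH_NODES:
        raise ValueError(
            f"unknown extract_method {name!r}; choose from {sorted(GRAPH_NODES)}"
        )
    return name
```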
## OpenIE
**Purpose:** Extract structured knowledge from factual text such as Wikipedia articles.

**What it extracts:**

- **Named entities:** subjects and objects (e.g., "Alan Turing", "Turing Test")
- **Relation triples:** (subject, predicate, object) tuples

**Example:**

```python
from remem.remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    extract_method="openie",
    llm_name="gpt-4o-mini",
)
rag = ReMem(global_config=config)
rag.index(["Alan Turing proposed the Turing Test in 1950."])
```
**Extracted structure:**

```json
{
  "entities": ["Alan Turing", "Turing Test", "1950"],
  "triples": [
    ["Alan Turing", "proposed", "Turing Test"],
    ["Turing Test", "proposed in", "1950"]
  ]
}
```
**Graph structure:**

```text
Entity: "Alan Turing" ──→ Entity: "Turing Test"
          ↓                        ↓
      Passage: "Alan Turing proposed..."
```

**Implementation:** `information_extraction/openie_openai.py`
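The passage-to-graph step can be illustrated by folding the extracted triples into a simple adjacency map that mirrors the Entity edges above. This is a sketch of the graph shape only, not REMem's storage code.

```python
# Sketch: fold extracted triples into a subject -> [(predicate, object)]
# adjacency map, mirroring the Entity edges in the diagram above.
from collections import defaultdict

extraction = {
    "entities": ["Alan Turing", "Turing Test", "1950"],
    "triples": [
        ["Alan Turing", "proposed", "Turing Test"],
        ["Turing Test", "proposed in", "1950"],
    ],
}

edges = defaultdict(list)
for subj, pred, obj in extraction["triples"]:
    edges[subj].append((pred, obj))

print(edges["Alan Turing"])  # [('proposed', 'Turing Test')]
```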
**When to use:**

- Factual knowledge bases
- Wikipedia-style articles
- Structured question answering (MuSiQue, 2WikiMultihopQA)

OpenIE is the fastest extraction method but provides less context than the episodic methods.
## Episodic
**Purpose:** Extract facts from conversations or narrative text while preserving episodic context.

**What it extracts:**

- **Verbatim content:** the original conversation text
- **Episodic facts:** structured knowledge with context
- **Entities:** named entities from facts
**Example:**

```python
config = BaseConfig(
    extract_method="episodic",
    llm_name="gpt-4o-mini",
)
rag = ReMem(global_config=config)
rag.index([
    "User: Can you help me implement a search feature?\nAssistant: I can help with that."
])
```
**Extracted structure:**

```json
{
  "verbatim": "User: Can you help me implement a search feature?\nAssistant: I can help with that.",
  "facts": [
    {
      "subject": "User",
      "predicate": "wants to implement",
      "object": "search feature",
      "qualifiers": {}
    },
    {
      "subject": "Assistant",
      "predicate": "offers help with",
      "object": "search feature",
      "qualifiers": {}
    }
  ]
}
```
**Graph structure:**

```text
Verbatim ──→ Entity: "User" ──────→ Entity: "search feature"
         ──→ Entity: "Assistant" ──→ Entity: "search feature"
```

**Implementation:** `information_extraction/episodic_extraction_openai.py`
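The fact records shown above map naturally onto a small typed structure. The field names follow the JSON keys in this document; REMem's internal classes may differ.

```python
# Typed sketch of the fact records shown above. Field names follow the
# JSON keys in this document; REMem's internal classes may differ.
from dataclasses import dataclass, field

@dataclass
class EpisodicFact:
    subject: str
    predicate: str
    object: str
    qualifiers: dict = field(default_factory=dict)

fact = EpisodicFact("User", "wants to implement", "search feature")
```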
**When to use:**

- Conversational data
- Customer support logs
- Narrative question answering
## Episodic Gist
**Purpose:** Combine episodic facts with paraphrased gist summaries for associative recall.

**What it extracts:**

- **Verbatim content:** original text (split by message for conversations)
- **Gist summaries:** paraphrased, compressed representations
- **Episodic facts:** structured knowledge
- **Entities:** named entities from facts
**Example:**

```python
config = BaseConfig(
    extract_method="episodic_gist",
    llm_name="gpt-4o-mini",
    split_verbatim_per_chunk=True,      # Split conversations by message
    concatenate_gists_per_chunk=False,  # Multiple gist nodes per chunk
)
rag = ReMem(global_config=config)
```
**Extracted structure:**

```json
{
  "verbatim": "User: Can you help me implement a search feature?",
  "gists": [
    "User requested help implementing a search feature",
    "Conversation about adding search functionality"
  ],
  "facts": [
    {
      "subject": "User",
      "predicate": "wants to implement",
      "object": "search feature",
      "qualifiers": {"intent": "request_help"}
    }
  ]
}
```
**Graph structure:**

```text
Verbatim ──→ Gist: "User requested help..." ──→ Entity: "User"
   │                                        ──→ Fact: (User, wants, search)
   └──→ Gist: "Conversation about search" ──→ Entity: "search feature"
```
**Two-stage extraction** (`episodic_gist_extraction_openai.py:34-40`):

```python
# Stage 1: Extract gists
gist_outputs = self.batch_extraction(
    chunk_passages, template="episodic_gist_extraction", target="gists"
)

# Stage 2: Extract facts (using gists as context)
fact_outputs = self.batch_extraction(
    chunk_passages, template="episodic_fact_extraction", target="facts", gist_map=gist_map
)
```
**Why two stages?** Gists provide compressed context that improves fact-extraction quality.
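The two-stage flow can be sketched end to end with a stand-in LLM call. Here `call_llm` is a hypothetical placeholder that returns canned text; it is not a REMem function.

```python
# Hypothetical sketch of the two-stage flow. `call_llm` is a placeholder
# that returns canned text; it is not part of REMem.
def call_llm(prompt: str) -> str:
    if prompt.startswith("Summarize"):
        return "User requested help implementing a search feature"
    return "(User, wants to implement, search feature)"

def extract(chunk: str) -> dict:
    # Stage 1: compress the chunk into a gist.
    gist = call_llm(f"Summarize as a gist:\n{chunk}")
    # Stage 2: extract facts, conditioning on the stage-1 gist.
    fact = call_llm(f"Extract facts given gist '{gist}':\n{chunk}")
    return {"gists": [gist], "facts": [fact]}
```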
**Configuration options:**

```python
config = BaseConfig(
    extract_method="episodic_gist",
    # Split long conversations into individual messages?
    split_verbatim_per_chunk=True,
    # Concatenate all gists into a single node per chunk?
    concatenate_gists_per_chunk=False,
)
```
**Implementation:** `information_extraction/episodic_gist_extraction_openai.py`
**When to use:**

- Long conversations (LongMemEval, LoCoMo)
- Contextual question answering
- When you need both exact quotes and summaries

Episodic gist provides the richest graph structure at the cost of more LLM calls (2x OpenIE).
## Temporal
**Purpose:** Extract facts with temporal qualifiers for time-aware question answering.

**What it extracts:**

- **Verbatim content:** original text
- **Temporal facts:** facts with time anchors in qualifiers
- **Entities:** named entities from facts
**Example:**

```python
config = BaseConfig(
    extract_method="temporal",
    llm_name="gpt-4o-mini",
)
rag = ReMem(global_config=config)
rag.index(["On March 15, 2024, the team deployed version 2.0."])
```
**Extracted structure:**

```json
{
  "verbatim": "On March 15, 2024, the team deployed version 2.0.",
  "facts": [
    {
      "subject": "team",
      "predicate": "deployed",
      "object": "version 2.0",
      "qualifiers": {
        "time": "2024-03-15"
      }
    }
  ]
}
```
**Graph structure:**

```text
Verbatim ──→ Entity: "team" ──→ Entity: "version 2.0"
    ↓              ↓
Fact: (team, deployed, version 2.0) {time: 2024-03-15}
```

**Implementation:** `information_extraction/temporal_extraction_openai.py`
**When to use:**

- Time-series data
- Event logs
- "What happened before/after X?" questions
- Temporal reasoning tasks

Temporal extraction uses specialized prompts that emphasize extracting time anchors.
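Answering a before/after question then reduces to comparing the `time` qualifier against a cutoff. Since the qualifiers use ISO-8601 dates, plain string comparison suffices. A sketch over the fact format shown above (not REMem's retrieval code):

```python
# Sketch: filter temporal facts relative to a cutoff. ISO-8601 dates
# sort correctly as strings, so no date parsing is needed here.
facts = [
    {"subject": "team", "predicate": "deployed", "object": "version 2.0",
     "qualifiers": {"time": "2024-03-15"}},
    {"subject": "team", "predicate": "deployed", "object": "version 1.0",
     "qualifiers": {"time": "2023-11-02"}},
]

def facts_before(facts: list, cutoff: str) -> list:
    """Return facts whose time anchor falls strictly before `cutoff`."""
    return [f for f in facts
            if (t := f["qualifiers"].get("time")) is not None and t < cutoff]
```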
**Performance considerations:**
| Method | LLM Calls per Chunk | Extraction Speed | Graph Density |
|---|---|---|---|
| `openie` | 1x | Fast | Low |
| `episodic` | 1x | Fast | Medium |
| `episodic_gist` | 2x | Moderate | High |
| `temporal` | 1x | Fast | Medium |
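These multipliers translate directly into indexing cost. A back-of-envelope estimate (real counts depend on batching and retries):

```python
# Back-of-envelope indexing cost: multiplier from the table above times
# the number of chunks. Real counts depend on batching and retries.
CALLS_PER_CHUNK = {"openie": 1, "episodic": 1, "episodic_gist": 2, "temporal": 1}

def estimated_llm_calls(method: str, num_chunks: int) -> int:
    return CALLS_PER_CHUNK[method] * num_chunks

print(estimated_llm_calls("episodic_gist", 500))  # 1000
```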
Each method uses specialized prompts from `prompts/templates/`:

- `openie_extraction.txt`: standard triple extraction
- `episodic_fact_extraction_*.txt`: episodic fact extraction (dataset-specific)
- `episodic_gist_extraction_*.txt`: gist summarization
- `temporal_extraction.txt`: temporal fact extraction
Prompts are selected based on the dataset (`episodic_gist_extraction_openai.py:75-84`):

```python
template_name = f"{template}_locomo"  # Default
selected = ["menatqa", "timeqa", "musique", "complex_tr", "2wikimultihopqa"]
if any(self.global_config.dataset.startswith(prefix) for prefix in selected):
    template_name = f"{template}_wikipedia"
```
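Restated as a standalone function, the same selection logic makes it easy to check which template a dataset would receive (an illustrative rewrite, not REMem code):

```python
# Illustrative rewrite of the selection logic above as a pure function,
# so the template choice for a dataset can be checked in isolation.
def select_template(template: str, dataset: str) -> str:
    wikipedia_prefixes = ("menatqa", "timeqa", "musique", "complex_tr", "2wikimultihopqa")
    if any(dataset.startswith(p) for p in wikipedia_prefixes):
        return f"{template}_wikipedia"
    return f"{template}_locomo"  # default

print(select_template("episodic_fact_extraction", "musique_dev"))
# episodic_fact_extraction_wikipedia
```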
You can switch extraction methods, but doing so rebuilds the graph from scratch:

```python
# Index with openie
config = BaseConfig(extract_method="openie")
rag = ReMem(global_config=config)
rag.index(docs)

# Re-index with episodic_gist (rebuilds from scratch)
config.extract_method = "episodic_gist"
config.force_index_from_scratch = True
rag = ReMem(global_config=config)
rag.index(docs)
```

Changing extraction methods requires rebuilding the graph with `force_index_from_scratch=True`.
## Implementation Details
All extraction methods implement a common interface:

```python
class ExtractionMethod:
    def batch_openie(self, chunks: Dict[str, ChunkInfo]) -> Dict[str, RawOutput]:
        """Extract structure from chunks in parallel."""
        pass
```
The extraction factory in `remem.py` (lines 130-172) selects the implementation:

```python
if self.global_config.extract_method == "openie":
    self.openie = OpenIE(llm_model=self.extract_llm)
elif self.global_config.extract_method == "episodic":
    from remem.information_extraction.episodic_extraction_openai import EpisodicExtraction
    self.openie = EpisodicExtraction(self.extract_llm, self.global_config)
elif self.global_config.extract_method == "episodic_gist":
    from remem.information_extraction.episodic_gist_extraction_openai import EpisodicGistExtraction
    self.openie = EpisodicGistExtraction(self.extract_llm, self.global_config)
elif self.global_config.extract_method == "temporal":
    from remem.information_extraction.temporal_extraction_openai import TemporalExtraction
    self.openie = TemporalExtraction(self.extract_llm, self.global_config)
```
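The same dispatch is often written as a table of constructors instead of an if/elif chain. A design sketch, with placeholder classes standing in for the real extraction implementations:

```python
# Design sketch: table-driven dispatch as an alternative to the if/elif
# chain. The classes here are placeholders for the real extractors.
class OpenIE: ...
class EpisodicExtraction: ...
class EpisodicGistExtraction: ...
class TemporalExtraction: ...

EXTRACTORS = {
    "openie": OpenIE,
    "episodic": EpisodicExtraction,
    "episodic_gist": EpisodicGistExtraction,
    "temporal": TemporalExtraction,
}

def build_extractor(method: str):
    try:
        return EXTRACTORS[method]()
    except KeyError:
        raise ValueError(f"unknown extract_method: {method!r}") from None
```

A registry also makes the method set discoverable (`EXTRACTORS.keys()`) without reading the factory body.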
## Next Steps