REMem supports four extraction methods. Each one determines what structure is extracted from documents and stored in the memory graph. Select the method with the extract_method field of BaseConfig.

Overview

| Method | Structure Extracted | Best For | Graph Nodes |
| --- | --- | --- | --- |
| openie | Entities + triples | Wikipedia, factual text | Entity, Fact, Passage |
| episodic | Episodic facts | Conversations, narratives | Verbatim, Fact, Entity |
| episodic_gist | Facts + gist summaries | Long conversations, contextual QA | Verbatim, Gist, Fact, Entity |
| temporal | Facts + temporal qualifiers | Time-sensitive questions | Verbatim, Fact, Entity |

OpenIE

Purpose: Extract structured knowledge from factual text like Wikipedia articles.
What it extracts:
  • Named entities: Subjects and objects (e.g., “Alan Turing”, “Turing Test”)
  • Relation triples: (Subject, Predicate, Object) tuples
Example:
from remem.remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    extract_method="openie",
    llm_name="gpt-4o-mini",
)

rag = ReMem(global_config=config)
rag.index(["Alan Turing proposed the Turing Test in 1950."])
Extracted structure:
{
  "entities": ["Alan Turing", "Turing Test", "1950"],
  "triples": [
    ["Alan Turing", "proposed", "Turing Test"],
    ["Turing Test", "proposed in", "1950"]
  ]
}
Graph structure:
Entity: "Alan Turing" ──→ Entity: "Turing Test"
        ↓                          ↓
    Passage: "Alan Turing proposed..."
Implementation: information_extraction/openie_openai.py
When to use:
  • Factual knowledge bases
  • Wikipedia-style articles
  • Structured question answering (MuSiQue, 2WikiMultihopQA)
OpenIE needs a single LLM call per chunk and produces the sparsest graph, so it is fast but provides less context than the episodic methods.
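To make the mapping from extracted structure to graph more concrete, here is a minimal sketch that turns the example output into entity-to-entity edges plus entity-to-passage links. It uses plain dictionaries rather than REMem's graph classes; the helper name build_edges and the edge layout are assumptions for illustration, not the library's implementation.

```python
from collections import defaultdict


def build_edges(passage, extraction):
    """Return (entity_edges, passage_links) for one OpenIE result.

    Hypothetical helper: REMem's real graph construction may differ.
    """
    entity_edges = defaultdict(list)  # subject -> [(predicate, object), ...]
    for subject, predicate, obj in extraction["triples"]:
        entity_edges[subject].append((predicate, obj))
    # Every extracted entity points back to the passage it came from.
    passage_links = {entity: passage for entity in extraction["entities"]}
    return dict(entity_edges), passage_links


passage = "Alan Turing proposed the Turing Test in 1950."
extraction = {
    "entities": ["Alan Turing", "Turing Test", "1950"],
    "triples": [
        ["Alan Turing", "proposed", "Turing Test"],
        ["Turing Test", "proposed in", "1950"],
    ],
}
edges, links = build_edges(passage, extraction)
print(edges["Alan Turing"])  # [('proposed', 'Turing Test')]
```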

Episodic

Purpose: Extract facts from conversations or narrative text while preserving episodic context.
What it extracts:
  • Verbatim content: Original conversation text
  • Episodic facts: Structured knowledge with context
  • Entities: Named entities from facts
Example:
config = BaseConfig(
    extract_method="episodic",
    llm_name="gpt-4o-mini",
)

rag = ReMem(global_config=config)
rag.index([
    "User: Can you help me implement a search feature?\nAssistant: I can help with that."
])
Extracted structure:
{
  "verbatim": "User: Can you help me implement a search feature?\nAssistant: I can help with that.",
  "facts": [
    {
      "subject": "User",
      "predicate": "wants to implement",
      "object": "search feature",
      "qualifiers": {}
    },
    {
      "subject": "Assistant",
      "predicate": "offers help with",
      "object": "search feature",
      "qualifiers": {}
    }
  ]
}
Graph structure:
Verbatim ──→ Entity: "User" ──→ Entity: "search feature"
         ──→ Entity: "Assistant" ──→ Entity: "search feature"
Implementation: information_extraction/episodic_extraction_openai.py
When to use:
  • Conversational data
  • Customer support logs
  • Narrative question answering
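Since entities are described here as coming from the facts, a small sketch can show one way to derive them: collect each fact's subject and object, keeping first-seen order. The helper entities_from_facts is hypothetical, and the real extractor may normalize or filter entities differently.

```python
def entities_from_facts(facts):
    """Collect unique subjects and objects in first-seen order (illustrative)."""
    seen = {}  # dict preserves insertion order, so it doubles as an ordered set
    for fact in facts:
        for key in ("subject", "object"):
            seen.setdefault(fact[key], None)
    return list(seen)


facts = [
    {"subject": "User", "predicate": "wants to implement",
     "object": "search feature", "qualifiers": {}},
    {"subject": "Assistant", "predicate": "offers help with",
     "object": "search feature", "qualifiers": {}},
]
print(entities_from_facts(facts))  # ['User', 'search feature', 'Assistant']
```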

Episodic Gist

Purpose: Combine episodic facts with paraphrased gist summaries for associative recall.
What it extracts:
  • Verbatim content: Original text (split by message if conversations)
  • Gist summaries: Paraphrased, compressed representations
  • Episodic facts: Structured knowledge
  • Entities: Named entities from facts
Example:
config = BaseConfig(
    extract_method="episodic_gist",
    llm_name="gpt-4o-mini",
    split_verbatim_per_chunk=True,  # Split conversations by message
    concatenate_gists_per_chunk=False,  # Multiple gist nodes per chunk
)

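rag = ReMem(global_config=config)
rag.index([
    "User: Can you help me implement a search feature?\nAssistant: I can help with that."
])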
Extracted structure:
{
  "verbatim": "User: Can you help me implement a search feature?",
  "gists": [
    "User requested help implementing a search feature",
    "Conversation about adding search functionality"
  ],
  "facts": [
    {
      "subject": "User",
      "predicate": "wants to implement",
      "object": "search feature",
      "qualifiers": {"intent": "request_help"}
    }
  ]
}
Graph structure:
Verbatim ──→ Gist: "User requested help..." ──→ Entity: "User"
         │                                   ──→ Fact: (User, wants, search)
         └──→ Gist: "Conversation about search" ──→ Entity: "search feature"
Two-stage extraction (episodic_gist_extraction_openai.py:34-40):
# Stage 1: Extract gists
gist_outputs = self.batch_extraction(
    chunk_passages, template="episodic_gist_extraction", target="gists"
)

# Stage 2: Extract facts (using gists as context)
fact_outputs = self.batch_extraction(
    chunk_passages, template="episodic_fact_extraction", target="facts", gist_map=gist_map
)
Why two stages? Gists provide compressed context that improves fact extraction quality.
Configuration options:
config = BaseConfig(
    extract_method="episodic_gist",
    
    # Split long conversations into individual messages?
    split_verbatim_per_chunk=True,
    
    # Concatenate all gists into a single node per chunk?
    concatenate_gists_per_chunk=False,
)
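The effect of concatenate_gists_per_chunk on the graph can be pictured with a small sketch. The helper gist_nodes is hypothetical (it is not a REMem function), and it assumes concatenation simply joins a chunk's gists with a space; the library may join them differently.

```python
def gist_nodes(gists, concatenate_gists_per_chunk):
    """Return the gist node texts one chunk would contribute (illustrative)."""
    if concatenate_gists_per_chunk:
        return [" ".join(gists)]  # assumption: one merged node per chunk
    return list(gists)            # one node per gist


gists = [
    "User requested help implementing a search feature",
    "Conversation about adding search functionality",
]
print(len(gist_nodes(gists, False)))  # 2
print(len(gist_nodes(gists, True)))   # 1
```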
Implementation: information_extraction/episodic_gist_extraction_openai.py
When to use:
  • Long conversations (LongMemEval, LoCoMo)
  • Contextual question answering
  • When you need both exact quotes and summaries
Episodic gist provides the richest graph structure at the cost of more LLM calls (2x OpenIE).
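The two-stage flow described in this section can be sketched without an LLM to show how data moves between the stages. Both functions below are stand-ins with made-up names: the first returns canned gists, and the second only builds the text that a fact-extraction prompt might carry. The prompt layout is an assumption, not the content of REMem's templates.

```python
def stage1_gists(passages):
    """Stand-in for the gist pass: returns canned gists per chunk id."""
    canned = {"chunk-0": ["User requested help implementing a search feature"]}
    return {cid: canned.get(cid, []) for cid in passages}


def stage2_fact_prompts(passages, gist_map):
    """Stand-in for the fact pass: show the context each chunk would carry."""
    prompts = {}
    for cid, text in passages.items():
        gist_text = "; ".join(gist_map.get(cid, [])) or "(none)"
        prompts[cid] = f"Gists: {gist_text}\nPassage: {text}"
    return prompts


passages = {"chunk-0": "User: Can you help me implement a search feature?"}
gist_map = stage1_gists(passages)                  # stage 1 output, keyed by chunk id
prompts = stage2_fact_prompts(passages, gist_map)  # stage 2 consumes it
print(prompts["chunk-0"])
```

Keying the gists by chunk id is what lets the second stage pair each passage with its own summary, which is the role gist_map plays in the excerpt from the source.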

Temporal

Purpose: Extract facts with temporal qualifiers for time-aware question answering.
What it extracts:
  • Verbatim content: Original text
  • Temporal facts: Facts with time anchors in qualifiers
  • Entities: Named entities from facts
Example:
config = BaseConfig(
    extract_method="temporal",
    llm_name="gpt-4o-mini",
)

rag = ReMem(global_config=config)
rag.index(["On March 15, 2024, the team deployed version 2.0."])
Extracted structure:
{
  "verbatim": "On March 15, 2024, the team deployed version 2.0.",
  "facts": [
    {
      "subject": "team",
      "predicate": "deployed",
      "object": "version 2.0",
      "qualifiers": {
        "time": "2024-03-15"
      }
    }
  ]
}
Graph structure:
Verbatim ──→ Entity: "team" ──→ Entity: "version 2.0"
                ↓                    ↓
         Fact: (team, deployed, version 2.0) {time: 2024-03-15}
Implementation: information_extraction/temporal_extraction_openai.py
When to use:
  • Time-series data
  • Event logs
  • “What happened before/after X?” questions
  • Temporal reasoning tasks
Temporal extraction uses specialized prompts that emphasize extracting time anchors.
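To illustrate why a time qualifier helps with "before/after" questions, the sketch below filters facts by their qualifiers["time"] value. It assumes the qualifier is always a full ISO date such as "2024-03-15"; real extractions may contain partial dates or ranges. The helper facts_before and the second fact are invented for illustration.

```python
from datetime import date


def facts_before(facts, cutoff):
    """Return facts whose qualifiers["time"] is strictly before cutoff."""
    result = []
    for fact in facts:
        stamp = fact.get("qualifiers", {}).get("time")
        if stamp is None:
            continue  # no time anchor, so the fact cannot be ordered
        if date.fromisoformat(stamp) < cutoff:
            result.append(fact)
    return result


facts = [
    {"subject": "team", "predicate": "deployed", "object": "version 2.0",
     "qualifiers": {"time": "2024-03-15"}},
    # Invented second fact, only here to give the filter something to exclude.
    {"subject": "team", "predicate": "deployed", "object": "version 2.1",
     "qualifiers": {"time": "2024-06-01"}},
]
earlier = facts_before(facts, date(2024, 4, 1))
print([f["object"] for f in earlier])  # ['version 2.0']
```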

Choosing an Extraction Method

Performance considerations:
| Method | LLM Calls per Chunk | Extraction Speed | Graph Density |
| --- | --- | --- | --- |
| openie | 1x | Fast | Low |
| episodic | 1x | Fast | Medium |
| episodic_gist | 2x | Moderate | High |
| temporal | 1x | Fast | Medium |
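As a rule of thumb drawn from the "Best For" column of the overview table, the choice can be sketched as a small function. This is one possible reading of the tables on this page, not an official selection procedure; the function name and its three flags are assumptions.

```python
def suggest_extract_method(conversational, time_sensitive=False, long_context=False):
    """Suggest an extract_method value from coarse data traits (heuristic)."""
    if time_sensitive:
        return "temporal"  # time-sensitive questions
    if conversational:
        # Long conversations benefit from gists, at the cost of 2x LLM calls.
        return "episodic_gist" if long_context else "episodic"
    return "openie"  # Wikipedia-style factual text


print(suggest_extract_method(conversational=False))                     # openie
print(suggest_extract_method(conversational=True, long_context=True))   # episodic_gist
```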

Extraction Prompts

Each method uses specialized prompts from prompts/templates/:
  • openie_extraction.txt: For standard triple extraction
  • episodic_fact_extraction_*.txt: For episodic fact extraction (dataset-specific)
  • episodic_gist_extraction_*.txt: For gist summarization
  • temporal_extraction.txt: For temporal fact extraction
Prompts are selected based on the dataset (episodic_gist_extraction_openai.py:75-84):
template_name = f"{template}_locomo"  # Default
selected = ["menatqa", "timeqa", "musique", "complex_tr", "2wikimultihopqa"]
if any(self.global_config.dataset.startswith(prefix) for prefix in selected):
    template_name = f"{template}_wikipedia"
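The same selection rule can be written as a standalone function, which makes it easy to check which template a given dataset name would resolve to. The function name select_template and the dataset names in the usage lines are illustrative; the prefix list and suffixes are copied from the excerpt above.

```python
WIKIPEDIA_STYLE_PREFIXES = ("menatqa", "timeqa", "musique", "complex_tr", "2wikimultihopqa")


def select_template(template, dataset):
    """Mirror the excerpt: '_wikipedia' for listed prefixes, else '_locomo'."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    if dataset.startswith(WIKIPEDIA_STYLE_PREFIXES):
        return f"{template}_wikipedia"
    return f"{template}_locomo"


print(select_template("episodic_fact_extraction", "musique_dev"))
print(select_template("episodic_gist_extraction", "longmemeval"))
```

Note that any dataset name outside the prefix list falls back to the _locomo template.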

Switching Extraction Methods

To switch extraction methods on an existing index, re-index the documents from scratch:
# Index with openie
config = BaseConfig(extract_method="openie")
rag = ReMem(global_config=config)
rag.index(docs)

# Re-index with episodic_gist (rebuilds from scratch)
config.extract_method = "episodic_gist"
config.force_index_from_scratch = True
rag = ReMem(global_config=config)
rag.index(docs)
Changing extraction methods requires rebuilding the graph with force_index_from_scratch=True.

Implementation Details

All extraction methods share a similar interface, either by inheritance or by convention:
class ExtractionMethod:
    def batch_openie(self, chunks: Dict[str, ChunkInfo]) -> Dict[str, RawOutput]:
        """Extract structure from chunks in parallel."""
        pass
The extraction factory in remem.py (lines 130-172) selects the implementation:
if self.global_config.extract_method == "openie":
    self.openie = OpenIE(llm_model=self.extract_llm)
elif self.global_config.extract_method == "episodic":
    from remem.information_extraction.episodic_extraction_openai import EpisodicExtraction
    self.openie = EpisodicExtraction(self.extract_llm, self.global_config)
elif self.global_config.extract_method == "episodic_gist":
    from remem.information_extraction.episodic_gist_extraction_openai import EpisodicGistExtraction
    self.openie = EpisodicGistExtraction(self.extract_llm, self.global_config)
elif self.global_config.extract_method == "temporal":
    from remem.information_extraction.temporal_extraction_openai import TemporalExtraction
    self.openie = TemporalExtraction(self.extract_llm, self.global_config)
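The excerpt shows no else branch, so from this snippet alone it is unclear what happens when extract_method is misspelled. A caller who wants an early, explicit failure could validate the value first. The helper below is a hypothetical sketch and not part of REMem; the four names come from the overview table.

```python
SUPPORTED_EXTRACT_METHODS = ("openie", "episodic", "episodic_gist", "temporal")


def check_extract_method(name):
    """Raise early on an unsupported extract_method (hypothetical helper)."""
    if name not in SUPPORTED_EXTRACT_METHODS:
        options = ", ".join(SUPPORTED_EXTRACT_METHODS)
        raise ValueError(f"Unknown extract_method {name!r}; expected one of: {options}")
    return name


print(check_extract_method("temporal"))  # temporal
```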
