REMem supports four extraction methods that determine what structure is extracted from documents and stored in the memory graph. The method is selected via `extract_method` in `BaseConfig`.
## Overview
| Method | Structure Extracted | Best For | Graph Nodes |
|---|---|---|---|
| `openie` | Entities + triples | Wikipedia, factual text | Entity, Fact, Passage |
| `episodic` | Episodic facts | Conversations, narratives | Verbatim, Fact, Entity |
| `episodic_gist` | Facts + gist summaries | Long conversations, contextual QA | Verbatim, Gist, Fact, Entity |
| `temporal` | Facts + temporal qualifiers | Time-sensitive questions | Verbatim, Fact, Entity |
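The table above can be mirrored as a small lookup, handy for validating a configured method before indexing. This is an illustrative sketch: the node-type names come from the table in this document, not from any REMem API.

```python
# Illustrative mapping of extract_method values to the node types each
# method produces. The names mirror the table above; they are NOT
# imported from REMem.
GRAPH_NODES = {
    "openie": {"Entity", "Fact", "Passage"},
    "episodic": {"Verbatim", "Fact", "Entity"},
    "episodic_gist": {"Verbatim", "Gist", "Fact", "Entity"},
    "temporal": {"Verbatim", "Fact", "Entity"},
}

def validate_method(name: str) -> str:
    """Fail early on a misspelled extract_method value."""
    if name not in GRAPH_NODES:
        raise ValueError(
            f"unknown extract_method {name!r}; choose from {sorted(GRAPH_NODES)}"
        )
    return name
```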
## OpenIE
**Purpose:** Extract structured knowledge from factual text such as Wikipedia articles.

**What it extracts:**

- **Named entities:** subjects and objects (e.g., "Alan Turing", "Turing Test")
- **Relation triples:** (subject, predicate, object) tuples

**Example:**

```python
from remem.remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    extract_method="openie",
    llm_name="gpt-4o-mini",
)
rag = ReMem(global_config=config)
rag.index(["Alan Turing proposed the Turing Test in 1950."])
```
**Extracted structure:**

```json
{
  "entities": ["Alan Turing", "Turing Test", "1950"],
  "triples": [
    ["Alan Turing", "proposed", "Turing Test"],
    ["Turing Test", "proposed in", "1950"]
  ]
}
```
**Graph structure:**

```text
Entity: "Alan Turing" ──→ Entity: "Turing Test"
          ↓                        ↓
      Passage: "Alan Turing proposed..."
```

**Implementation:** `information_extraction/openie_openai.py`
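The passage-to-graph step can be illustrated by folding the extracted triples into a simple adjacency map that mirrors the Entity edges above. This is a sketch of the graph shape only, not REMem's storage code.

```python
# Sketch: fold extracted triples into a subject -> [(predicate, object)]
# adjacency map, mirroring the Entity edges in the diagram above.
from collections import defaultdict

extraction = {
    "entities": ["Alan Turing", "Turing Test", "1950"],
    "triples": [
        ["Alan Turing", "proposed", "Turing Test"],
        ["Turing Test", "proposed in", "1950"],
    ],
}

edges = defaultdict(list)
for subj, pred, obj in extraction["triples"]:
    edges[subj].append((pred, obj))

print(edges["Alan Turing"])  # [('proposed', 'Turing Test')]
```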
**When to use:**

- Factual knowledge bases
- Wikipedia-style articles
- Structured question answering (MuSiQue, 2WikiMultihopQA)

OpenIE is the fastest extraction method but provides less context than the episodic methods.
## Episodic
**Purpose:** Extract facts from conversations or narrative text while preserving episodic context.

**What it extracts:**

- **Verbatim content:** the original conversation text
- **Episodic facts:** structured knowledge with context
- **Entities:** named entities from facts
**Example:**

```python
config = BaseConfig(
    extract_method="episodic",
    llm_name="gpt-4o-mini",
)
rag = ReMem(global_config=config)
rag.index([
    "User: Can you help me implement a search feature?\nAssistant: I can help with that."
])
```
**Extracted structure:**

```json
{
  "verbatim": "User: Can you help me implement a search feature?\nAssistant: I can help with that.",
  "facts": [
    {
      "subject": "User",
      "predicate": "wants to implement",
      "object": "search feature",
      "qualifiers": {}
    },
    {
      "subject": "Assistant",
      "predicate": "offers help with",
      "object": "search feature",
      "qualifiers": {}
    }
  ]
}
```
**Graph structure:**

```text
Verbatim ──→ Entity: "User" ──────→ Entity: "search feature"
         ──→ Entity: "Assistant" ──→ Entity: "search feature"
```

**Implementation:** `information_extraction/episodic_extraction_openai.py`
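The fact records shown above map naturally onto a small typed structure. The field names follow the JSON keys in this document; REMem's internal classes may differ.

```python
# Typed sketch of the fact records shown above. Field names follow the
# JSON keys in this document; REMem's internal classes may differ.
from dataclasses import dataclass, field

@dataclass
class EpisodicFact:
    subject: str
    predicate: str
    object: str
    qualifiers: dict = field(default_factory=dict)

fact = EpisodicFact("User", "wants to implement", "search feature")
```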
**When to use:**

- Conversational data
- Customer support logs
- Narrative question answering
## Episodic Gist
**Purpose:** Combine episodic facts with paraphrased gist summaries for associative recall.

**What it extracts:**

- **Verbatim content:** original text (split by message for conversations)
- **Gist summaries:** paraphrased, compressed representations
- **Episodic facts:** structured knowledge
- **Entities:** named entities from facts
**Example:**

```python
config = BaseConfig(
    extract_method="episodic_gist",
    llm_name="gpt-4o-mini",
    split_verbatim_per_chunk=True,      # Split conversations by message
    concatenate_gists_per_chunk=False,  # Multiple gist nodes per chunk
)
rag = ReMem(global_config=config)
```
**Extracted structure:**

```json
{
  "verbatim": "User: Can you help me implement a search feature?",
  "gists": [
    "User requested help implementing a search feature",
    "Conversation about adding search functionality"
  ],
  "facts": [
    {
      "subject": "User",
      "predicate": "wants to implement",
      "object": "search feature",
      "qualifiers": {"intent": "request_help"}
    }
  ]
}
```
**Graph structure:**

```text
Verbatim ──→ Gist: "User requested help..." ──→ Entity: "User"
   │                                        ──→ Fact: (User, wants, search)
   └──→ Gist: "Conversation about search" ──→ Entity: "search feature"
```
**Two-stage extraction** (`episodic_gist_extraction_openai.py:34-40`):

```python
# Stage 1: Extract gists
gist_outputs = self.batch_extraction(
    chunk_passages, template="episodic_gist_extraction", target="gists"
)

# Stage 2: Extract facts (using gists as context)
fact_outputs = self.batch_extraction(
    chunk_passages, template="episodic_fact_extraction", target="facts", gist_map=gist_map
)
```
**Why two stages?** Gists provide compressed context that improves fact-extraction quality.
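The two-stage flow can be sketched end to end with a stand-in LLM call. Here `call_llm` is a hypothetical placeholder that returns canned text; it is not a REMem function.

```python
# Hypothetical sketch of the two-stage flow. `call_llm` is a placeholder
# that returns canned text; it is not part of REMem.
def call_llm(prompt: str) -> str:
    if prompt.startswith("Summarize"):
        return "User requested help implementing a search feature"
    return "(User, wants to implement, search feature)"

def extract(chunk: str) -> dict:
    # Stage 1: compress the chunk into a gist.
    gist = call_llm(f"Summarize as a gist:\n{chunk}")
    # Stage 2: extract facts, conditioning on the stage-1 gist.
    fact = call_llm(f"Extract facts given gist '{gist}':\n{chunk}")
    return {"gists": [gist], "facts": [fact]}
```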
**Configuration options:**

```python
config = BaseConfig(
    extract_method="episodic_gist",
    # Split long conversations into individual messages?
    split_verbatim_per_chunk=True,
    # Concatenate all gists into a single node per chunk?
    concatenate_gists_per_chunk=False,
)
```
**Implementation:** `information_extraction/episodic_gist_extraction_openai.py`
**When to use:**

- Long conversations (LongMemEval, LoCoMo)
- Contextual question answering
- When you need both exact quotes and summaries

Episodic gist provides the richest graph structure at the cost of more LLM calls (2x OpenIE).
## Temporal
**Purpose:** Extract facts with temporal qualifiers for time-aware question answering.

**What it extracts:**

- **Verbatim content:** original text
- **Temporal facts:** facts with time anchors in qualifiers
- **Entities:** named entities from facts
**Example:**

```python
config = BaseConfig(
    extract_method="temporal",
    llm_name="gpt-4o-mini",
)
rag = ReMem(global_config=config)
rag.index(["On March 15, 2024, the team deployed version 2.0."])
```
**Extracted structure:**

```json
{
  "verbatim": "On March 15, 2024, the team deployed version 2.0.",
  "facts": [
    {
      "subject": "team",
      "predicate": "deployed",
      "object": "version 2.0",
      "qualifiers": {
        "time": "2024-03-15"
      }
    }
  ]
}
```
**Graph structure:**

```text
Verbatim ──→ Entity: "team" ──→ Entity: "version 2.0"
    ↓              ↓
Fact: (team, deployed, version 2.0) {time: 2024-03-15}
```

**Implementation:** `information_extraction/temporal_extraction_openai.py`
**When to use:**

- Time-series data
- Event logs
- "What happened before/after X?" questions
- Temporal reasoning tasks

Temporal extraction uses specialized prompts that emphasize extracting time anchors.
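Answering a before/after question then reduces to comparing the `time` qualifier against a cutoff. Since the qualifiers use ISO-8601 dates, plain string comparison suffices. A sketch over the fact format shown above (not REMem's retrieval code):

```python
# Sketch: filter temporal facts relative to a cutoff. ISO-8601 dates
# sort correctly as strings, so no date parsing is needed here.
facts = [
    {"subject": "team", "predicate": "deployed", "object": "version 2.0",
     "qualifiers": {"time": "2024-03-15"}},
    {"subject": "team", "predicate": "deployed", "object": "version 1.0",
     "qualifiers": {"time": "2023-11-02"}},
]

def facts_before(facts: list, cutoff: str) -> list:
    """Return facts whose time anchor falls strictly before `cutoff`."""
    return [f for f in facts
            if (t := f["qualifiers"].get("time")) is not None and t < cutoff]
```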
**Performance considerations:**
| Method | LLM Calls per Chunk | Extraction Speed | Graph Density |
|---|---|---|---|
| `openie` | 1x | Fast | Low |
| `episodic` | 1x | Fast | Medium |
| `episodic_gist` | 2x | Moderate | High |
| `temporal` | 1x | Fast | Medium |
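These multipliers translate directly into indexing cost. A back-of-envelope estimate (real counts depend on batching and retries):

```python
# Back-of-envelope indexing cost: multiplier from the table above times
# the number of chunks. Real counts depend on batching and retries.
CALLS_PER_CHUNK = {"openie": 1, "episodic": 1, "episodic_gist": 2, "temporal": 1}

def estimated_llm_calls(method: str, num_chunks: int) -> int:
    return CALLS_PER_CHUNK[method] * num_chunks

print(estimated_llm_calls("episodic_gist", 500))  # 1000
```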
Each method uses specialized prompts from `prompts/templates/`:

- `openie_extraction.txt`: standard triple extraction
- `episodic_fact_extraction_*.txt`: episodic fact extraction (dataset-specific)
- `episodic_gist_extraction_*.txt`: gist summarization
- `temporal_extraction.txt`: temporal fact extraction
Prompts are selected based on the dataset (`episodic_gist_extraction_openai.py:75-84`):

```python
template_name = f"{template}_locomo"  # Default
selected = ["menatqa", "timeqa", "musique", "complex_tr", "2wikimultihopqa"]
if any(self.global_config.dataset.startswith(prefix) for prefix in selected):
    template_name = f"{template}_wikipedia"
```
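Restated as a standalone function, the same selection logic makes it easy to check which template a dataset would receive (an illustrative rewrite, not REMem code):

```python
# Illustrative rewrite of the selection logic above as a pure function,
# so the template choice for a dataset can be checked in isolation.
def select_template(template: str, dataset: str) -> str:
    wikipedia_prefixes = ("menatqa", "timeqa", "musique", "complex_tr", "2wikimultihopqa")
    if any(dataset.startswith(p) for p in wikipedia_prefixes):
        return f"{template}_wikipedia"
    return f"{template}_locomo"  # default

print(select_template("episodic_fact_extraction", "musique_dev"))
# episodic_fact_extraction_wikipedia
```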
You can switch extraction methods, but doing so rebuilds the graph from scratch:

```python
# Index with openie
config = BaseConfig(extract_method="openie")
rag = ReMem(global_config=config)
rag.index(docs)

# Re-index with episodic_gist (rebuilds from scratch)
config.extract_method = "episodic_gist"
config.force_index_from_scratch = True
rag = ReMem(global_config=config)
rag.index(docs)
```

Changing extraction methods requires rebuilding the graph with `force_index_from_scratch=True`.
## Implementation Details
All extraction methods implement a common interface:

```python
class ExtractionMethod:
    def batch_openie(self, chunks: Dict[str, ChunkInfo]) -> Dict[str, RawOutput]:
        """Extract structure from chunks in parallel."""
        pass
```
The extraction factory in `remem.py` (lines 130-172) selects the implementation:

```python
if self.global_config.extract_method == "openie":
    self.openie = OpenIE(llm_model=self.extract_llm)
elif self.global_config.extract_method == "episodic":
    from remem.information_extraction.episodic_extraction_openai import EpisodicExtraction
    self.openie = EpisodicExtraction(self.extract_llm, self.global_config)
elif self.global_config.extract_method == "episodic_gist":
    from remem.information_extraction.episodic_gist_extraction_openai import EpisodicGistExtraction
    self.openie = EpisodicGistExtraction(self.extract_llm, self.global_config)
elif self.global_config.extract_method == "temporal":
    from remem.information_extraction.temporal_extraction_openai import TemporalExtraction
    self.openie = TemporalExtraction(self.extract_llm, self.global_config)
```
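The same dispatch is often written as a table of constructors instead of an if/elif chain. A design sketch, with placeholder classes standing in for the real extraction implementations:

```python
# Design sketch: table-driven dispatch as an alternative to the if/elif
# chain. The classes here are placeholders for the real extractors.
class OpenIE: ...
class EpisodicExtraction: ...
class EpisodicGistExtraction: ...
class TemporalExtraction: ...

EXTRACTORS = {
    "openie": OpenIE,
    "episodic": EpisodicExtraction,
    "episodic_gist": EpisodicGistExtraction,
    "temporal": TemporalExtraction,
}

def build_extractor(method: str):
    try:
        return EXTRACTORS[method]()
    except KeyError:
        raise ValueError(f"unknown extract_method: {method!r}") from None
```

A registry also makes the method set discoverable (`EXTRACTORS.keys()`) without reading the factory body.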
## Next Steps