Overview
The `EpisodicExtraction` class extracts episodic facts from conversational or sequential text. It specializes in identifying entities and relationships specific to episodic memory contexts, such as user interactions, conversations, and temporal sequences.
Class: EpisodicExtraction
Constructor
- The language model instance used for episodic extraction
- Optional global configuration containing:
  - `dataset` (str): Dataset name to select the appropriate prompt template
  - `seed` (int): Random seed for reproducibility
  - `temperature` (float): LLM temperature parameter
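The exact type of the global configuration object is not shown on this page; a minimal sketch, assuming a plain dict with the documented keys, looks like:

```python
# Hypothetical sketch: a plain dict stands in for the real global_config
# object, whose concrete type is not documented here.
global_config = {
    "dataset": "longmemeval",  # selects the prompt template by prefix
    "seed": 42,                # random seed for reproducibility
    "temperature": 0.0,        # LLM temperature parameter
}
```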
Methods
`batch_openie()`
Extract episodic facts from multiple chunks using multi-threaded processing.

Parameters

- Dictionary of chunks to process. Each key is a chunk ID, and each value contains:
  - `metadata` (dict): Episode metadata used to construct the passage
  - Other ChunkInfo fields (see the OpenIE documentation)
Returns

A tuple containing two dictionaries:

- NER Results (`Dict[str, NerRawOutput]`): Entities extracted from episodic content
  - `chunk_id` (str): The chunk identifier
  - `response` (str): Raw LLM response
  - `unique_entities` (List[str]): Unique entities found (subjects and objects from triples)
  - `metadata` (dict): Token usage and cache hit information
- Triple Results (`Dict[str, TripleRawOutput]`): Episodic fact triples
  - `chunk_id` (str): The chunk identifier
  - `response` (str): Raw LLM response
  - `triples` (List[Tuple]): List of (subject, predicate, object) episodic facts
  - `metadata` (dict): Token usage and cache hit information
Example Usage
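The original example is not preserved on this page. The following is a hedged sketch of the call pattern: the result field names follow the documentation above, but the stand-in dataclasses and the `fake_batch_openie` function are illustrative stubs, not the library's real implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Stand-in result types mirroring the fields documented above; the real
# NerRawOutput / TripleRawOutput classes are defined by the library itself.
@dataclass
class NerRawOutput:
    chunk_id: str
    response: str
    unique_entities: List[str]
    metadata: dict = field(default_factory=dict)

@dataclass
class TripleRawOutput:
    chunk_id: str
    response: str
    triples: List[Tuple[str, str, str]]
    metadata: dict = field(default_factory=dict)

def fake_batch_openie(chunks: Dict[str, dict]):
    """Toy stand-in for EpisodicExtraction.batch_openie(): returns the
    documented (ner_results, triple_results) pair, keyed by chunk ID."""
    ner, triples = {}, {}
    for chunk_id, info in chunks.items():
        ts = [("user", "asked about", "the weather")]  # canned triple
        # Entities are derived from triple subjects and objects.
        entities = sorted({t[0] for t in ts} | {t[2] for t in ts})
        ner[chunk_id] = NerRawOutput(chunk_id, response="...", unique_entities=entities)
        triples[chunk_id] = TripleRawOutput(chunk_id, response="...", triples=ts)
    return ner, triples

chunks = {"ep-001": {"metadata": {"speaker": "user", "content": "What's the weather?"}}}
ner_results, triple_results = fake_batch_openie(chunks)
print(triple_results["ep-001"].triples[0])  # -> ('user', 'asked about', 'the weather')
```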
Extraction Process
Template Selection
The extractor automatically selects the appropriate prompt template based on the dataset:

- Default: `episodic_triple_extraction_longmemeval`
- Dataset-specific templates are available for different memory evaluation benchmarks
- Templates are matched by dataset prefix in `global_config.dataset`
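Prefix matching of this kind can be sketched as follows. Only the default template name comes from this page; the other registry entries and the function name are assumptions for illustration.

```python
# Hypothetical template registry; only the longmemeval default name is
# documented, the other entries are illustrative.
TEMPLATES = {
    "longmemeval": "episodic_triple_extraction_longmemeval",
    "timeqa": "episodic_triple_extraction_timeqa",    # assumed name
    "musique": "episodic_triple_extraction_musique",  # assumed name
}

def select_template(dataset: str) -> str:
    """Pick the first template whose key is a prefix of the dataset name."""
    for prefix, template in TEMPLATES.items():
        if dataset.startswith(prefix):
            return template
    return "episodic_triple_extraction_longmemeval"  # documented default

print(select_template("timeqa_hard"))  # -> episodic_triple_extraction_timeqa
```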
JSON Mode
Extracts structured JSON output containing:

- `triples`: List of episodic relationship triples
- Entities are derived from triple subjects and objects
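Deriving entities from the triples in the JSON response can be sketched like this; the response string below is illustrative, not real model output.

```python
import json

# Illustrative JSON-mode response containing a "triples" list.
response = '{"triples": [["Alice", "moved to", "Berlin"], ["Alice", "works at", "Acme"]]}'
triples = [tuple(t) for t in json.loads(response)["triples"]]

# Entities are the union of triple subjects and objects (no separate NER step).
unique_entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
print(unique_entities)  # -> ['Acme', 'Alice', 'Berlin']
```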
Paraphrasing (Optional)
When `paraphrasing=True` is passed to the `episodic_extraction()` method:

- A third output, `ParaphraseRawOutput`, is returned
- It contains paraphrased versions of the episodic facts
- Useful for data augmentation and semantic variation
Specialized Features
Episodic Content Construction
Uses `make_chunk_content("episodic", metadata)` to construct passages from chunk metadata:
- Formats episode information (speaker, timestamp, content)
- Structures conversational context appropriately
- Handles different episode formats based on metadata structure
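A minimal sketch of what such a content builder might do, assuming `speaker`, `timestamp`, and `content` metadata fields; the real formatting logic lives in the library and may differ.

```python
# Hypothetical stand-in for make_chunk_content("episodic", metadata); the
# bracketed-timestamp format below is an assumption for illustration.
def make_chunk_content(kind: str, metadata: dict) -> str:
    if kind == "episodic":
        speaker = metadata.get("speaker", "")
        timestamp = metadata.get("timestamp", "")
        content = metadata.get("content", "")
        return f"[{timestamp}] {speaker}: {content}"
    raise ValueError(f"unsupported chunk kind: {kind}")

passage = make_chunk_content("episodic", {
    "speaker": "user",
    "timestamp": "2024-05-01 09:30",
    "content": "Remind me about the dentist tomorrow.",
})
print(passage)  # -> [2024-05-01 09:30] user: Remind me about the dentist tomorrow.
```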
Dataset-Specific Prompts
Supports specialized extraction for various memory benchmarks:

- LongMemEval
- MenatQA
- TimeQA
- MuSiQue
- Complex Temporal Reasoning
- 2WikiMultiHopQA
Performance Considerations
- Single-stage extraction (combines entity and triple extraction)
- Parallel processing with `ThreadPoolExecutor`
- Progress tracking with token usage metrics
- Cache-aware processing for repeated chunks
- Entities derived from extracted triples (no separate NER step)
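The parallel pattern described above can be sketched with the standard library's `ThreadPoolExecutor`; `extract_one` is an illustrative stand-in for the per-chunk LLM call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_one(chunk_id: str, chunk: dict):
    # Stand-in for the per-chunk extraction call; returns (id, result).
    return chunk_id, {"triples": [("user", "mentioned", chunk["content"])]}

chunks = {
    "ep-001": {"content": "a trip to Kyoto"},
    "ep-002": {"content": "a new job"},
}

# Submit one task per chunk and collect results by chunk ID as they finish.
results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(extract_one, cid, c) for cid, c in chunks.items()]
    for future in as_completed(futures):
        chunk_id, output = future.result()
        results[chunk_id] = output

print(sorted(results))  # -> ['ep-001', 'ep-002']
```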
Error Handling
- Graceful handling of extraction failures
- Returns empty results with error metadata on exceptions
- Logs warnings for debugging
- Handles malformed JSON responses
- Replaces null values with empty strings
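The behaviors above can be sketched in one parsing helper; the function name and result dict shape are illustrative, not the library's actual API.

```python
import json
import logging

logger = logging.getLogger(__name__)

def parse_triples(chunk_id: str, response: str):
    """Hypothetical sketch: parse a JSON triples response with the
    documented failure behavior (empty result plus error metadata)."""
    try:
        raw = json.loads(response).get("triples", [])
    except (json.JSONDecodeError, AttributeError) as exc:
        # Malformed JSON: log a warning and return an empty result
        # carrying the error in metadata instead of raising.
        logger.warning("extraction failed for %s: %s", chunk_id, exc)
        return {"chunk_id": chunk_id, "triples": [], "metadata": {"error": str(exc)}}
    # Replace null values with empty strings so downstream code only sees strings.
    triples = [tuple("" if v is None else v for v in t) for t in raw]
    return {"chunk_id": chunk_id, "triples": triples, "metadata": {}}

print(parse_triples("ep-001", "not json")["triples"])  # -> []
print(parse_triples("ep-002", '{"triples": [["Alice", null, "Berlin"]]}')["triples"])
```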