Overview

The TemporalExtraction class extracts facts with explicit temporal information, capturing when events occurred, their duration, and temporal relationships between entities. This enables time-aware knowledge representation.

Class: TemporalExtraction

Constructor

from remem.information_extraction.temporal_extraction_openai import TemporalExtraction
from remem.llm.openai_gpt import CacheOpenAI

llm_model = CacheOpenAI(model="gpt-4")
extractor = TemporalExtraction(llm_model=llm_model, global_config=config)

Parameters

  • llm_model (CacheOpenAI, required): The language model instance used for temporal extraction
  • global_config (object, default: None): Optional global configuration containing:
    • seed (int): Random seed for reproducibility
    • temperature (float): LLM temperature parameter

Methods

batch_openie()

Extract temporal facts from multiple chunks using multi-threaded processing.
def batch_openie(
    self,
    chunks: Dict[str, ChunkInfo]
) -> Dict[str, TemporalRawOutput]

Parameters

  • chunks (Dict[str, ChunkInfo], required): Dictionary of chunks to process. Each key is a chunk ID, and each value contains:
    • metadata (dict): Temporal metadata for constructing the passage
    • Other ChunkInfo fields (see the OpenIE documentation)

Returns

A dictionary mapping each chunk ID to a TemporalRawOutput object containing:
  • chunk_id (str): The chunk identifier
  • verbatim (str): Original passage text
  • facts (List[dict]): List of temporal facts, each containing:
    • subject (str): The entity the fact is about
    • predicate (str): The relationship or action
    • object (str): The target entity or value
    • temporal_qualifier (str): Time information (date, duration, or temporal relation)
  • response (str): Raw LLM response
  • metadata (dict): Token usage and processing information

Example Usage

from types import SimpleNamespace

from remem.information_extraction.temporal_extraction_openai import TemporalExtraction
from remem.llm.openai_gpt import CacheOpenAI

# Initialize the extractor
llm_model = CacheOpenAI(model="gpt-4")
config = SimpleNamespace(seed=42, temperature=0.0)

extractor = TemporalExtraction(llm_model=llm_model, global_config=config)

# Prepare temporal chunks
chunks = {
    "event_1": {
        "metadata": {
            "timestamp": "2024-01-15",
            "content": "Marie Curie won the Nobel Prize in Physics in 1903 and the Nobel Prize in Chemistry in 1911. She worked at the University of Paris from 1906 to 1934."
        },
        "num_tokens": 70,
        "content": "Marie Curie won the Nobel Prize...",
        "chunk_order": [(0, 1)],
        "full_doc_ids": ["doc_789"]
    }
}

# Extract temporal facts
results = extractor.batch_openie(chunks)

# Access results
for chunk_id, output in results.items():
    print(f"\nTemporal facts in {chunk_id}:")
    for fact in output.facts:
        print(f"  Subject: {fact['subject']}")
        print(f"  Predicate: {fact['predicate']}")
        print(f"  Object: {fact['object']}")
        print(f"  Temporal: {fact['temporal_qualifier']}")
        print()

# Output:
#   Subject: Marie Curie
#   Predicate: won
#   Object: Nobel Prize in Physics
#   Temporal: 1903
#
#   Subject: Marie Curie
#   Predicate: won
#   Object: Nobel Prize in Chemistry
#   Temporal: 1911
#
#   Subject: Marie Curie
#   Predicate: worked at
#   Object: University of Paris
#   Temporal: from 1906 to 1934

Extraction Process

Temporal Fact Structure

Each extracted fact contains four components:
  1. Subject: The entity the fact is about
  2. Predicate: The action, relationship, or state
  3. Object: The target entity, value, or description
  4. Temporal Qualifier: Time information in various formats:
    • Specific dates: “2024-01-15”, “January 2024”
    • Date ranges: “from 2020 to 2023”, “2020-2023”
    • Relative times: “last week”, “two years ago”
    • Durations: “for 3 months”, “since 2020”
    • Temporal relations: “before graduation”, “after the meeting”
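
The four-component structure above can be sketched as a plain dictionary, with a small validation helper (a minimal illustration; `is_valid_fact` is not part of the library API):

```python
# One extracted fact, following the four-component structure described above.
fact = {
    "subject": "Marie Curie",
    "predicate": "worked at",
    "object": "University of Paris",
    "temporal_qualifier": "from 1906 to 1934",
}

def is_valid_fact(f: dict) -> bool:
    """Return True if the fact carries all four required, non-empty string fields."""
    required = ("subject", "predicate", "object", "temporal_qualifier")
    return all(isinstance(f.get(k), str) and f[k] for k in required)

print(is_valid_fact(fact))  # True
```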

Prompt Template

Uses the temporal_extraction prompt template, which instructs the LLM to:
  • Identify temporal expressions in the text
  • Associate time information with facts
  • Normalize temporal references when possible
  • Handle various temporal formats and granularities

Passage Construction

Uses make_chunk_content("temporal", metadata) to format input:
  • Structures temporal context appropriately
  • Preserves timestamp information from metadata
  • Formats content for temporal extraction
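
The exact formatting performed by make_chunk_content is internal to the library, but a timestamp-prefixed passage of the kind described above might look like this (a hypothetical sketch; `make_temporal_passage` is not a real library function):

```python
def make_temporal_passage(metadata: dict) -> str:
    """Hypothetical illustration of temporal passage construction:
    prefix the content with its timestamp so the LLM sees the temporal context."""
    timestamp = metadata.get("timestamp", "")
    content = metadata.get("content", "")
    return f"[{timestamp}] {content}" if timestamp else content

print(make_temporal_passage({
    "timestamp": "2024-01-15",
    "content": "Marie Curie won the Nobel Prize in Physics in 1903.",
}))
# [2024-01-15] Marie Curie won the Nobel Prize in Physics in 1903.
```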

Temporal Qualifier Types

Absolute Time

  • Specific dates: “2024-03-15”
  • Specific times: “2024-03-15 14:30:00”
  • Year-month: “March 2024”
  • Year only: “2024”

Relative Time

  • Past references: “yesterday”, “last week”, “2 days ago”
  • Future references: “tomorrow”, “next month”, “in 3 weeks”
  • Current references: “today”, “this week”, “currently”

Duration

  • Explicit duration: “for 5 years”, “during 3 months”
  • Open-ended: “since 2020”, “until now”
  • Bounded: “from 2020 to 2023”, “between Jan and Mar”

Temporal Relations

  • Sequential: “before X”, “after Y”, “during Z”
  • Simultaneous: “while X”, “at the same time as Y”
  • Frequency: “every Monday”, “twice a year”
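
The categories above can be roughly distinguished with keyword heuristics. The sketch below is illustrative only; the library does not classify qualifiers this way:

```python
import re

def classify_qualifier(q: str) -> str:
    """Roughly bucket a temporal qualifier string into the categories above.
    A heuristic sketch only -- not part of the library API."""
    q_lower = q.lower().strip()
    # ISO-style dates and times, e.g. "2024-03-15" or "2024-03-15 14:30:00"
    if re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?( \d{2}:\d{2}(:\d{2})?)?", q_lower):
        return "absolute"
    if re.search(r"\b(from|since|until|for|between|during)\b", q_lower):
        return "duration"
    if re.search(r"\b(before|after|while|every|twice)\b", q_lower):
        return "relation"
    if re.search(r"\b(yesterday|today|tomorrow|last|next|ago|currently|this)\b", q_lower):
        return "relative"
    return "unknown"

print(classify_qualifier("2024-03-15"))         # absolute
print(classify_qualifier("from 2020 to 2023"))  # duration
print(classify_qualifier("last week"))          # relative
print(classify_qualifier("every Monday"))       # relation
```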

Performance Considerations

  • Parallel processing: Default max_workers=8 for ThreadPoolExecutor
  • Single-stage extraction: Extracts complete temporal facts in one pass
  • JSON mode: Structured output for reliable parsing
  • Progress tracking: Real-time token usage and cache hit metrics
  • Efficient threading: Balances concurrency with API rate limits
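
The threading pattern can be sketched as follows; `extract_one` is a stand-in for the per-chunk LLM call, not the library's actual internals:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_one(chunk_id: str, chunk: dict) -> tuple:
    # Stand-in for a single temporal-extraction LLM call on one chunk.
    return chunk_id, {"chunk_id": chunk_id, "facts": []}

def batch_extract(chunks: dict, max_workers: int = 8) -> dict:
    """Process chunks in parallel, mirroring batch_openie()'s default of 8 workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_one, cid, c) for cid, c in chunks.items()]
        return dict(f.result() for f in futures)

results = batch_extract({"event_1": {}, "event_2": {}})
print(sorted(results))  # ['event_1', 'event_2']
```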

Error Handling

  • Automatic JSON repair for truncated responses
  • Graceful handling of malformed temporal expressions
  • Returns empty fact lists on extraction failure
  • Preserves error information in metadata
  • Logs warnings for debugging
  • Replaces null values with empty strings
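
The fallback behavior described above (an empty fact list on failure, nulls replaced with empty strings) can be sketched like this; `safe_parse_facts` is illustrative, not the library's internal function:

```python
import json

def safe_parse_facts(response: str) -> list:
    """Return an empty fact list on malformed JSON and replace null
    field values with empty strings. An illustrative sketch only."""
    try:
        facts = json.loads(response).get("facts", [])
    except (json.JSONDecodeError, AttributeError):
        return []
    return [{k: ("" if v is None else v) for k, v in f.items()} for f in facts]

print(safe_parse_facts('{"facts": [{"subject": "Marie Curie", "object": null}]}'))
# [{'subject': 'Marie Curie', 'object': ''}]
print(safe_parse_facts("not json"))  # []
```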

Use Cases

Time-Aware Knowledge Graphs

# Build a time-aware knowledge graph (networkx stands in for the graph store)
import networkx as nx

kg = nx.MultiDiGraph()
temporal_facts = output.facts  # facts from batch_openie() above
for fact in temporal_facts:
    kg.add_edge(
        fact['subject'],
        fact['object'],
        relation=fact['predicate'],
        timestamp=fact['temporal_qualifier']
    )

Event Timeline Construction

# Create chronological event sequences (assumes the qualifiers parse as dates)
from dateutil.parser import parse as parse_date  # third-party: python-dateutil

events = sorted(
    temporal_facts,
    key=lambda f: parse_date(f['temporal_qualifier'])
)

Temporal Question Answering

# Query: "When did Marie Curie win the Nobel Prize?"
matching_facts = [
    f for f in temporal_facts
    if f['subject'] == 'Marie Curie'
    and 'Nobel Prize' in f['object']
]
print(matching_facts[0]['temporal_qualifier'])  # "1903"

Historical Analysis

# Extract facts within a time range (is_within_range is a user-supplied helper)
recent_facts = [
    f for f in temporal_facts
    if is_within_range(f['temporal_qualifier'], start='2020-01-01', end='2024-12-31')
]

Integration with Memory Systems

Temporal extraction enables:
  • Episodic memory: Link events to specific times
  • Semantic memory: Track knowledge validity periods
  • Autobiographical memory: Create personal timelines
  • Context-aware retrieval: Filter by temporal constraints
  • Memory decay modeling: Weight facts by recency
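
As one sketch of memory decay modeling, facts can be weighted by an exponential half-life on their age; the decay function and half-life here are assumptions, not something the library prescribes:

```python
from datetime import date

def recency_weight(fact_date: date, today: date, half_life_days: float = 365.0) -> float:
    """Exponential decay: a fact loses half its weight every half_life_days."""
    age = (today - fact_date).days
    return 0.5 ** (age / half_life_days)

# A fact exactly one half-life old carries half the weight of a fresh one.
w = recency_weight(date(2023, 1, 1), date(2024, 1, 1))
print(round(w, 3))  # 0.5
```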
