Overview

The OpenIE class performs open information extraction by identifying named entities and extracting semantic triples (subject-predicate-object relationships) from text passages.

Class: OpenIE

Constructor

from remem.information_extraction.openie_openai import OpenIE
from remem.llm.openai_gpt import CacheOpenAI

llm_model = CacheOpenAI(model="gpt-4")
extractor = OpenIE(llm_model=llm_model)
Parameters

llm_model (CacheOpenAI, required)
The language model instance used for entity and triple extraction.

Methods

batch_openie()

Conduct batch OpenIE synchronously using multi-threading for both NER and triple extraction.
def batch_openie(
    self,
    chunks: Dict[str, ChunkInfo]
) -> Tuple[Dict[str, NerRawOutput], Dict[str, TripleRawOutput]]

Parameters

chunks (Dict[str, ChunkInfo], required)
Dictionary of chunks to process. Each key is a chunk ID (a hash of the chunk content), and each value contains:
  • num_tokens (int): Number of tokens in the chunk
  • content (str): The text content to extract from
  • chunk_order (List[Tuple]): Ordering information
  • full_doc_ids (List[str]): Associated document IDs
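
Since chunk IDs are described as hashes of the chunk, a chunks dictionary might be built along these lines. This is a hedged sketch: the specific hash function (MD5 here) and the literal field values are assumptions for illustration, not the library's actual hashing scheme.

```python
from hashlib import md5

# Build a ChunkInfo-style entry keyed by a content hash (hash choice is an assumption).
content = "Albert Einstein was born in Ulm, Germany."
chunk_id = md5(content.encode("utf-8")).hexdigest()

chunks = {
    chunk_id: {
        "num_tokens": 10,           # token count for the chunk
        "content": content,         # text to extract from
        "chunk_order": [(0, 1)],    # ordering information
        "full_doc_ids": ["doc_123"] # associated document IDs
    }
}
```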

Returns

A tuple containing two dictionaries:
  1. NER Results (Dict[str, NerRawOutput]): Named entity recognition results
    • chunk_id (str): The chunk identifier
    • response (str): Raw LLM response
    • unique_entities (List[str]): List of unique entities found
    • metadata (dict): Token usage and cache hit information
  2. Triple Results (Dict[str, TripleRawOutput]): Extracted knowledge graph triples
    • chunk_id (str): The chunk identifier
    • response (str): Raw LLM response
    • triples (List[Tuple]): List of (subject, predicate, object) triples
    • metadata (dict): Token usage and cache hit information
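
The field layout above can be modeled as simple dataclasses. This is a rough sketch derived only from the Returns description, not the library's actual class definitions:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class NerRawOutput:
    chunk_id: str                 # the chunk identifier
    response: str                 # raw LLM response
    unique_entities: List[str]    # unique entities found
    metadata: dict = field(default_factory=dict)  # token usage / cache hits

@dataclass
class TripleRawOutput:
    chunk_id: str
    response: str
    triples: List[Tuple[str, str, str]]  # (subject, predicate, object)
    metadata: dict = field(default_factory=dict)

ner = NerRawOutput("chunk_1", "{...}", ["Albert Einstein", "Ulm"], {"cache_hit": True})
triple = TripleRawOutput("chunk_1", "{...}", [("Albert Einstein", "was born in", "Ulm")])
```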

Example Usage

from remem.information_extraction.openie_openai import OpenIE, ChunkInfo
from remem.llm.openai_gpt import CacheOpenAI

# Initialize the extractor
llm_model = CacheOpenAI(model="gpt-4")
extractor = OpenIE(llm_model=llm_model)

# Prepare chunks
chunks = {
    "chunk_1": {
        "num_tokens": 150,
        "content": "Albert Einstein was born in Ulm, Germany. He developed the theory of relativity.",
        "chunk_order": [(0, 1)],
        "full_doc_ids": ["doc_123"]
    }
}

# Extract entities and triples
ner_results, triple_results = extractor.batch_openie(chunks)

# Access results
for chunk_id, ner in ner_results.items():
    print(f"Entities in {chunk_id}: {ner.unique_entities}")
    # Output: ["Albert Einstein", "Ulm", "Germany", "theory of relativity"]

for chunk_id, triples in triple_results.items():
    print(f"Triples in {chunk_id}:")
    for triple in triples.triples:
        print(f"  {triple}")
    # Output:
    #   ("Albert Einstein", "was born in", "Ulm")
    #   ("Albert Einstein", "was born in", "Germany")
    #   ("Albert Einstein", "developed", "theory of relativity")

Extraction Process

The OpenIE extraction follows a two-stage pipeline:
  1. Named Entity Recognition (NER)
    • Uses the ner prompt template
    • Extracts unique named entities from each chunk
    • Deduplicates entities while preserving order
    • Returns entities as a list of strings
  2. Triple Extraction
    • Uses the triple_extraction prompt template
    • Takes extracted entities as input
    • Identifies relationships between entities
    • Returns structured (subject, predicate, object) triples
    • Filters invalid triples automatically
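
The deduplication and filtering steps above can be sketched as plain helpers. These are illustrative stand-ins, not the library's internal functions; the exact validity checks applied to triples are assumptions:

```python
from typing import List, Tuple

def dedupe_preserving_order(entities: List[str]) -> List[str]:
    # Drop repeated entities while keeping first-seen order.
    seen = set()
    out = []
    for entity in entities:
        if entity not in seen:
            seen.add(entity)
            out.append(entity)
    return out

def filter_triples(raw: List[tuple]) -> List[Tuple[str, str, str]]:
    # Keep only well-formed (subject, predicate, object) triples
    # with three non-empty string fields.
    return [
        tuple(t) for t in raw
        if len(t) == 3 and all(isinstance(x, str) and x.strip() for x in t)
    ]

entities = dedupe_preserving_order(["Ulm", "Einstein", "Ulm"])
triples = filter_triples([("Einstein", "was born in", "Ulm"), ("malformed",)])
```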

Performance Considerations

  • Uses ThreadPoolExecutor for parallel processing of multiple chunks
  • Displays progress bars for both NER and triple extraction phases
  • Tracks token usage and cache hits via metadata
  • Handles malformed JSON responses with automatic fixing
  • Continues processing even if individual chunks fail
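
The multi-threaded fan-out described above follows the standard `concurrent.futures` pattern. A minimal sketch, where `extract_chunk` is a stand-in for the per-chunk LLM call (not a real library function):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_chunk(chunk_id: str, content: str) -> tuple:
    # Stand-in for an LLM extraction call on a single chunk.
    return chunk_id, content.upper()

chunks = {"chunk_1": "first passage", "chunk_2": "second passage"}
results = {}

# Submit every chunk to the pool and collect results as they finish,
# so slow chunks do not block fast ones.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(extract_chunk, cid, text) for cid, text in chunks.items()]
    for future in as_completed(futures):
        cid, output = future.result()
        results[cid] = output
```

Thread-based parallelism is a good fit here because each worker spends most of its time waiting on network I/O to the LLM, not on CPU work.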

Error Handling

The extractor includes robust error handling:
  • Catches and logs exceptions during NER or triple extraction
  • Returns empty results with error metadata when extraction fails
  • Stores error messages in the metadata field of output objects
  • Attempts to fix broken JSON responses due to length limits
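
The catch-log-continue behavior above can be sketched as a wrapper around the extraction call. The helper names and result shape here are hypothetical, chosen only to illustrate the pattern:

```python
def safe_extract(chunk_id: str, content: str, extract) -> dict:
    try:
        entities = extract(content)
        return {"chunk_id": chunk_id, "unique_entities": entities, "metadata": {}}
    except Exception as exc:
        # Return an empty result with the error recorded in metadata,
        # instead of aborting the whole batch.
        return {"chunk_id": chunk_id, "unique_entities": [], "metadata": {"error": str(exc)}}

def flaky_extract(text: str):
    # Stand-in extractor that fails on empty input.
    if not text:
        raise ValueError("empty chunk")
    return [text.split()[0]]

ok = safe_extract("c1", "Einstein developed relativity", flaky_extract)
bad = safe_extract("c2", "", flaky_extract)
```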