## Overview
The `OpenIE` class performs open information extraction by identifying named entities and extracting semantic triples (subject-predicate-object relationships) from text passages.
## Class: OpenIE
### Constructor

The constructor takes the language model instance used for entity and triple extraction.
### Methods

#### batch_openie()
Conducts batch OpenIE synchronously, using multi-threading for both NER and triple extraction.

**Parameters**

Dictionary of chunks to process. Each key is a chunk ID (a hash of the chunk), and each value contains:

- `num_tokens` (int): Number of tokens in the chunk
- `content` (str): The text content to extract from
- `chunk_order` (List[Tuple]): Ordering information
- `full_doc_ids` (List[str]): Associated document IDs
**Returns**

A tuple containing two dictionaries:

1. **NER Results** (`Dict[str, NerRawOutput]`): Named entity recognition results
   - `chunk_id` (str): The chunk identifier
   - `response` (str): Raw LLM response
   - `unique_entities` (List[str]): List of unique entities found
   - `metadata` (dict): Token usage and cache hit information
2. **Triple Results** (`Dict[str, TripleRawOutput]`): Extracted knowledge graph triples
   - `chunk_id` (str): The chunk identifier
   - `response` (str): Raw LLM response
   - `triples` (List[Tuple]): List of (subject, predicate, object) triples
   - `metadata` (dict): Token usage and cache hit information
## Example Usage
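The sketch below illustrates the input and output shapes described above. The `NerRawOutput` and `TripleRawOutput` dataclasses here are illustrative stand-ins matching the documented fields, and the constructor call, raw-response strings, and metadata keys are assumptions, not the library's exact API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the library's output types, shown only to
# illustrate the documented field shapes.
@dataclass
class NerRawOutput:
    chunk_id: str
    response: str
    unique_entities: List[str]
    metadata: dict = field(default_factory=dict)

@dataclass
class TripleRawOutput:
    chunk_id: str
    response: str
    triples: List[Tuple[str, str, str]]
    metadata: dict = field(default_factory=dict)

# Input: chunks keyed by hashed chunk ID.
chunks = {
    "a1b2c3": {
        "num_tokens": 11,
        "content": "Radio City is India's first private FM radio station.",
        "chunk_order": [("doc-0", 0)],
        "full_doc_ids": ["doc-0"],
    }
}

# With a real language model configured, the call would look like:
#     openie = OpenIE(llm)   # `llm` is your language model instance
#     ner_results, triple_results = openie.batch_openie(chunks)
# and the results would be shaped roughly like:
ner_results: Dict[str, NerRawOutput] = {
    "a1b2c3": NerRawOutput(
        chunk_id="a1b2c3",
        response='{"named_entities": ["Radio City", "India"]}',
        unique_entities=["Radio City", "India"],
        metadata={"prompt_tokens": 120, "cache_hit": False},
    )
}
triple_results: Dict[str, TripleRawOutput] = {
    "a1b2c3": TripleRawOutput(
        chunk_id="a1b2c3",
        response='{"triples": [["Radio City", "located in", "India"]]}',
        triples=[("Radio City", "located in", "India")],
        metadata={"prompt_tokens": 180, "cache_hit": False},
    )
}
```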
## Extraction Process

The OpenIE extraction follows a two-stage pipeline:

1. **Named Entity Recognition (NER)**
   - Uses the `ner` prompt template
   - Extracts unique named entities from each chunk
   - Deduplicates entities while preserving order
   - Returns entities as a list of strings
2. **Triple Extraction**
   - Uses the `triple_extraction` prompt template
   - Takes extracted entities as input
   - Identifies relationships between entities
   - Returns structured (subject, predicate, object) triples
   - Filters invalid triples automatically
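The post-processing steps of the two stages (order-preserving deduplication of entities, and filtering of malformed triples) can be sketched as follows. This is a minimal illustration of the described behavior, not the library's actual implementation.

```python
from typing import List, Tuple

def dedupe_preserve_order(entities: List[str]) -> List[str]:
    """Stage 1 post-processing: drop duplicate entities,
    keeping the first occurrence of each."""
    seen = set()
    unique = []
    for entity in entities:
        if entity not in seen:
            seen.add(entity)
            unique.append(entity)
    return unique

def filter_triples(raw: List[list]) -> List[Tuple[str, str, str]]:
    """Stage 2 post-processing: keep only well-formed
    (subject, predicate, object) triples with non-empty string parts."""
    return [
        tuple(t) for t in raw
        if isinstance(t, (list, tuple)) and len(t) == 3
        and all(isinstance(part, str) and part.strip() for part in t)
    ]
```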
## Performance Considerations

- Uses `ThreadPoolExecutor` for parallel processing of multiple chunks
- Displays progress bars for both the NER and triple extraction phases
- Tracks token usage and cache hits via metadata
- Handles malformed JSON responses with automatic fixing
- Continues processing even if individual chunks fail
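The parallelization pattern above can be sketched with the standard library's `ThreadPoolExecutor`. The `extract_one` function here is a placeholder for the real per-chunk LLM call; in the actual extractor a progress bar (e.g. `tqdm`) would wrap the completion loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_one(chunk_id: str, chunk: dict) -> tuple:
    # Placeholder for a per-chunk LLM extraction call; here we just
    # take the first two whitespace tokens as fake "entities".
    return chunk_id, {"unique_entities": chunk["content"].split()[:2]}

chunks = {
    f"chunk-{i}": {"content": f"Entity{i} relates to Entity{i + 1}"}
    for i in range(4)
}

results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(extract_one, cid, c): cid for cid, c in chunks.items()}
    # The real extractor wraps this loop in a progress bar.
    for fut in as_completed(futures):
        chunk_id, output = fut.result()
        results[chunk_id] = output
```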
Error Handling
The extractor includes robust error handling:- Catches and logs exceptions during NER or triple extraction
- Returns empty results with error metadata when extraction fails
- Stores error messages in the
metadatafield of output objects - Attempts to fix broken JSON responses due to length limits
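These behaviors can be sketched as below: a best-effort repair of a JSON array truncated by a length limit, and a per-chunk wrapper that converts exceptions into empty results with error metadata. Both functions are simplified stand-ins for the extractor's actual logic, with hypothetical names.

```python
import json
import logging

def fix_truncated_json(text: str, default=None):
    """Best-effort repair of a JSON array cut off by a length limit:
    drop the trailing partial element and re-close the array."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Trim back to the last complete inner element, then close the array.
    last = text.rfind("],")
    if last != -1:
        try:
            return json.loads(text[: last + 1] + "]")
        except json.JSONDecodeError:
            pass
    return default

def safe_extract(chunk_id: str, content: str, extract_fn) -> dict:
    """Run one extraction call; on failure, log the exception and return
    an empty result whose metadata records the error, so the batch
    continues instead of aborting."""
    try:
        triples = extract_fn(content)
        return {"chunk_id": chunk_id, "triples": triples, "metadata": {}}
    except Exception as exc:
        logging.warning("extraction failed for %s: %s", chunk_id, exc)
        return {"chunk_id": chunk_id, "triples": [], "metadata": {"error": str(exc)}}
```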