Introduction
While sift-kg is primarily a CLI tool, you can also use it as a Python library in notebooks, web apps, or data pipelines. The Python API gives you full control over the extraction pipeline through explicit parameters instead of CLI arguments.
Installation
Quick Start
Core Imports
Pipeline Architecture
The sift-kg pipeline consists of several stages:
- Extract (run_extract): Parse documents and extract entities/relations using an LLM
- Build (run_build): Construct a knowledge graph from extractions
- Resolve (run_resolve): Find duplicate entities using LLM-based comparison
- Apply Merges (run_apply_merges): Apply human-reviewed entity merges
- Narrate (run_narrate): Generate narrative summaries using community detection
- View (run_view): Create interactive HTML visualizations
- Export (run_export): Export to GraphML, GEXF, CSV, or SQLite
Use run_pipeline() to run the pipeline as a whole, or call the individual stages for more control.
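The stage ordering can be sketched as plain data plus a small runner. This is only an illustration of "whole pipeline versus a slice of stages": the stage names are taken from the list on this page, but the callables are local placeholders defined in the snippet, not the real sift-kg functions, and `run_stages` is a hypothetical helper. The actual signatures of `run_extract` and the other functions are not shown on this page, so the snippet deliberately does not import sift-kg.

```python
from typing import Callable, Dict, List

# Stage names in the order listed on this page. The strings mirror the
# documented function names; nothing here calls sift-kg itself.
STAGE_ORDER: List[str] = [
    "run_extract",
    "run_build",
    "run_resolve",
    "run_apply_merges",
    "run_narrate",
    "run_view",
    "run_export",
]


def make_placeholder(name: str) -> Callable[[List[str]], List[str]]:
    """Return a stand-in stage that appends its own name to a log."""
    def stage(log: List[str]) -> List[str]:
        return log + [name]
    return stage


# Hypothetical registry: placeholder callables keyed by stage name.
STAGES: Dict[str, Callable[[List[str]], List[str]]] = {
    name: make_placeholder(name) for name in STAGE_ORDER
}


def run_stages(first: str = "run_extract", last: str = "run_export") -> List[str]:
    """Run the placeholder stages from `first` to `last`, inclusive, in order."""
    start = STAGE_ORDER.index(first)
    stop = STAGE_ORDER.index(last) + 1
    if start >= stop:
        raise ValueError(f"{first!r} comes after {last!r} in the pipeline")
    log: List[str] = []
    for name in STAGE_ORDER[start:stop]:
        log = STAGES[name](log)
    return log


if __name__ == "__main__":
    print(run_stages())                            # every stage, in order
    print(run_stages("run_build", "run_resolve"))  # a slice of the pipeline
```

Keeping the order in one list means a partial run cannot execute stages out of sequence, which matters when a later stage reads the output of an earlier one.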
Basic Example: Step-by-Step
Using Custom Domains
Working with the Knowledge Graph
Environment Setup
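A missing key can surface late, at the first LLM call, so a small preflight check may be worth adding to notebooks and pipelines. The sketch below uses only the standard library. The variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are assumptions chosen for illustration, and `missing_keys` and `has_any_key` are hypothetical helpers; check which provider keys your sift-kg configuration actually expects.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names, for illustration only; this page does not list
# the keys that sift-kg reads.
CANDIDATE_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in `env`."""
    return [name for name in names if not env.get(name, "").strip()]


def has_any_key(names: Iterable[str], env: Mapping[str, str] = os.environ) -> bool:
    """Return True if at least one of `names` has a non-blank value in `env`."""
    names = tuple(names)
    return len(missing_keys(names, env)) < len(names)


if __name__ == "__main__":
    if has_any_key(CANDIDATE_KEYS):
        print("At least one LLM API key is set.")
    else:
        print("No LLM API key found; missing:", ", ".join(missing_keys(CANDIDATE_KEYS)))
```

Passing `env` explicitly keeps the check easy to test without touching the real process environment.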
Set your LLM API keys before running the pipeline.
Next Steps
- Pipeline Functions - Detailed documentation for all pipeline functions
- KnowledgeGraph Class - Work with knowledge graphs programmatically
- Domain Configuration - Create custom domain schemas