Introduction to sift-kg
sift-kg is a zero-config CLI tool that turns any collection of documents into a browsable knowledge graph. Drop in PDFs, papers, articles, or records and get a graph that shows how everything connects, in minutes. No code, no database, no infrastructure: just a CLI and your documents.
What is sift-kg?
sift-kg extracts entities and relationships from your documents using LLMs, deduplicates them with your approval, and generates an interactive viewer you can explore in your browser. Every entity and relation links back to the source document and passage.
Zero-config start
Point it at a folder and get a knowledge graph, or use sift.yaml for persistent settings
Any LLM provider
OpenAI, Anthropic, Mistral, Ollama (local/private), or any LiteLLM-compatible provider
Human-in-the-loop
sift proposes entity merges, you approve or reject in an interactive terminal UI
Interactive viewer
Explore your graph in-browser with community regions, focus mode, search, and filtering
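The extract, deduplicate-with-approval, and provenance ideas described above can be pictured with a small data model. This is only an illustrative sketch, not sift-kg's internal code: the `Source`, `Entity`, and `apply_merge` names are hypothetical, and the sample passages and file names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """Hypothetical provenance record: where an extraction came from."""
    document: str  # file the mention was extracted from
    passage: str   # passage supporting the extraction


@dataclass
class Entity:
    """Hypothetical entity with aliases and the sources that mention it."""
    name: str
    aliases: set[str] = field(default_factory=set)
    sources: list[Source] = field(default_factory=list)


def apply_merge(keep: Entity, drop: Entity, approved: bool) -> list[Entity]:
    """Fold `drop` into `keep` only if a human approved the proposed merge."""
    if not approved:
        # A rejected proposal leaves both entities untouched.
        return [keep, drop]
    merged = Entity(
        name=keep.name,
        aliases=keep.aliases | drop.aliases | {drop.name},
        sources=keep.sources + drop.sources,  # provenance from both is kept
    )
    return [merged]


# Made-up sample data for illustration only.
full_name = Entity(
    "Sam Bankman-Fried",
    sources=[Source("article_1.txt", "Sam Bankman-Fried founded FTX.")],
)
short_name = Entity(
    "SBF",
    sources=[Source("article_2.txt", "SBF resigned as CEO.")],
)

result = apply_merge(full_name, short_name, approved=True)
for entity in result:
    print(entity.name, sorted(entity.aliases), len(entity.sources))
```

The point of the sketch is that a merge never discards provenance: the merged entity carries the sources of both originals, and nothing changes unless the proposal is approved.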
Key Features
- 75+ document formats: PDF, DOCX, XLSX, PPTX, HTML, EPUB, images, and more via the Kreuzberg extraction engine
- OCR for scanned PDFs: Local OCR via Tesseract (default), EasyOCR, or PaddleOCR, with optional Google Cloud Vision fallback
- Schema-free by default: One LLM call samples your documents and designs a schema tailored to the corpus, saved as discovered_domain.yaml for reuse and editing
- CLI search: sift search "SBF" finds entities by name or alias, with optional relation and description output
- Export anywhere: GraphML (yEd, Cytoscape), GEXF (Gephi), SQLite, CSV, or native JSON for advanced analysis
- Narrative generation: Prose reports with relationship chains, timelines, and community-grouped entity profiles
- Source provenance: Every extraction links to the document and passage it came from
- Multilingual: Extracts from documents in any language, outputs a unified English knowledge graph
- Budget controls: Set --max-cost to cap LLM spending
- Runs locally: Your documents stay on your machine
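Because GraphML is plain XML, an exported graph can be inspected with nothing more than the Python standard library. The sketch below is a minimal example under stated assumptions: it uses a tiny hand-written GraphML snippet rather than a real sift-kg export, the node ids are made up, and it relies only on the standard GraphML `node` and `edge` elements, not on any attribute keys sift-kg may write.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML namespace, bound to a local prefix for lookups.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written stand-in for an exported file (illustrative data only).
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="ftx"/>
    <node id="alameda"/>
    <node id="sbf"/>
    <edge source="sbf" target="ftx"/>
    <edge source="sbf" target="alameda"/>
  </graph>
</graphml>
"""


def degree_counts(graphml_text: str) -> Counter:
    """Count how many edges touch each node in a GraphML document."""
    root = ET.fromstring(graphml_text)
    # Start every node at zero so isolated nodes still appear in the result.
    degrees = Counter({n.get("id"): 0 for n in root.iterfind(".//g:node", GRAPHML_NS)})
    for edge in root.iterfind(".//g:edge", GRAPHML_NS):
        degrees[edge.get("source")] += 1
        degrees[edge.get("target")] += 1
    return degrees


degrees = degree_counts(SAMPLE)
for node_id, degree in degrees.most_common():
    print(node_id, degree)
```

To try this on a real export, read the file's text and pass it to `degree_counts` in place of `SAMPLE`. For anything beyond quick checks, a graph tool such as Gephi, Cytoscape, or yEd is likely a better fit.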
Use Cases
Research & Education
Map how theories, methods, and findings connect across a body of literature. Generate concept maps for courses, literature reviews, or self-study.
Business Intelligence
Drop in competitor whitepapers, market reports, or internal docs and see the landscape.
Investigative Work
Analyze FOIA releases, court filings, public records, and document leaks.
Legal Review
Extract and connect entities across document collections.
How It Works
Quick Example
What’s Next?
Installation
Install sift-kg and configure your environment
Quick Start
Build your first knowledge graph in 5 minutes
Core Concepts
Learn how sift-kg processes documents
CLI Reference
Explore all available commands
Live Demos
Explore these knowledge graphs generated entirely by sift-kg:
- Transformers: 12 foundational AI papers mapped as a concept graph (425 entities, ~$0.72)
- FTX Collapse: The FTX collapse from 9 articles (431 entities)
- Epstein: Giuffre v. Maxwell depositions (190 entities from a scanned PDF)