
Introduction to sift-kg

sift-kg is a zero-config CLI tool that turns any collection of documents into a browsable knowledge graph. Drop in PDFs, papers, articles, or records — get a graph that shows how everything connects, in minutes. No code, no database, no infrastructure — just a CLI and your documents.

What is sift-kg?

sift-kg extracts entities and relationships from your documents using LLMs, deduplicates them with your approval, and generates an interactive viewer you can explore in your browser. Every entity and relation links back to the source document and passage.
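To make the provenance idea concrete, here is a minimal Python sketch of how an entity, a relation, and the passage that supports them might be represented. The class and field names (Provenance, Entity, Relation, passage, and so on) are hypothetical illustrations, not sift-kg's actual data model.

```python
from dataclasses import dataclass, field


# Hypothetical record types for illustration; not sift-kg's actual schema.
@dataclass(frozen=True)
class Provenance:
    document: str  # source file the extraction came from
    passage: str   # text span that supports the extraction


@dataclass
class Entity:
    name: str
    entity_type: str
    aliases: list[str] = field(default_factory=list)
    sources: list[Provenance] = field(default_factory=list)


@dataclass
class Relation:
    subject: str        # name of the source entity
    obj: str            # name of the target entity
    relation_type: str
    sources: list[Provenance] = field(default_factory=list)


ref = Provenance("report.pdf", "Alice Smith founded Acme Corp in 2019.")
alice = Entity("Alice Smith", "PERSON", aliases=["A. Smith"], sources=[ref])
acme = Entity("Acme Corp", "ORGANIZATION", sources=[ref])
founded = Relation(alice.name, acme.name, "FOUNDED", sources=[ref])

# Each fact carries the document and passage it was read from.
for src in founded.sources:
    print(f"{founded.subject} -[{founded.relation_type}]-> {founded.obj}  ({src.document})")
```

Keeping provenance as a list rather than a single field lets the same fact accumulate support from several documents.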

Zero-config start

Point at a folder, get a knowledge graph — or use sift.yaml for persistent settings

Any LLM provider

OpenAI, Anthropic, Mistral, Ollama (local/private), or any LiteLLM-compatible provider

Human-in-the-loop

sift proposes entity merges, you approve or reject in an interactive terminal UI

Interactive viewer

Explore your graph in-browser with community regions, focus mode, search, and filtering
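The viewer's community regions group entities that are densely connected to each other. This page doesn't say which algorithm sift-kg uses, so the sketch below substitutes NetworkX's greedy modularity method on a toy graph (two four-node clusters joined by one edge) purely to illustrate the idea.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy stand-in for an entity graph: two 4-node cliques joined by a single edge.
# Nodes 0-3 form one cluster, nodes 4-7 the other.
G = nx.barbell_graph(4, 0)

# Greedy modularity maximization (a substitute; sift-kg's algorithm is not stated).
communities = greedy_modularity_communities(G)

for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")
```

Modularity rewards partitions with many edges inside groups and few between them, so the single bridge edge is the natural place to split.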

Key Features

  • 75+ document formats — PDF, DOCX, XLSX, PPTX, HTML, EPUB, images, and more via the Kreuzberg extraction engine
  • OCR for scanned PDFs — Local OCR via Tesseract (default), EasyOCR, or PaddleOCR, with optional Google Cloud Vision fallback
  • Schema-free by default — One LLM call samples your documents and designs a schema tailored to the corpus, saved as discovered_domain.yaml for reuse and editing
  • CLI search: sift search "SBF" finds entities by name or alias, with optional relation and description output
  • Export anywhere — GraphML (yEd, Cytoscape), GEXF (Gephi), SQLite, CSV, or native JSON for advanced analysis
  • Narrative generation — Prose reports with relationship chains, timelines, and community-grouped entity profiles
  • Source provenance — Every extraction links to the document and passage it came from
  • Multilingual — Extracts from documents in any language, outputs a unified English knowledge graph
  • Budget controls — Set --max-cost to cap LLM spending
  • Runs locally — Your documents stay on your machine
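As an illustration of name-or-alias lookup, the sketch below runs a case-insensitive substring match over an in-memory entity list. The ENTITIES table and the search function are hypothetical; the actual matching rules and storage behind sift search are not described on this page.

```python
# Hypothetical in-memory entity table; sift-kg's real storage may differ.
ENTITIES = [
    {"name": "Samuel Bankman-Fried", "type": "PERSON", "aliases": ["SBF", "Sam Bankman-Fried"]},
    {"name": "FTX", "type": "ORGANIZATION", "aliases": ["FTX Trading Ltd."]},
]


def search(entities, query):
    """Return entities whose name or any alias contains query (case-insensitive)."""
    needle = query.casefold()
    return [
        ent for ent in entities
        if any(needle in label.casefold() for label in [ent["name"], *ent["aliases"]])
    ]


for ent in search(ENTITIES, "sbf"):
    print(ent["name"], ent["type"])
```

Matching against aliases as well as canonical names is what lets a short handle like "SBF" find the full entity.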

Use Cases

Research & Education

Map how theories, methods, and findings connect across a body of literature. Generate concept maps for courses, literature reviews, or self-study.

Business Intelligence

Drop in competitor whitepapers, market reports, or internal docs and see the landscape.

Investigative Work

Analyze FOIA releases, court filings, public records, and leaked documents.

Legal Review

Extract entities across large document collections and trace how they connect, with every link tied back to its source passage.

How It Works

Documents (PDF, DOCX, text, HTML, and 75+ formats)
  ↓
  Text Extraction (Kreuzberg, local, with optional OCR)
  ↓
  Schema Discovery (LLM designs entity/relation types from your data)
  ↓
  Entity & Relation Extraction (LLM, using discovered or predefined schema)
  ↓
  Knowledge Graph (NetworkX, JSON)
  ↓
  Entity Resolution (LLM proposes → you review)
  ↓
  Narrative Generation (LLM)
  ↓
  Interactive Viewer (browser) / Export (GraphML, GEXF, CSV, SQLite)
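The Knowledge Graph and Export stages can be approximated with plain NetworkX. This sketch assumes a hypothetical extraction result (the entities/relations layout is invented for illustration), builds a directed graph with one node per entity and one edge per relation, and writes it out with NetworkX's own GraphML and GEXF writers. It is not sift-kg's exporter.

```python
import os
import tempfile

import networkx as nx

# Hypothetical extraction output; sift-kg's actual JSON layout may differ.
extracted = {
    "entities": [
        {"id": "alice", "type": "PERSON"},
        {"id": "acme", "type": "ORGANIZATION"},
    ],
    "relations": [
        {"source": "alice", "target": "acme", "type": "FOUNDED", "document": "report.pdf"},
    ],
}


def build_graph(data):
    """One node per entity, one directed edge per relation, provenance kept on the edge."""
    G = nx.DiGraph()
    for ent in data["entities"]:
        G.add_node(ent["id"], entity_type=ent["type"])
    for rel in data["relations"]:
        G.add_edge(rel["source"], rel["target"],
                   relation_type=rel["type"], document=rel["document"])
    return G


G = build_graph(extracted)

# GraphML opens in yEd and Cytoscape; GEXF opens in Gephi.
out_dir = tempfile.mkdtemp()
graphml_path = os.path.join(out_dir, "graph.graphml")
gexf_path = os.path.join(out_dir, "graph.gexf")
nx.write_graphml(G, graphml_path)
nx.write_gexf(G, gexf_path)
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges written to", out_dir)
```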

Quick Example

pip install sift-kg

sift init                           # create sift.yaml + .env.example
sift extract ./documents/           # extract entities & relations
sift build                          # build knowledge graph
sift resolve                        # find duplicate entities
sift review                         # approve/reject merges interactively
sift apply-merges                   # apply your decisions
sift narrate                        # generate narrative summary
sift view                           # interactive graph in your browser
sift export graphml                 # export GraphML for yEd, Cytoscape, or Gephi
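The resolve, review, and apply-merges steps end with approved duplicates folded into a single entity. A minimal sketch of that last step, assuming the review produces (keep, drop, approved) tuples (a hypothetical format), uses networkx.contracted_nodes to move the dropped node's relations onto the kept one; rejected proposals are skipped.

```python
import networkx as nx

# Toy graph in which "A. Smith" is a duplicate of "Alice Smith".
G = nx.DiGraph()
G.add_edge("Alice Smith", "Acme Corp", relation_type="FOUNDED")
G.add_edge("A. Smith", "Beta LLC", relation_type="ADVISES")

# Hypothetical review output: (entity to keep, duplicate to fold in, approved?)
decisions = [
    ("Alice Smith", "A. Smith", True),
    ("Acme Corp", "Beta LLC", False),  # rejected: distinct organizations
]


def apply_merges(graph, decisions):
    """Return a copy of graph with every approved duplicate contracted into its keeper."""
    merged = graph.copy()
    for keep, drop, approved in decisions:
        if approved and keep in merged and drop in merged:
            # Moves drop's edges (with their attributes) onto keep, then removes drop.
            merged = nx.contracted_nodes(merged, keep, drop, self_loops=False)
    return merged


result = apply_merges(G, decisions)
print(sorted(result.edges()))
```

Working on a copy keeps the pre-merge graph intact, which mirrors the idea of reviewing decisions before applying them.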

What’s Next?

Installation

Install sift-kg and configure your environment

Quick Start

Build your first knowledge graph in 5 minutes

Core Concepts

Learn how sift-kg processes documents

CLI Reference

Explore all available commands

Live Demos

Explore these knowledge graphs generated entirely by sift-kg:
  • Transformers — 12 foundational AI papers mapped as a concept graph (425 entities, about $0.72 in LLM costs)
  • FTX Collapse — The FTX collapse from 9 articles (431 entities)
  • Epstein — Giuffre v. Maxwell depositions (190 entities from a scanned PDF)
No install, no API key required — just explore.
