Overview
This guide walks you through creating a knowledge graph from a collection of documents. You’ll extract entities and relationships, deduplicate them, and explore the results in an interactive browser viewer.
Before you begin, make sure you’ve installed sift-kg and configured your API key.
Your First Knowledge Graph
Let’s build a knowledge graph from a folder of documents. For this example, we’ll assume you have some PDFs or text files in a ./documents/ folder.
Initialize your project
Create configuration files in your current directory. This creates:
- .env.example: Template for your API keys
- sift.yaml: Project settings
Copy .env.example to .env, then edit .env and set your API key.
Extract entities and relations
Point sift-kg at your documents folder. This will:
- Read all supported files (PDF, DOCX, XLSX, HTML, images, 75+ formats)
- Chunk the text into manageable pieces
- Use your LLM to extract entities and relationships
- Save results to output/extractions/
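To make the chunking step above concrete, here is a minimal Python sketch of splitting text into overlapping pieces. It is an illustration only: the helper name `chunk_text`, the chunk size, and the overlap are assumptions, and sift-kg's actual chunking strategy may differ.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list:
    """Split text into pieces of at most `size` characters.

    Consecutive chunks share `overlap` characters, so text near a
    boundary is seen in both neighboring chunks.
    (Hypothetical helper; not part of sift-kg's API.)
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", size=4, overlap=1):
        print(piece)
```

Character-based windows are the simplest choice for a sketch; splitting on sentences or tokens is a common alternative when chunk boundaries matter.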
Schema-free mode (the default) runs a schema discovery step: the LLM samples your documents and designs entity/relation types tailored to your corpus. The discovered schema is saved to output/discovered_domain.yaml and reused on subsequent runs.
Build the knowledge graph
Construct a NetworkX graph from all extractions. This will:
- Load all extraction results
- Automatically deduplicate near-identical names (plurals, Unicode variants, case differences)
- Fix reversed edge directions when the LLM swaps source/target types
- Flag low-confidence relations for review
- Save the graph to output/graph_data.json
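The automatic deduplication of near-identical names can be pictured with a small Python sketch. This is not sift-kg's implementation: the functions `normalize_name` and `group_near_duplicates` are hypothetical, and the plural rule is deliberately naive. It only shows how plurals, Unicode variants, and case differences can collapse to one comparison key.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Return a comparison key that ignores Unicode variants, case,
    extra whitespace, and a simple trailing plural "s".
    (Hypothetical helper; sift-kg's real rules may differ.)
    """
    # NFKC folds compatibility characters (e.g. full-width letters)
    # into their canonical forms; casefold() is an aggressive lower().
    key = unicodedata.normalize("NFKC", name).casefold()
    key = " ".join(key.split())  # collapse runs of whitespace
    # Naive plural handling: drop one trailing "s", but not "ss".
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def group_near_duplicates(names: list) -> list:
    """Group names that share a normalized key; singletons are dropped."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    print(group_near_duplicates(["Neural Networks", "neural network", "Attention"]))
```

Names that differ in more than surface form (abbreviations, aliases) will not share a key under this scheme, which is the kind of case the LLM-based resolution step in the next section is meant for.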
Find and resolve duplicates
Use the LLM to find entities that likely refer to the same real-world thing. This creates output/merge_proposals.yaml with proposed entity merges.
Review and approve merges
Review the proposed merges interactively. This walks through each proposal, showing:
- The canonical entity
- The proposed merge members
- The LLM’s confidence and reasoning
For each proposal, you can:
- Approve: Mark as CONFIRMED
- Reject: Mark as REJECTED
- Skip: Leave as DRAFT to review later
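The three review outcomes form a small state model, sketched below in Python. The status names CONFIRMED, REJECTED, and DRAFT come from this guide; everything else (the `Status` enum, the `review` function, and the proposal fields `canonical`, `members`, and `confidence`) is a hypothetical illustration, not the actual layout of merge_proposals.yaml.

```python
from enum import Enum


class Status(str, Enum):
    DRAFT = "DRAFT"
    CONFIRMED = "CONFIRMED"
    REJECTED = "REJECTED"


# Maps a reviewer action to the status the proposal ends up with.
ACTIONS = {
    "approve": Status.CONFIRMED,
    "reject": Status.REJECTED,
    "skip": Status.DRAFT,  # stays reviewable in a later session
}


def review(proposals, decide):
    """Apply decide(proposal) -> action to every proposal still in DRAFT.

    Already confirmed or rejected proposals are left untouched.
    """
    for proposal in proposals:
        if proposal["status"] == Status.DRAFT:
            proposal["status"] = ACTIONS[decide(proposal)]
    return proposals


if __name__ == "__main__":
    # Hypothetical proposals; the field names are assumptions.
    proposals = [
        {"canonical": "Alice Example", "members": ["A. Example"],
         "confidence": 0.95, "status": "DRAFT"},
        {"canonical": "Example Corp", "members": ["Example Holdings"],
         "confidence": 0.55, "status": "DRAFT"},
    ]
    # Approve high-confidence proposals, skip the rest for later.
    review(proposals, lambda p: "approve" if p["confidence"] >= 0.9 else "skip")
    for p in proposals:
        print(p["canonical"], p["status"].value if isinstance(p["status"], Status) else p["status"])
```

Because skipped proposals remain DRAFT, running the review again only revisits the proposals that were not yet decided.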
Generate a narrative summary
Create a prose report with entity profiles and relationship chains. This produces output/narrative.md with:
- An overview of the graph
- Key relationship chains between top entities
- A timeline (when dates exist in the data)
- Entity profiles grouped by thematic community
Explore in your browser
Open an interactive graph viewer. This opens output/graph.html in your browser with:
- Community regions: Colored zones grouping related entities
- Hover preview: See entity names and connections
- Focus mode: Double-click to isolate neighborhoods
- Search: Find entities by name
- Filters: Toggle by type, community, source document, confidence
- Trail breadcrumb: Track your exploration path
Focus Mode Navigation:
- Double-click any entity to enter focus mode
- Arrow keys to step through connections
- Enter/Right to shift focus to a neighbor
- Backspace/Left to go back along your path
- Escape to exit focus mode
Search Entities from the CLI
You can search your knowledge graph directly from the terminal. Some searches need the output of sift narrate, so run sift narrate first.
Export Your Graph
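Export to various formats for use in other tools.
Complete Pipeline Example
Here’s the full workflow in one go: initialize the project, extract entities and relations, build the graph, find and review duplicates, generate the narrative, and open the viewer.
Real-World Examples
Explore these knowledge graphs generated entirely by sift-kg:
Transformers Papers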
12 foundational AI papers mapped as a concept graph
- 425 entities
- Cost: ~$0.72
- Domain: academic
FTX Collapse
The FTX cryptocurrency exchange collapse from 9 articles
- 431 entities
- Domain: osint
Epstein Depositions
Giuffre v. Maxwell depositions extracted from a scanned PDF
- 190 entities
- Used OCR for scanned documents
Advanced Configuration
You can configure defaults in sift.yaml so you don’t need flags on every command.
Settings are resolved in this order of precedence: .env > sift.yaml > defaults
You can override anything from sift.yaml with a flag.
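The precedence order stated above (.env over sift.yaml over defaults) can be sketched as a layered lookup in Python. This is only an illustration of the idea: the function `resolve_settings` and the `output_dir` key are assumptions, and placing command-line flags above .env is also an assumption, since this guide only says that flags override sift.yaml.

```python
from collections import ChainMap
from typing import Optional

# Built-in fallbacks. "schema-free" as the default domain comes from
# this guide; the "output_dir" key is a hypothetical example.
DEFAULTS = {"domain": "schema-free", "output_dir": "output"}


def resolve_settings(env_file: dict, yaml_file: dict,
                     flags: Optional[dict] = None) -> dict:
    """Merge setting layers so that earlier layers win.

    ChainMap searches its mappings left to right, which mirrors
    flags > .env > sift.yaml > defaults (flags-first is assumed).
    """
    return dict(ChainMap(flags or {}, env_file, yaml_file, DEFAULTS))


if __name__ == "__main__":
    settings = resolve_settings(
        env_file={"domain": "osint"},
        yaml_file={"domain": "academic", "output_dir": "out"},
    )
    print(settings["domain"], settings["output_dir"])
```

A layered lookup keeps each source untouched, so it stays easy to report which layer a given value came from.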
Using Bundled Domains
sift-kg ships with specialized domains:
schema-free (default)
Auto-discovers entity and relation types from your data
general
PERSON, ORGANIZATION, LOCATION, EVENT, DOCUMENT
osint
Investigations: SHELL_COMPANY, FINANCIAL_ACCOUNT, beneficial ownership
academic
Literature review: CONCEPT, THEORY, METHOD, FINDING, PUBLICATION
You can also select a domain in sift.yaml.
Next Steps
Core Concepts
Understand how sift-kg processes documents
CLI Reference
Explore all available commands
Domains
Learn about schema-free vs. structured domains
Entity Resolution
Deep dive into the deduplication workflow