Overview
This example demonstrates knowledge graph extraction from journalistic and encyclopedic content, with emphasis on the entity resolution workflow for merging duplicate entities across multiple documents.

View Interactive Graph

Open examples/ftx/output/graph.html in your browser.

Source Documents

9 text files covering FTX, Alameda Research, Binance, and key people.
Quick Start
Pipeline Output
The output/ directory contains the complete pipeline results:
Pipeline Statistics
| Metric | Value |
|---|---|
| Input Documents | 9 text files (~148K total) |
| Topics Covered | FTX, Alameda Research, Binance, Sam Bankman-Fried, and other key figures |
| Raw Entities Extracted | ~777 entities from LLM |
| After Pre-dedup (semhash) | 750 entities (27 deterministic merges) |
| After Build + Postprocess | 432 entities, 1,201 relations |
| After Resolution (3 passes) | 373 entities, 1,184 relations (59 entities merged via LLM + human review) |
| Final Entity Descriptions | 100 entity profiles in narrative |
| Model Used | claude-haiku-4-5-20251001 |
| Total Cost | ~$0.28 (extraction was separate) |
Entity Resolution Workflow
This example showcases the full deduplication pipeline:

1. Automatic Semantic Deduplication

During extraction, semantic hashing automatically merges near-identical entities:

- Before: 777 raw entities
- After: 750 entities (27 deterministic merges)
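The idea behind this step can be sketched with a deterministic merge on normalized names. This is illustrative only — semhash's actual approach uses semantic similarity, not the simple normalization shown here:

```python
# Illustrative only: deterministic near-duplicate merging by name normalization.
# Not semhash's actual algorithm, which compares semantic embeddings.
from collections import OrderedDict

def normalize(name: str) -> str:
    """Lowercase, map punctuation to spaces, and collapse whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())

def dedup(entities: list[str]) -> dict[str, list[str]]:
    """Group names by normalized key; the first-seen spelling becomes canonical."""
    groups: OrderedDict[str, list[str]] = OrderedDict()
    for name in entities:
        groups.setdefault(normalize(name), []).append(name)
    return {names[0]: names for names in groups.values()}

raw = ["FTX", "FTX.", "Sam Bankman-Fried", "sam bankman fried", "Alameda Research"]
merged = dedup(raw)
print(len(raw) - len(merged), "merges")  # 2 merges
```

Each group keeps its first-seen spelling as the canonical name, mirroring how deterministic merges preserve one surface form per entity.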
2. Build + Postprocess
Graph construction with normalization and filtering:

- Result: 432 entities, 1,201 relations
3. LLM-Assisted Resolution
Three passes of sift resolve identify remaining duplicates, producing merge_proposals.yaml with candidate merges:
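A hypothetical fragment of what such a proposals file might contain — the field names here are illustrative, not necessarily the tool's actual schema:

```yaml
# Illustrative shape only; see the Resolution Guide for the real schema.
merges:
  - canonical: "Sam Bankman-Fried"
    duplicates: ["SBF", "Bankman-Fried"]
    reason: "Alias and surname references to the same person"
  - canonical: "Alameda Research"
    duplicates: ["Alameda"]
    reason: "Shortened company name"
```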
4. Human Review
5. Apply Merges
Key Insights from the Graph
The generated narrative (narrative.md) provides:
- Overview — High-level synthesis of the FTX collapse timeline
- Entity descriptions — AI-generated profiles for 100 key entities (companies, people, events)
- Relationship mapping — How FTX, Alameda, Binance, and key figures are connected
Re-running the Example
Option 1: Build from Existing Extractions (Free)
Use the pre-extracted entities — no LLM API calls.

Option 2: Full Pipeline from Scratch

Re-extract entities from the source documents.

Cost Breakdown
The ~$0.28 total cost includes:

- Resolution — LLM calls across 3 passes to identify duplicates
- Narration — LLM calls to generate entity descriptions and overview

Cost scales with:

- Number of entities to resolve
- Model chosen (Haiku vs GPT-4o-mini vs others)
- Number of resolution passes
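A back-of-envelope sketch of how these factors multiply. Every number below is hypothetical — neither the token counts nor the price reflects this run's actual usage or any model's real pricing:

```python
# Hypothetical cost model: cost ≈ passes × entities × tokens/entity × $/token.
passes = 3                # resolution passes
entities = 432            # entities entering resolution
tokens_per_entity = 200   # hypothetical prompt + completion tokens per candidate
price_per_mtok = 1.00     # hypothetical blended $/million tokens

cost = passes * entities * tokens_per_entity * price_per_mtok / 1_000_000
print(f"${cost:.2f}")  # $0.26
```

Doubling the pass count or switching to a pricier model scales the estimate linearly, which is why cheap models like Haiku keep multi-pass resolution affordable.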
Use Cases
This example pattern works well for:

- Investigative journalism — Map connections across news articles
- Business intelligence — Track companies, people, and events across sources
- Historical analysis — Document timelines and relationships in major events
- Due diligence — Aggregate information about entities from multiple sources
Merge Proposals File
The merge_proposals.yaml file is a key artifact:

- Generated by sift resolve
- Can be manually edited before applying
- Supports version control and collaboration
- Applied with sift apply-merges
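Conceptually, applying the merges rewrites every duplicate name to its canonical entity across the relation set. A minimal sketch of that idea — not sift's implementation, and the proposal data is hypothetical:

```python
# Illustrative sketch of applying merge proposals: build a duplicate -> canonical
# alias map, then rewrite relation endpoints through it.
proposals = {
    "Sam Bankman-Fried": ["SBF"],   # hypothetical canonical -> duplicates
    "Alameda Research": ["Alameda"],
}
alias = {dup: canon for canon, dups in proposals.items() for dup in dups}

relations = [
    ("SBF", "founded", "Alameda"),
    ("Sam Bankman-Fried", "founded", "FTX"),
]
resolved = [(alias.get(s, s), rel, alias.get(o, o)) for s, rel, o in relations]
print(resolved)
```

After rewriting, both relations reference the same canonical "Sam Bankman-Fried" node, which is how merging shrinks the entity count while keeping relations intact.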
Next Steps
Explore Other Examples
See Transformers and Epstein examples
Resolution Guide
Learn more about entity deduplication