Overview
This example demonstrates knowledge graph extraction from academic research papers, producing a graph of 425 entities and 1,122 relations across systems, concepts, researchers, methods, phenomena, and findings.
View Interactive Graph
Open examples/transformers/output/graph.html in your browser.
Source Papers
12 PDFs including “Attention Is All You Need”, BERT, GPT-2, GPT-3, ViT, DALL-E
Quick Start
Pipeline Output
The output/ directory contains the complete pipeline results.
Pipeline Statistics
| Metric | Value |
|---|---|
| Input Documents | 12 PDFs |
| Source Papers | Attention Is All You Need, BERT, GPT-2, GPT-3, ViT, DALL-E, and more |
| Total Entities | 425 |
| Total Relations | 1,122 |
| Entity Types | 118 systems, 73 concepts, 71 researchers, 70 methods, 34 phenomena, 25 findings |
| Domain | academic (bundled with sift-kg) |
| Model Used | claude-haiku-4-5-20251001 |
| Total Cost | ~$0.72 |
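The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers copied from the table; the function name `summarize` is introduced here for illustration. The six listed type counts add up to 391, so the remaining 34 entities presumably belong to types the table does not list. That reading is an assumption, not something the pipeline output confirms.

```python
# Figures copied from the Pipeline Statistics table above.
TOTAL_ENTITIES = 425
TOTAL_RELATIONS = 1122
LISTED_TYPE_COUNTS = {
    "systems": 118,
    "concepts": 73,
    "researchers": 71,
    "methods": 70,
    "phenomena": 34,
    "findings": 25,
}


def summarize(total_entities, total_relations, type_counts):
    """Compare the per-type counts with the totals (illustrative helper)."""
    listed = sum(type_counts.values())
    return {
        "listed_entities": listed,
        # Assumption: the difference is made up of entity types not listed.
        "unlisted_entities": total_entities - listed,
        "relations_per_entity": round(total_relations / total_entities, 2),
    }


if __name__ == "__main__":
    print(summarize(TOTAL_ENTITIES, TOTAL_RELATIONS, LISTED_TYPE_COUNTS))
```

On these numbers the graph averages about 2.64 relations per entity.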
Domain Configuration
This example uses the built-in academic domain (defined in sift.yaml):
- System — Models, frameworks, architectures (e.g., “GPT-3”, “BERT”)
- Concept — Theoretical ideas and principles
- Researcher — Authors and cited researchers
- Method — Techniques and approaches
- Phenomenon — Observed patterns and behaviors
- Finding — Research results and conclusions
Key Insights from the Graph
The generated narrative (narrative.md) provides:
- Overview — Synthesis of how transformer architectures evolved
- Entity descriptions — AI-generated profiles for key systems, researchers, and concepts
- Community detection — Clustered subgraphs (e.g., vision transformers, language models, attention mechanisms)
Re-running the Example
Option 1: Build from Existing Extractions (Free)
Use the pre-extracted entities — no LLM API calls:
Option 2: Full Pipeline from Scratch
Re-extract entities from the source PDFs:
Option 3: Use Your Own Papers
Cost Breakdown
The ~$0.72 total cost includes:
- Extraction — LLM calls to extract entities and relations from each PDF
- Resolution — LLM calls to identify duplicate entities
- Narration — LLM calls to generate entity descriptions and overview
Actual costs vary with:
- Document length and complexity
- Model chosen (Haiku vs GPT-4o-mini vs others)
- Number of resolution passes needed
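For rough budgeting, the reported total works out to about $0.06 per PDF (0.72 / 12). The sketch below scales that figure linearly to a different corpus size. Linear scaling is an assumption made here for illustration: as the factors above indicate, real costs depend on document length, the model, and the number of resolution passes. The function name `estimate_cost` is hypothetical and not part of sift-kg.

```python
# Figures reported for this example run.
REPORTED_TOTAL_COST_USD = 0.72
REPORTED_DOCUMENT_COUNT = 12


def estimate_cost(document_count,
                  total_cost=REPORTED_TOTAL_COST_USD,
                  reference_count=REPORTED_DOCUMENT_COUNT):
    """Estimate pipeline cost in USD, assuming cost scales linearly
    with the number of documents (a simplifying assumption)."""
    if document_count < 0:
        raise ValueError("document_count must be non-negative")
    per_document = total_cost / reference_count
    return round(document_count * per_document, 2)


if __name__ == "__main__":
    for count in (12, 50, 200):
        print(f"{count} PDFs: ~${estimate_cost(count):.2f}")
```

Treat the result as an order-of-magnitude guide; longer papers or a more expensive model will push the number up.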
Use Cases
This example pattern works well for:
- Literature reviews — Map research landscape across papers
- Citation analysis — Track how researchers and concepts connect
- Technology evolution — See how systems and methods build on each other
- Knowledge synthesis — Generate summaries across multiple papers
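As an illustration of the citation-analysis use case, the sketch below finds the shortest chain of relations linking two entities with a breadth-first search. The triples are a hand-written sample rather than output from this example's graph, and the relation labels (`AUTHORED`, `BUILDS_ON`) and function names are hypothetical, not sift-kg's actual schema or API.

```python
from collections import deque

# Hand-written sample triples (source, relation, target).
# Relation labels are hypothetical, not sift-kg's output schema.
RELATIONS = [
    ("Ashish Vaswani", "AUTHORED", "Transformer"),
    ("Jacob Devlin", "AUTHORED", "BERT"),
    ("BERT", "BUILDS_ON", "Transformer"),
    ("ViT", "BUILDS_ON", "Transformer"),
]


def build_adjacency(relations):
    """Build an undirected adjacency map from relation triples."""
    adjacency = {}
    for source, _relation, target in relations:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def find_path(relations, start, goal):
    """Return the shortest entity path from start to goal, or None."""
    adjacency = build_adjacency(relations)
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the traversal order is deterministic.
        for neighbor in sorted(adjacency[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(find_path(RELATIONS, "Jacob Devlin", "Ashish Vaswani"))
```

Relations are treated as undirected here so that a path can run "backwards" through an authorship edge; a directed search would suit questions such as which systems build on which.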
Next Steps
Explore Other Examples
See FTX and Epstein examples
Domain Configuration
Learn how to customize entity types