Requirements
sift-kg requires Python 3.11 or higher. You can verify your Python version by running python --version.
Install sift-kg
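Before installing, the Python 3.11 requirement can also be checked from inside the interpreter that will run sift-kg. The following is a minimal sketch; the helper name meets_minimum is hypothetical and is not part of sift-kg.

```python
import sys

# Minimum version stated in the Requirements section.
MINIMUM = (3, 11)


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not part of sift-kg.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} satisfies the 3.11+ requirement")
    else:
        print(f"Python {current} is too old; 3.11 or higher is required")
```

Running the check with the same interpreter that pip uses avoids the common case where python and pip point at different installations.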
Install via pip
Install the base package with pip. This installs the core sift-kg CLI with support for 75+ document formats (PDF, DOCX, XLSX, PPTX, HTML, EPUB, images, and more).
Optional Dependencies
sift-kg has several optional features you can enable depending on your needs.
OCR Support (Scanned PDFs)
If you need to process scanned PDFs or images, install Tesseract OCR on your system, then enable OCR with the --ocr flag.
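Whether Tesseract is actually reachable can be confirmed before running anything with OCR enabled. The sketch below only looks for the standard tesseract executable on PATH; it assumes sift-kg finds Tesseract the same way, which the text above does not confirm, and find_tesseract is a hypothetical helper name.

```python
import shutil


def find_tesseract(which=shutil.which):
    """Return the path to the `tesseract` executable, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of sift-kg.
    `which` is injectable so the lookup can be exercised without Tesseract installed.
    """
    return which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path:
        print(f"Tesseract found at {path}")
    else:
        print("Tesseract not found on PATH; install it before enabling OCR")
```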
Google Cloud Vision OCR (Optional)
Google Cloud Vision is available as an alternative OCR backend through an optional dependency.
Google Cloud Vision requires setting up GCP credentials. For most users, local Tesseract OCR (included by default) is sufficient.
Semantic Clustering (Optional)
Semantic embeddings improve entity resolution. Enabling this feature pulls in PyTorch, which is a download of roughly 2GB.
Install All Optional Dependencies
Installing all optional dependencies at once includes:
- Google Cloud Vision OCR support
- Semantic clustering with sentence-transformers
LLM Provider Setup
sift-kg works with any LLM provider supported by LiteLLM.
Get an API key
Choose your LLM provider and get an API key. The most common options are:
- OpenAI — Get your key at platform.openai.com/api-keys
- Anthropic — Get your key at console.anthropic.com/settings/keys
- Mistral — Get your key at console.mistral.ai
- Ollama — Run models locally (no API key needed)
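Once a key has been obtained, it can help to confirm that it is visible in the environment. The sketch below uses the variable names LiteLLM conventionally reads (OPENAI_API_KEY, ANTHROPIC_API_KEY, MISTRAL_API_KEY). This is an assumption: sift-kg may expect different names, so treat the project's .env.example as authoritative. The helper configured_providers is hypothetical.

```python
import os

# Assumed variable names, following LiteLLM's conventions.
# sift-kg may expect different names; check .env.example.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Mistral": "MISTRAL_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set to a non-blank value.

    Hypothetical helper for illustration; not part of sift-kg.
    """
    if env is None:
        env = os.environ
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys detected for: " + ", ".join(found))
    else:
        print("No hosted-provider API keys detected")
```

Ollama is left out of the mapping because it runs models locally and needs no API key.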
Initialize your project
Run sift init to create configuration files. This creates two files:
- .env.example: Template for API keys
- sift.yaml: Project configuration
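A quick way to confirm that initialization produced both files is to check for them in the project directory. This is a minimal sketch; missing_init_files is a hypothetical helper, and it assumes sift init writes the files into the directory where it was run.

```python
from pathlib import Path

# File names taken from the description of `sift init` above.
EXPECTED_FILES = (".env.example", "sift.yaml")


def missing_init_files(project_dir="."):
    """Return the expected configuration files that are absent from `project_dir`.

    Hypothetical helper for illustration; not part of sift-kg.
    """
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_init_files()
    if missing:
        print("Missing after init: " + ", ".join(missing))
    else:
        print("Both configuration files are present")
```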
Development Installation
If you want to contribute to sift-kg or modify the source code, install the package from source.
Verify Your Setup
Check your installation and configuration. The output shows:
- Domain settings
- Default model
- Output directory
- Processing stats (if you’ve run the pipeline)
Next Steps
- Quick Start: Build your first knowledge graph in 5 minutes
- CLI Reference: Explore all available commands