Installation

Requirements

sift-kg requires Python 3.11 or higher. You can verify your Python version:

python --version

If you need to upgrade Python, visit python.org/downloads.
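The same check can be scripted, for example at the top of a setup script. This is a minimal sketch that assumes only the 3.11 minimum stated above; the name `meets_minimum` is illustrative and not part of sift-kg.

```python
import sys

MIN_VERSION = (3, 11)  # minimum stated in the requirements above


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum():
        print("Python version OK:", found)
    else:
        print("Python 3.11 or higher is required; found", found)
```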

Install sift-kg

Install via pip

Install the base package:

pip install sift-kg

This installs the core sift-kg CLI with support for 75+ document formats (PDF, DOCX, XLSX, PPTX, HTML, EPUB, images, and more).

Verify installation

Check that sift-kg is installed correctly:

sift --help

You should see the available commands: extract, build, resolve, review, apply-merges, narrate, view, and more.
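If `sift --help` fails, it can help to separate "package not installed" from "command not on PATH". The sketch below uses only the Python standard library; the distribution name `sift-kg` and the command name `sift` are taken from the steps above, and the helper names are illustrative.

```python
import shutil
from importlib import metadata


def installed_version(dist_name="sift-kg"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cli_path(command="sift"):
    """Return the full path of a command found on PATH, or None."""
    return shutil.which(command)


if __name__ == "__main__":
    print("sift-kg version:", installed_version() or "not installed")
    print("sift executable:", cli_path() or "not on PATH")
```

A version with no executable usually means the install location's scripts directory is missing from PATH, or that pip installed into a different environment than the active shell uses.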

Optional Dependencies

sift-kg has several optional features you can enable depending on your needs.

OCR Support (Scanned PDFs)

If you need to process scanned PDFs or images, install Tesseract OCR on your system. On macOS with Homebrew:

brew install tesseract

On Debian or Ubuntu:

sudo apt install tesseract-ocr

Once installed, enable OCR with the --ocr flag:

sift extract ./documents/ --ocr

sift-kg autodetects which PDFs need OCR: text-rich PDFs use standard extraction, and only near-empty pages fall back to OCR.
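To make the fallback rule concrete, here is a minimal sketch of a per-page decision. It is not sift-kg's implementation: the 20-character threshold and the function names are assumptions made for illustration, and the real cutoff may differ.

```python
MIN_CHARS = 20  # hypothetical threshold; sift-kg's actual cutoff may differ


def needs_ocr(page_text, min_chars=MIN_CHARS):
    """Treat a page as near-empty when it has too few non-whitespace characters."""
    visible = "".join(page_text.split())
    return len(visible) < min_chars


def plan_pages(pages):
    """Map each page index to 'text' (standard extraction) or 'ocr' (fallback)."""
    return {i: ("ocr" if needs_ocr(text) else "text") for i, text in enumerate(pages)}


pages = [
    "Quarterly report: revenue grew across all regions this year.",
    "   \n ",  # scanned page: the embedded text layer is effectively empty
]
print(plan_pages(pages))  # {0: 'text', 1: 'ocr'}
```

Deciding per page rather than per file means a mostly digital PDF with a few scanned inserts only pays the OCR cost for those pages.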

By default, sift-kg uses Tesseract (local, no API keys needed). You can switch OCR engines with --ocr-backend:

tesseract — Default, local
easyocr — Local, more accurate but slower
paddleocr — Local, fast for Asian languages
gcv — Google Cloud Vision (requires credentials and sift-kg[ocr] extras)

Google Cloud Vision OCR (Optional)

For Google Cloud Vision as an alternative OCR backend:

pip install "sift-kg[ocr]"

Then use:

sift extract ./documents/ --ocr --ocr-backend gcv

Google Cloud Vision requires setting up GCP credentials. For most users, local Tesseract OCR (the default backend) is sufficient.

Semantic Clustering (Optional)

For improved entity resolution using semantic embeddings (~2GB download for PyTorch):

pip install "sift-kg[embeddings]"

This enables semantic clustering during entity resolution, which groups entities that refer to the same thing even when their names differ (e.g., “Robert Smith” and “Bob Smith”). Use it with:

sift resolve --embeddings

The embeddings feature is most useful for large graphs (1000+ entities) or when dealing with many name variations. For smaller graphs, the default alphabetical batching works well.
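A toy illustration of why name variations matter. This sketch assumes that alphabetical batching means sorting entity names and comparing them in fixed-size groups; that reading, the batch size, and the function name are assumptions, not sift-kg's actual code.

```python
def alphabetical_batches(names, batch_size):
    """Sort names case-insensitively and split them into fixed-size batches."""
    ordered = sorted(names, key=str.lower)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


names = [
    "Robert Smith", "Bob Smith", "Carol Jones",
    "Dana Lee", "Omar Haddad", "Robert Smyth",
]
batches = alphabetical_batches(names, batch_size=3)
for batch in batches:
    print(batch)
# ['Bob Smith', 'Carol Jones', 'Dana Lee']
# ['Omar Haddad', 'Robert Smith', 'Robert Smyth']
```

Under this assumption, "Robert Smith" and "Robert Smyth" share a batch, while "Bob Smith" lands in a different one. Variants that sort far apart are the cases embedding-based clustering is meant to catch, and they become more common as the graph grows.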

Install All Optional Dependencies

To install everything at once:

pip install "sift-kg[all]"

This includes:

Google Cloud Vision OCR support
Semantic clustering with sentence-transformers

LLM Provider Setup

sift-kg works with any LLM provider supported by LiteLLM.

Get an API key

Choose your LLM provider and get an API key:

OpenAI — Get your key at platform.openai.com/api-keys
Anthropic — Get your key at console.anthropic.com/settings/keys
Mistral — Get your key at console.mistral.ai
Ollama — Run models locally (no API key needed)

For local/private deployment, use Ollama to run models on your own machine — no API keys or cloud services required.

Initialize your project

Run sift init to create configuration files:

sift init

This creates two files:

.env.example — Template for API keys
sift.yaml — Project configuration

Configure your API key

Copy .env.example to .env:

cp .env.example .env

Edit .env and add your key:

.env

# Choose your provider:
SIFT_OPENAI_API_KEY=sk-...
# SIFT_ANTHROPIC_API_KEY=sk-ant-...
# SIFT_MISTRAL_API_KEY=...

# Set your default model
SIFT_DEFAULT_MODEL=openai/gpt-4o-mini

Never commit your .env file to version control. The .env.example file is safe to commit — it contains no secrets.
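Before running the pipeline, a quick sanity check of the .env contents can catch a model that points at a provider with no key. The sketch below is a simplified parser, not a full dotenv implementation, and it assumes two things that the example file suggests but does not state: that key names follow the SIFT_<PROVIDER>_API_KEY pattern, and that the provider is the part of SIFT_DEFAULT_MODEL before the slash. The function names are illustrative.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values):
    """Return a list of problems found in the parsed settings."""
    problems = []
    model = values.get("SIFT_DEFAULT_MODEL", "")
    provider = model.split("/", 1)[0] if "/" in model else ""
    if not provider:
        problems.append("SIFT_DEFAULT_MODEL should look like provider/model")
    elif provider != "ollama":  # Ollama runs locally and needs no API key
        key_name = f"SIFT_{provider.upper()}_API_KEY"  # assumed naming pattern
        if not values.get(key_name):
            problems.append(f"{key_name} is not set")
    return problems


sample = """\
# Choose your provider:
# SIFT_OPENAI_API_KEY=sk-...
SIFT_MISTRAL_API_KEY=example-key
SIFT_DEFAULT_MODEL=openai/gpt-4o-mini
"""
print(check_env(parse_env(sample)))  # ['SIFT_OPENAI_API_KEY is not set']
```

To check a real file, pass `open(".env").read()` to `parse_env`. The check only confirms that a value is present, so a placeholder such as `sk-...` still passes.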

Development Installation

If you want to contribute to sift-kg or modify the source code:

git clone https://github.com/juanceresa/sift-kg.git
cd sift-kg
pip install -e ".[dev]"

This installs sift-kg in editable mode with development dependencies (pytest, ruff).

Verify Your Setup

Check your installation and configuration:

sift info

This displays your current configuration, including:

Domain settings
Default model
Output directory
Processing stats (if you’ve run the pipeline)

Next Steps

Quick Start

Build your first knowledge graph in 5 minutes

CLI Reference

Explore all available commands
