sift build

Overview

Constructs a knowledge graph from previously extracted entities and relations. The build merges duplicate entities, normalizes entity types, removes redundant relations, and flags low-confidence relations for review.

Usage

sift build [OPTIONS]

Options

--domain

string

Path to a custom domain YAML file. The build uses it to identify review-required relation types and to apply entity normalization.

--domain-name

string

default:"schema-free"

Bundled domain name (e.g., general, osint, academic). Use -d as shorthand.

--output

string

Output directory containing extraction results. Use -o as shorthand. Defaults to the value set in the config.

--review-threshold

float

default:"0.7"

Confidence threshold for flagging relations. Relations with confidence below this value are flagged for manual review.

--no-postprocess

boolean

default:"false"

Skip redundancy removal and graph cleanup. Use when you want raw extraction results without normalization.

--verbose

boolean

default:"false"

Enable verbose logging. Use -v as shorthand.

Behavior

Graph Building Process

1. Load Extractions - Reads all extraction JSON files from {output_dir}/extractions/
2. Entity Normalization - Applies domain-specific canonical name mappings
3. Relation Validation - Checks relation types against the domain schema (if applicable)
4. Redundancy Removal - Removes duplicate relations and normalizes entity references (unless --no-postprocess is set)
5. Confidence Flagging - Flags low-confidence relations and review-required types
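The redundancy-removal step can be pictured with a small Python sketch. This is not sift's implementation: the function name dedupe_relations is hypothetical, the relation fields are borrowed from the relation_review.yaml field list, and keeping the highest-confidence copy of a duplicate is an assumption made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_relations(relations: Iterable[dict]) -> List[dict]:
    """Keep one relation per (source_id, relation_type, target_id) key.

    Assumption: when duplicates differ in confidence, the highest one wins.
    """
    best: Dict[Tuple[str, str, str], dict] = {}
    for rel in relations:
        key = (rel["source_id"], rel["relation_type"], rel["target_id"])
        kept = best.get(key)
        if kept is None or rel["confidence"] > kept["confidence"]:
            best[key] = rel
    return list(best.values())


# Made-up sample data; relation type names are placeholders.
relations = [
    {"source_id": "e1", "target_id": "e2", "relation_type": "WORKS_AT", "confidence": 0.6},
    {"source_id": "e1", "target_id": "e2", "relation_type": "WORKS_AT", "confidence": 0.9},
    {"source_id": "e2", "target_id": "e3", "relation_type": "LOCATED_IN", "confidence": 0.8},
]
unique = dedupe_relations(relations)
print(len(unique))
```

Keying on the full triple means two relations between the same entities survive as long as their types differ.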

Schema-Free Mode

When the schema-free domain is used, the build loads the discovered schema from discovered_domain.yaml (created during extraction) and uses it for normalization.

Output Files

graph_data.json

The built knowledge graph in JSON format containing:

All entities with attributes and metadata
All relations with confidence scores
Source document references

Saved to: {output_dir}/graph_data.json
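For a quick sanity check of the built graph, something like the following Python sketch may help. The exact JSON layout is not documented here, so the top-level "entities" and "relations" keys are assumptions, and summarize_graph is a hypothetical helper. The demo writes a stand-in file rather than reading real sift output.

```python
import json
import tempfile
from pathlib import Path


def summarize_graph(graph_path: Path) -> dict:
    """Count entities and relations in a graph file.

    Assumption: the file has top-level "entities" and "relations" arrays.
    """
    data = json.loads(graph_path.read_text(encoding="utf-8"))
    return {
        "entities": len(data.get("entities", [])),
        "relations": len(data.get("relations", [])),
    }


# Stand-in file with a made-up layout, used only to exercise the helper.
with tempfile.TemporaryDirectory() as output_dir:
    graph_path = Path(output_dir) / "graph_data.json"
    sample = {
        "entities": [{"id": "e1"}, {"id": "e2"}],
        "relations": [{"source_id": "e1", "target_id": "e2", "confidence": 0.9}],
    }
    graph_path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_graph(graph_path)

print(summary)
```

If the real file uses different key names, only the two data.get calls need to change.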

relation_review.yaml

Generated when at least one relation is flagged for review. Contains:

Relations below confidence threshold
Relations with types marked review_required in domain config
Fields: source_id, target_id, relation_type, confidence, status

Saved to: {output_dir}/relation_review.yaml
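The two flagging rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not sift's code: flag_for_review is a hypothetical name, the relation types are made up, and the "pending" status value is a placeholder because the allowed status values are not listed here. The comparison is strict, matching "below this value".

```python
from typing import FrozenSet, Iterable, List


def flag_for_review(
    relations: Iterable[dict],
    threshold: float = 0.7,
    review_required_types: FrozenSet[str] = frozenset(),
) -> List[dict]:
    """Collect relations that are low-confidence or of a review-required type."""
    flagged = []
    for rel in relations:
        low_confidence = rel["confidence"] < threshold  # strictly below
        required = rel["relation_type"] in review_required_types
        if low_confidence or required:
            flagged.append({
                "source_id": rel["source_id"],
                "target_id": rel["target_id"],
                "relation_type": rel["relation_type"],
                "confidence": rel["confidence"],
                "status": "pending",  # placeholder value (assumption)
            })
    return flagged


# Made-up sample data.
relations = [
    {"source_id": "e1", "target_id": "e2", "relation_type": "WORKS_AT", "confidence": 0.65},
    {"source_id": "e2", "target_id": "e3", "relation_type": "LOCATED_IN", "confidence": 0.7},
    {"source_id": "e3", "target_id": "e4", "relation_type": "OWNS", "confidence": 0.95},
]
flagged = flag_for_review(relations, threshold=0.7, review_required_types=frozenset({"OWNS"}))
print([entry["relation_type"] for entry in flagged])
```

In this sample the relation at exactly 0.7 is not flagged, while the high-confidence OWNS relation is flagged because its type is marked as review-required.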

Examples

Basic build

sift build

Builds the graph from extractions in the default output directory with standard post-processing.

Custom review threshold

sift build --review-threshold 0.8

Flags relations with confidence below 0.8 for review.

Skip post-processing

sift build --no-postprocess

Builds the raw graph without redundancy removal or graph cleanup.

With custom domain

sift build --domain ./my-domain.yaml --output ./results

Builds the graph using a custom domain config and the specified output directory.

Output Summary

Displays:

Total entities in graph
Total relations in graph
Number of relations flagged for review
Output file location

Next Steps

After building the graph:

sift resolve

Run this to find and merge duplicate entities using LLM-based resolution.

Error Handling

The command exits with an error if:

No extraction files found in {output_dir}/extractions/
Output directory doesn’t exist
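A script that wraps the command could check for both conditions before invoking it. The Python sketch below is one way to do that; check_build_inputs is a hypothetical helper and its messages are not sift's actual error output. It assumes extraction files carry a .json extension, as the loading step suggests.

```python
import tempfile
from pathlib import Path
from typing import List


def check_build_inputs(output_dir: Path) -> List[str]:
    """Return a list of problems that would make the build fail."""
    if not output_dir.is_dir():
        return [f"output directory does not exist: {output_dir}"]
    extractions = output_dir / "extractions"
    if not extractions.is_dir() or not any(extractions.glob("*.json")):
        return [f"no extraction files found in {extractions}"]
    return []


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Case 1: the output directory is missing.
    missing = check_build_inputs(root / "does-not-exist")

    # Case 2: the directory exists but holds no extraction files.
    (root / "extractions").mkdir()
    empty = check_build_inputs(root)

    # Case 3: at least one extraction JSON file is present.
    (root / "extractions" / "doc1.json").write_text("{}", encoding="utf-8")
    ready = check_build_inputs(root)

print(len(missing), len(empty), len(ready))
```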
