Overview

Identifies duplicate entities in the knowledge graph using LLM-powered entity resolution. Generates merge proposals for likely duplicates and flags variant relationships (e.g., EXTENDS) for review.

Usage

sift resolve [OPTIONS]

Options

--model
string
LLM model to use for entity resolution (e.g., openai/gpt-4o-mini). Overrides default from config.
--domain
string
Path to custom domain YAML file. Used for system context in resolution prompts.
--domain-name
string
default:"schema-free"
Bundled domain name (e.g., general, osint). Use -d as shorthand.
--concurrency
integer
default:"4"
Number of concurrent LLM calls. Use -c as shorthand. Higher values speed up processing.
--rpm
integer
default:"40"
Maximum requests per minute to prevent rate limiting.
--embeddings
boolean
default:"false"
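Use semantic clustering with embeddings for candidate selection. Requires the optional extra: pip install "sift-kg[embeddings]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern).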
--output
string
Output directory containing graph data. Use -o as shorthand.
--verbose
boolean
default:"false"
Enable verbose logging. Use -v as shorthand.

Behavior

Resolution Process

  1. Load Graph - Reads graph_data.json from output directory
  2. Candidate Selection - Groups similar entities by name/type
    • Optional: Uses embeddings for semantic clustering if --embeddings is enabled
  3. LLM Resolution - Asks the LLM to judge whether each candidate pair is a duplicate
  4. Merge Proposals - Generates proposals with confidence scores
  5. Variant Detection - Identifies variant relationships (e.g., class inheritance)
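
The candidate selection step can be pictured with a small sketch. This is not sift's actual implementation; it only assumes that entities carry an id, a name, and a type, and that "similar by name/type" means an equal normalized name within the same type. The normalize and candidate_pairs helpers are hypothetical names introduced for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def candidate_pairs(entities):
    """Group entities by (type, normalized name) and pair up each group."""
    groups = defaultdict(list)
    for ent in entities:
        groups[(ent["type"], normalize(ent["name"]))].append(ent["id"])
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Hypothetical entities, not taken from a real graph_data.json
entities = [
    {"id": "e1", "name": "Acme Corp.", "type": "ORGANIZATION"},
    {"id": "e2", "name": "acme  corp", "type": "ORGANIZATION"},
    {"id": "e3", "name": "Acme Corp", "type": "PRODUCT"},
]
print(candidate_pairs(entities))
```

Only e1 and e2 are paired, because e3 has a different type. Each such pair would then be handed to the LLM in step 3, which keeps the number of LLM calls far below an all-pairs comparison.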

Embedding-Based Resolution

When --embeddings is enabled:
  • Computes semantic embeddings for entity names and attributes
  • Uses clustering to find similar entities beyond exact name matches
  • More accurate than name/type grouping alone, but requires the optional embeddings extra (pip install sift-kg[embeddings])
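
To illustrate why embeddings help, here is a rough sketch of similarity-based clustering. The embedding model and clustering algorithm sift uses are not documented here, so this assumes toy three-dimensional vectors, cosine similarity, and a greedy threshold rule; cosine, cluster, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.9):
    """Greedy clustering: join the first cluster with a similar enough member."""
    clusters = []
    for name, vec in vectors.items():
        for members in clusters:
            if any(cosine(vec, vectors[m]) >= threshold for m in members):
                members.append(name)
                break
        else:
            # No existing cluster is close enough, so start a new one
            clusters.append([name])
    return clusters


# Toy vectors standing in for real embeddings (values are made up)
toy = {
    "IBM": [0.9, 0.1, 0.0],
    "International Business Machines": [0.88, 0.15, 0.02],
    "Apple": [0.1, 0.9, 0.2],
}
print(cluster(toy))
```

The two IBM spellings share no normalized name, so name matching would miss them, yet their vectors are nearly parallel and they land in the same cluster.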

Output Files

merge_proposals.yaml

Contains entity merge proposals with three sections:
  • draft - Pending review
  • confirmed - Approved for merging
  • rejected - Declined merges
Each proposal includes:
  • cluster_id - Unique identifier
  • members - List of entity IDs to merge
  • canonical_name - Suggested merged entity name
  • reasoning - LLM explanation
  • confidence - Score from LLM
Saved to: {output_dir}/merge_proposals.yaml
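
As a reading aid, the structure described above can be mirrored in Python. The field and section names come from this page; the sample values, the MergeProposal class, the drafts_by_confidence helper, and the assumption that confidence is a number between 0 and 1 are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergeProposal:
    cluster_id: str
    members: List[str]      # entity IDs to merge
    canonical_name: str     # suggested merged entity name
    reasoning: str          # LLM explanation
    confidence: float       # assumed to lie in 0.0-1.0; the scale is not documented here


# The three sections of merge_proposals.yaml, with made-up proposals
proposals = {
    "draft": [
        MergeProposal("c1", ["e1", "e2"], "Acme Corp", "Same company, punctuation differs", 0.95),
        MergeProposal("c2", ["e7", "e9"], "J. Smith", "Possibly the same person", 0.60),
    ],
    "confirmed": [],
    "rejected": [],
}


def drafts_by_confidence(doc):
    """Return pending proposals, most confident first."""
    return sorted(doc["draft"], key=lambda p: p.confidence, reverse=True)


for p in drafts_by_confidence(proposals):
    print(p.cluster_id, p.canonical_name, p.confidence)
```

Sorting drafts this way is one possible way to triage a long review queue; it is not something sift is documented to do.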

relation_review.yaml (updated)

Variant relationships detected during resolution are appended to this file. These are relations like EXTENDS or IS_A that indicate entity variants rather than duplicates.

Examples

Basic resolution

sift resolve
Finds duplicates using default settings.

With embeddings for better accuracy

sift resolve --embeddings
Uses semantic clustering to find similar entities.

High-performance resolution

sift resolve -c 8 --rpm 60 --model openai/gpt-4o-mini
Uses 8 concurrent workers with higher rate limit.
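
A back-of-the-envelope estimate shows how -c and --rpm interact. It assumes that --rpm is a single cap shared by all workers and that every LLM call takes about the same time; neither is confirmed by this page, and estimated_minutes is a hypothetical helper, not part of sift.

```python
import math


def estimated_minutes(num_calls, rpm, concurrency, seconds_per_call):
    """Lower bound on wall-clock minutes: the slower of the two limits wins."""
    rate_limited = num_calls / rpm
    worker_limited = math.ceil(num_calls / concurrency) * seconds_per_call / 60
    return max(rate_limited, worker_limited)


# 600 calls at 3 s each (made-up workload)
print(estimated_minutes(600, rpm=40, concurrency=4, seconds_per_call=3.0))
print(estimated_minutes(600, rpm=60, concurrency=8, seconds_per_call=3.0))
```

Under these assumptions the rate limit dominates in both runs (15 and 10 minutes), so raising -c alone would not shorten the run; --rpm has to rise as well, within what the provider allows.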

With custom domain context

sift resolve --domain ./legal-domain.yaml
Uses domain-specific context for resolution decisions.

Output Summary

Displays:
  • Number of merge proposals generated
  • Number of variant relationships found
  • Total cost in USD
  • Output file location

Next Steps

After resolution, interactively review the merge proposals and approve or reject each one:
sift review
Then apply the approved merges to the graph:
sift apply-merges

Error Handling

Exits with an error if:
  • No graph_data.json is found in the output directory (run sift build first)
  • API key validation fails
  • The embeddings extra is not installed when --embeddings is used
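
In a script that wraps the command, the first condition can be checked up front. This is a minimal sketch that only assumes graph_data.json sits directly inside the output directory, as described under Resolution Process; missing_prerequisites is a hypothetical helper and does not cover API keys or the embeddings extra.

```python
import tempfile
from pathlib import Path


def missing_prerequisites(output_dir):
    """Return a list of human-readable problems; empty means ready to resolve."""
    problems = []
    if not (Path(output_dir) / "graph_data.json").is_file():
        problems.append("graph_data.json not found; run `sift build` first")
    return problems


# Demonstrate against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(missing_prerequisites(tmp))   # one problem reported
    (Path(tmp) / "graph_data.json").write_text("{}")
    print(missing_prerequisites(tmp))   # no problems
```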
