Overview
Identify duplicate entities in the knowledge graph using LLM-powered entity resolution. Generates merge proposals for likely duplicates and flags variant relationships (e.g., EXTENDS) for review.
Usage
Options
- LLM model to use for entity resolution (e.g., openai/gpt-4o-mini). Overrides the default from config.
- Path to a custom domain YAML file. Used for system context in resolution prompts.
- Bundled domain name (e.g., general, osint). Use -d as shorthand.
- Number of concurrent LLM calls. Use -c as shorthand. Higher values speed up processing.
- Maximum requests per minute, to prevent rate limiting.
- --embeddings: Use semantic clustering with embeddings for candidate selection. Requires: pip install sift-kg[embeddings]
- Output directory containing graph data. Use -o as shorthand.
- Enable verbose logging. Use -v as shorthand.
Behavior
Resolution Process
- Load Graph - Reads graph_data.json from the output directory
- Candidate Selection - Groups similar entities by name/type
  - Optional: Uses embeddings for semantic clustering if --embeddings is enabled
- LLM Resolution - Asks LLM to judge if candidate pairs are duplicates
- Merge Proposals - Generates proposals with confidence scores
- Variant Detection - Identifies variant relationships (e.g., class inheritance)
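The candidate selection step can be pictured with a minimal sketch. It assumes each entity is a dict with hypothetical `id`, `type`, and `name` keys and that "similar" means an identical name after normalization; the tool's actual grouping logic is not documented here and may differ.

```python
from collections import defaultdict
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different names match."""
    return " ".join(name.lower().split())


def candidate_pairs(entities: list[dict]) -> list[tuple[str, str]]:
    """Group entities by (type, normalized name) and pair up each group."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for entity in entities:
        key = (entity["type"], normalize(entity["name"]))
        groups[key].append(entity["id"])
    pairs: list[tuple[str, str]] = []
    for ids in groups.values():
        # Only groups with two or more members yield pairs to send to the LLM.
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


# Hypothetical entities, for illustration only.
entities = [
    {"id": "e1", "type": "PERSON", "name": "Ada Lovelace"},
    {"id": "e2", "type": "PERSON", "name": "ada  lovelace"},
    {"id": "e3", "type": "ORG", "name": "Ada Lovelace"},
]
print(candidate_pairs(entities))
```

Grouping by type first keeps a person and an organization with the same name from being proposed as duplicates, which reduces the number of pairs the LLM has to judge.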
Embedding-Based Resolution
When --embeddings is enabled:
- Computes semantic embeddings for entity names and attributes
- Uses clustering to find similar entities beyond exact name matches
- More accurate but requires additional dependencies
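To illustrate how clustering can surface candidates beyond exact name matches, here is a rough sketch using cosine similarity and single-link grouping over precomputed vectors. The vectors, the `threshold` value, and the clustering method are all assumptions for illustration; the embedding model and algorithm the tool actually uses are not specified in this page.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Single-link clustering: ids joined by any sufficiently similar pair share a cluster."""
    parent = {eid: eid for eid in vectors}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = sorted(vectors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    clusters: dict[str, list[str]] = {}
    for eid in ids:
        clusters.setdefault(find(eid), []).append(eid)
    # Singletons are not merge candidates, so drop them.
    return [members for members in clusters.values() if len(members) > 1]


# Toy 2-D vectors standing in for real embeddings.
vectors = {
    "e1": [1.0, 0.0],
    "e2": [0.98, 0.05],
    "e3": [0.0, 1.0],
}
print(cluster(vectors))
```

The pairwise comparison is quadratic in the number of entities, which is acceptable for a sketch but would need an approximate nearest-neighbor index or similar for large graphs.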
Output Files
merge_proposals.yaml
Contains entity merge proposals with three sections:
- draft - Pending review
- confirmed - Approved for merging
- rejected - Declined merges
Each proposal has the following fields:
- cluster_id - Unique identifier
- members - List of entity IDs to merge
- canonical_name - Suggested merged entity name
- reasoning - LLM explanation
- confidence - Score from LLM
Location: {output_dir}/merge_proposals.yaml
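As a sketch of how the sections and fields above fit together, the following models a proposals file as a plain Python dict and moves one proposal from draft to confirmed. The nesting (each section as a list of proposals), the sample values, the 0-to-1 confidence scale, and the `confirm` helper are assumptions for illustration; check a generated merge_proposals.yaml for the real layout.

```python
# Assumed in-memory shape of merge_proposals.yaml (illustrative values only).
proposals = {
    "draft": [
        {
            "cluster_id": "c1",
            "members": ["e1", "e2"],
            "canonical_name": "Ada Lovelace",
            "reasoning": "Same person; names differ only in casing.",
            "confidence": 0.95,  # scale is an assumption
        }
    ],
    "confirmed": [],
    "rejected": [],
}


def confirm(proposals: dict, cluster_id: str) -> None:
    """Move one proposal from draft to confirmed, in place (hypothetical helper)."""
    for index, proposal in enumerate(proposals["draft"]):
        if proposal["cluster_id"] == cluster_id:
            proposals["confirmed"].append(proposals["draft"].pop(index))
            return
    raise KeyError(f"no draft proposal with cluster_id {cluster_id!r}")


confirm(proposals, "c1")
print([p["cluster_id"] for p in proposals["confirmed"]])
```

In practice the same move can be made by editing the YAML file by hand or through the review command.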
relation_review.yaml (updated)
Variant relationships detected during resolution are appended to this file. These are relations like EXTENDS or IS_A that indicate entity variants rather than duplicates.
Examples
Basic resolution
With embeddings for better accuracy
High-performance resolution
With custom domain context
Output Summary
Displays:
- Number of merge proposals generated
- Number of variant relationships found
- Total cost in USD
- Output file location
Next Steps
After resolution, review the merge proposals (see review), then apply the confirmed merges (see apply-merges).
Error Handling
Exits with error if:
- No graph_data.json found (run sift build first)
- API key validation fails
- Embeddings not installed when --embeddings used
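The first of these checks can be reproduced in a script before invoking the command. This is a minimal sketch: the function name `check_graph_exists` is hypothetical, and it only assumes that graph_data.json sits directly inside the output directory, as described above.

```python
from pathlib import Path


def check_graph_exists(output_dir: str) -> Path:
    """Return the path to graph_data.json, or raise if the graph has not been built."""
    graph_path = Path(output_dir) / "graph_data.json"
    if not graph_path.is_file():
        raise FileNotFoundError(f"{graph_path} not found; run sift build first")
    return graph_path
```

Failing early this way avoids starting a batch job that would exit immediately for lack of input.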
See Also
- build - Build knowledge graph
- review - Review merge proposals
- apply-merges - Apply confirmed merges