Overview
The plugin builds a mathematical model of word-tag associations by analyzing all tagged notes in your vault. When you open or edit a note, it compares the note’s content against this model to suggest relevant tags.
The plugin only learns from notes that already have tags. A tag must appear in at least 2 documents before it can be suggested.
The Four-Step Process
1. Vault Scanning
When Obsidian starts (or when you manually trigger a rescan), the plugin:
- Scans all markdown files in your vault (in batches of 100 for performance)
- Extracts tags from both inline format (#tag) and frontmatter YAML
- Tokenizes note content into words
- Builds statistical profiles for each tag
What gets indexed:
- Words that are at least 3 characters long
- Words that aren’t stop words (common words like “the”, “and”, “is”)
- Words that aren’t pure numbers
- Content from the note body (excluding frontmatter, code blocks, and images)
What gets excluded:
- Frontmatter blocks
- Code blocks (fenced and inline)
- Image embeds
- Link URLs (link text is kept)
- Wiki link paths
- Existing tags
- Stop words in English and German (defined in src/stopwords.ts:1)
2. TF-IDF Vectors
Once scanning completes, the plugin calculates TF-IDF (Term Frequency-Inverse Document Frequency) vectors for each tag.
What is TF-IDF?
TF-IDF is a statistical measure of how characteristic a word is of a tag. Words that appear often in a tag’s notes but rarely elsewhere in your vault receive the highest weights.
- TF (Term Frequency): How often a word appears in notes with this tag
- IDF (Inverse Document Frequency): How rare the word is across all tagged documents
The calculation is implemented in src/model.ts:294-301 and runs once for each tag profile.
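The per-tag weighting described above can be sketched as follows. This is a minimal sketch, not the code in src/model.ts. It assumes TF is a word's count divided by the total word count of the tag's notes, and that IDF is the natural log of (tagged documents / tagged documents containing the word). The names `TagProfile` and `buildTagVectors` are hypothetical.

```typescript
// Hypothetical sketch: names and exact TF/IDF formulas are assumptions,
// not the plugin's implementation in src/model.ts.

type Vector = Map<string, number>;

interface TagProfile {
  wordCounts: Map<string, number>; // word -> occurrences across notes with this tag
  totalWords: number;              // total tokens across notes with this tag
  docCount: number;                // notes carrying this tag (used by the 2-document filter)
}

/**
 * Builds one TF-IDF vector per tag.
 * @param profiles  tag -> aggregated word statistics
 * @param docFreq   word -> number of tagged documents containing the word
 * @param totalDocs number of tagged documents in the vault
 */
function buildTagVectors(
  profiles: Map<string, TagProfile>,
  docFreq: Map<string, number>,
  totalDocs: number,
): Map<string, Vector> {
  const vectors = new Map<string, Vector>();
  for (const [tag, profile] of profiles) {
    const vec: Vector = new Map();
    for (const [word, count] of profile.wordCounts) {
      // TF: relative frequency of the word within this tag's notes.
      const tf = count / profile.totalWords;
      // IDF: a word found in every tagged document gets weight 0;
      // rarer words get progressively larger weights.
      const df = docFreq.get(word) ?? 1;
      const idf = Math.log(totalDocs / df);
      vec.set(word, tf * idf);
    }
    vectors.set(tag, vec);
  }
  return vectors;
}
```

With 4 tagged documents, a word present in all of them ("code") contributes nothing to #python, while a word present in only one ("snake") dominates the tag's vector.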
3. Cosine Similarity
When you open or edit a note, the plugin:
- Tokenizes the current note content into a TF-IDF vector (using the same process)
- Compares this vector against each tag’s precomputed vector using cosine similarity
- Filters out tags already present in the note
- Filters out tags that appear in fewer than 2 documents
The filtering logic is implemented in src/model.ts:123-129.
Only tags that appear in at least 2 documents can be suggested. This prevents overfitting to single examples.
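The comparison and filtering steps can be sketched as follows, using standard cosine similarity over sparse word-weight maps. The function names `cosineSimilarity` and `rankTags` and the `minDocs` parameter are hypothetical. This is an illustration of the technique, not the plugin's code.

```typescript
// Hypothetical sketch of scoring and filtering; not the plugin's src/model.ts code.

type Vector = Map<string, number>;

// Cosine similarity: dot(a, b) / (|a| * |b|), computed over sparse maps.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, wa] of a) {
    normA += wa * wa;
    const wb = b.get(word);
    if (wb !== undefined) dot += wa * wb; // only shared words contribute
  }
  for (const wb of b.values()) normB += wb * wb;
  if (normA === 0 || normB === 0) return 0; // empty vector: no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Suggestion {
  tag: string;
  score: number;
}

function rankTags(
  noteVector: Vector,
  tagVectors: Map<string, Vector>,
  tagDocCounts: Map<string, number>,
  existingTags: Set<string>,
  minDocs = 2, // mirrors the documented "at least 2 documents" rule
): Suggestion[] {
  const results: Suggestion[] = [];
  for (const [tag, tagVector] of tagVectors) {
    if (existingTags.has(tag)) continue;                  // already on the note
    if ((tagDocCounts.get(tag) ?? 0) < minDocs) continue; // too few examples
    results.push({ tag, score: cosineSimilarity(noteVector, tagVector) });
  }
  return results.sort((x, y) => y.score - x.score); // best match first
}
```

Cosine similarity depends only on the angle between vectors. A long note and a short note about the same topic therefore score similarly against a tag.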
4. Co-occurrence Boosting
The final step applies a relevance boost based on tag co-occurrence patterns (from src/model.ts:133-142).
If your note already has certain tags, and those tags frequently appear together with a candidate tag in your vault, the candidate’s score is boosted.
For example, if you have #python in your note, and #programming appears in 80% of your notes tagged with #python, then #programming gets a significant boost.
This helps the plugin learn implicit tag hierarchies and related concepts from your existing patterns.
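One way to express such a boost is shown in the sketch below. The exact formula in src/model.ts:133-142 is not reproduced here. This sketch assumes the boost uses the strongest co-occurrence ratio among the note's existing tags (notes with both tags / notes with the existing tag) and a hypothetical weight `boostWeight`. The function name is also hypothetical.

```typescript
// Hypothetical co-occurrence boost; the formula and boostWeight are assumptions,
// not the plugin's actual implementation.

function applyCooccurrenceBoost(
  candidate: string,
  baseScore: number,
  noteTags: string[],
  // tag -> (other tag -> number of notes carrying both tags)
  cooccurrence: Map<string, Map<string, number>>,
  tagDocCounts: Map<string, number>,
  boostWeight = 0.5, // hypothetical tuning constant
): number {
  let bestRatio = 0;
  for (const existing of noteTags) {
    const together = cooccurrence.get(existing)?.get(candidate) ?? 0;
    const total = tagDocCounts.get(existing) ?? 0;
    // Fraction of notes with `existing` that also carry `candidate`.
    if (total > 0) bestRatio = Math.max(bestRatio, together / total);
  }
  // No co-occurrence leaves the score unchanged; 100% co-occurrence
  // multiplies it by (1 + boostWeight).
  return baseScore * (1 + boostWeight * bestRatio);
}
```

In the #python example, 8 of 10 #python notes also carry #programming. That gives a ratio of 0.8, so a base score of 0.4 becomes 0.4 × 1.4 = 0.56 under these assumptions.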
Content Extraction
Tag Extraction
The plugin recognizes tags in two formats (from src/model.ts:155-195):
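- Inline tags, written as #tag in the note body
- Frontmatter tags, declared in the note’s YAML frontmatter (leading # prefixes are stripped)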
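The two formats can be handled roughly as follows. This is a simplified sketch, not the plugin's parser. It assumes tags live under a `tags` frontmatter key written as a flow list (`[a, b]`), a comma/space-separated string, or a block list. It also assumes inline tags may contain letters, digits, `_`, `-`, and `/`. The regexes and helper names are hypothetical, and a real implementation would use a proper YAML parser.

```typescript
// Hypothetical sketch of tag extraction; regexes and helper names are
// illustrative assumptions, not the plugin's src/model.ts implementation.

// Inline tag: '#' at the start of the text or after whitespace, followed by
// letters, digits, '_', '/', or '-' ('/' allows nested tags like #project/alpha).
const INLINE_TAG = /(?:^|\s)#([\p{L}\p{N}_\/-]+)/gu;
const FRONTMATTER = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/;

// Strips surrounding quotes and a leading '#' from a frontmatter tag value.
function cleanTag(raw: string): string {
  return raw.trim().replace(/^["']|["']$/g, "").replace(/^#/, "");
}

// Reads the `tags` key in three common YAML shapes:
//   tags: [a, b]      tags: a, b      tags:\n  - a\n  - b
function parseFrontmatterTags(yaml: string): string[] {
  const lines = yaml.split(/\r?\n/);
  const raw: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(/^tags:\s*(.*)$/);
    if (!m) continue;
    const value = m[1].trim();
    if (value.length > 0) {
      // Flow list or comma/space-separated string on the same line.
      raw.push(...value.replace(/^\[|\]$/g, "").split(/[,\s]+/));
    } else {
      // Block list: consume the "- item" lines that follow.
      for (let j = i + 1; j < lines.length; j++) {
        const item = lines[j].match(/^\s*-\s+(.+)$/);
        if (!item) break;
        raw.push(item[1]);
      }
    }
  }
  return raw.map(cleanTag).filter((t) => t.length > 0);
}

function extractTags(markdown: string): Set<string> {
  const tags = new Set<string>();
  let body = markdown;

  const fm = markdown.match(FRONTMATTER);
  if (fm) {
    body = markdown.slice(fm[0].length); // inline scan covers only the body
    for (const tag of parseFrontmatterTags(fm[1])) tags.add(tag);
  }

  for (const m of body.matchAll(INLINE_TAG)) {
    // Obsidian does not treat purely numeric strings like #123 as tags.
    if (!/^\d+$/.test(m[1])) tags.add(m[1]);
  }
  return tags;
}
```

Requiring whitespace (or the start of the text) before `#` keeps fragments like `a#b` and heading markers like `## Title` from being read as tags.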
Text Tokenization
The tokenization process (from src/model.ts:197-217):
- Removes frontmatter blocks
- Removes code blocks (fenced and inline)
- Removes image embeds
- Extracts text from markdown links
- Extracts text from wiki links (ignoring aliases)
- Removes existing tags
- Removes heading markers and emphasis
- Splits on non-word boundaries
- Converts to lowercase
- Filters by length (≥3 characters), stop words, and pure numbers
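The steps above, applied in order, might look like the following sketch. The regexes are simplified assumptions and are not copied from src/model.ts. `STOP_WORDS` is a tiny hypothetical sample; the real English and German lists live in src/stopwords.ts. The wiki-link rule here keeps only the note name and drops the folder path and alias, which is one reading of "wiki link paths" being excluded.

```typescript
// Hypothetical tokenizer following the documented steps in order; regexes
// and the stop word sample are assumptions (real lists are in src/stopwords.ts).
const STOP_WORDS = new Set(["the", "and", "is", "der", "die", "und", "ist"]);

function tokenize(markdown: string): string[] {
  const text = markdown
    .replace(/^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/, "")      // frontmatter block
    .replace(/```[\s\S]*?```/g, " ")                          // fenced code blocks
    .replace(/`[^`\n]*`/g, " ")                               // inline code
    .replace(/!\[[^\]]*\]\([^)]*\)/g, " ")                    // ![alt](image.png)
    .replace(/!\[\[[^\]]*\]\]/g, " ")                         // ![[image.png]]
    .replace(/\[\[(?:[^\]|]*\/)?([^\]|\/]*)(?:\|[^\]]*)?\]\]/g, " $1 ") // [[folder/Note|alias]] -> Note
    .replace(/\[([^\]]*)\]\([^)]*\)/g, " $1 ")                // [text](url) -> text
    .replace(/(?:^|\s)#[\p{L}\p{N}_\/-]+/gu, " ")             // existing #tags
    .replace(/^#{1,6}\s+/gm, "")                              // heading markers
    .replace(/[*_~]+/g, " ");                                 // emphasis markers

  return text
    .split(/[^\p{L}\p{N}]+/u) // split on non-word characters (Unicode-aware, keeps umlauts)
    .map((w) => w.toLowerCase())
    .filter((w) => w.length >= 3 && !STOP_WORDS.has(w) && !/^\d+$/.test(w));
}
```

Splitting with Unicode property classes instead of `\W` matters for the German stop word support. Without the `u` flag, `\W` treats letters like "ü" as separators.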
Performance Considerations
Vault scanning processes files in batches of 100 and yields to the UI thread between batches (from src/model.ts:63-76). This keeps Obsidian responsive even with large vaults.
After scanning completes, you’ll see a notice showing:
- Number of unique tags learned
- Number of tagged documents processed
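Batching with a yield between batches can be sketched as follows. Yielding here means awaiting a zero-delay `setTimeout`, so the event loop can repaint and handle input before the next batch. `scanInBatches` and `processFile` are hypothetical names, and this is not the code in src/model.ts.

```typescript
// Hypothetical sketch of batched scanning; names are illustrative assumptions.

// Resolves on a later event-loop turn, giving the UI a chance to update.
function yieldToUI(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

async function scanInBatches<T>(
  files: T[],
  processFile: (file: T) => void | Promise<void>, // per-note parsing/indexing
  batchSize = 100,
): Promise<number> {
  let processed = 0;
  for (let start = 0; start < files.length; start += batchSize) {
    const batch = files.slice(start, start + batchSize);
    for (const file of batch) {
      await processFile(file);
      processed++;
    }
    await yieldToUI(); // let Obsidian stay responsive between batches
  }
  return processed;
}
```

Inside a plugin, `files` would typically come from `this.app.vault.getMarkdownFiles()`, and `processFile` might read each note with `this.app.vault.cachedRead(file)` before tokenizing it. That wiring is an assumption about usage, not a description of the plugin's code.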
When to Rescan
The model is built once at startup. You should manually rescan when:
- You’ve added many new tags to existing notes
- You’ve bulk-imported notes with tags
- You’ve significantly restructured your tagging system
- Tag suggestions seem outdated