The Architecture
Why Three Paths?
Each retrieval method has strengths and blind spots:

| Path | Catches | Misses |
|---|---|---|
| Vector | Synonyms, paraphrases, concepts | Exact names, entities |
| TAG_INDEX | Explicitly tagged entities | Untagged content |
| Keyword Grep | Exact string matches | Semantic variations |
Example: Searching for an Entity
Query: “Find information about Protocol 139”

- Vector search returns:
  - Documents about “decentralized leadership” (semantically related)
  - Files discussing “command structure” (conceptually similar)
- TAG_INDEX returns:
  - `#leadership` → `protocols/139-decentralized-command.md` (exact entity match)
  - Files explicitly tagged with `#protocol-139`
- Keyword grep returns:
  - Any file containing the literal string “Protocol 139”
  - Recently created files not yet synced to Supabase
Path 1: Vector Semantic Search
How It Works
Similarity search
Cosine similarity search runs across 11 Supabase tables containing your workspace content.
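The ranking step is the cosine formula itself: the dot product of two embeddings divided by the product of their lengths. Below is a minimal sketch in plain Python, assuming each document already has an embedding stored as a list of floats. The helpers `cosine_similarity` and `top_k`, the file names, and the toy 3-dimensional vectors are hypothetical, and the sketch does not show how the query actually executes inside Supabase.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated rather than dividing by zero
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 5):
    """Hypothetical helper: rank (doc_id, embedding) rows by similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, emb)) for doc_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real embedding models produce much longer vectors.
    rows = [
        ("fitness-tracking.md", [0.9, 0.1, 0.0]),
        ("car-maintenance.md", [0.0, 0.2, 0.9]),
    ]
    query = [0.8, 0.2, 0.1]  # stand-in for an embedded "health monitoring" query
    print(top_k(query, rows, k=1))  # fitness-tracking.md ranks first
```

Because cosine similarity ignores vector length, a short note and a long document about the same concept can score similarly, which is what lets this path match paraphrases.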
Usage
Strengths
Conceptual matching
Finds “fitness tracking” when you search for “health monitoring”
Synonym handling
Understands “automobile” and “car” are related
Paraphrase detection
Matches different phrasings of the same idea
Context awareness
Ranks by relevance to your query context
Limitations
- May miss exact entity names (“John Smith” vs “J. Smith”)
- Requires content to be synced to Supabase
- Can’t find content created since the last sync
Path 2: TAG_INDEX Lookup
How It Works
Tag extraction
The `generate_tag_index.py` script scans all workspace files for:
- Inline `#hashtags` in Markdown
- YAML frontmatter tags
- Protocol IDs and entity names
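The first two extraction sources can be sketched with the standard library alone. This is a minimal sketch assuming Markdown files with `---`-delimited YAML frontmatter; to avoid a YAML dependency it only understands the common `tags: [a, b]`, `tags: a`, and `tags:` followed by `- a` forms. `extract_tags`, `frontmatter_tags`, and both regexes are hypothetical and are not taken from `generate_tag_index.py`; protocol IDs and entity names are not covered.

```python
import re

# Hypothetical patterns, not the ones generate_tag_index.py uses.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
# '#' + letter, then letters/digits/'_'/'-'/'/'. The lookbehind skips '#' inside
# words (URL fragments like page#section) and repeated '#' in "## Heading".
HASHTAG = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


def frontmatter_tags(text: str) -> set[str]:
    """Read the `tags` key from frontmatter (simplified, not a full YAML parser)."""
    match = FRONTMATTER.match(text)
    if not match:
        return set()
    lines = match.group(1).splitlines()
    tags: set[str] = set()
    for i, line in enumerate(lines):
        if not line.startswith("tags:"):
            continue
        value = line[len("tags:"):].strip()
        if value.startswith("["):  # flow style: tags: [a, b]
            tags.update(t.strip().strip("'\"") for t in value.strip("[]").split(","))
        elif value:  # single scalar: tags: leadership
            tags.add(value.strip("'\""))
        else:  # block style: following lines like "  - a"
            for item in lines[i + 1:]:
                stripped = item.strip()
                if not stripped.startswith("- "):
                    break
                tags.add(stripped[2:].strip().strip("'\""))
    # Normalise: drop a leading '#', lowercase, and skip empty entries.
    return {t.lstrip("#").lower() for t in tags if t.lstrip("#")}


def extract_tags(text: str) -> set[str]:
    """Union of frontmatter tags and inline hashtags found in the body."""
    body = FRONTMATTER.sub("", text, count=1)  # don't scan YAML as Markdown
    inline = {tag.lower() for tag in HASHTAG.findall(body)}
    return frontmatter_tags(text) | inline
```

For example, a file whose frontmatter reads `tags: [leadership, protocol-139]` and whose body mentions `#decentralized` yields all three tags, while a heading such as `# Protocol 139` yields none.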
Example Output
Usage
Generating the Index
- 1000+ tags indexed
- 4 directories scanned (`.context/`, `.agent/`, `examples/protocols/`, `user_profile/`)
- Extraction methods: YAML frontmatter + inline hashtags
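Generating the index amounts to building an inverted map from each tag to the files that carry it, which is also why a lookup needs no search at all. The sketch below walks the four directories listed above, writes the map to disk, and answers an exact tag lookup. It rests on assumptions: the JSON layout, the helper names (`build_tag_index`, `save_index`, `lookup`), and the default extractor, which catches only inline hashtags. The real `generate_tag_index.py` output format may differ; a fuller extractor that also reads frontmatter can be passed through the `extract` parameter.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# The four directories listed in this section, relative to the workspace root.
SCANNED_DIRS = [".context", ".agent", "examples/protocols", "user_profile"]

# Hypothetical default extractor: inline #hashtags only, no frontmatter parsing.
INLINE_TAG = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


def inline_hashtags(text: str) -> set[str]:
    return {tag.lower() for tag in INLINE_TAG.findall(text)}


def build_tag_index(root: Path, extract=inline_hashtags) -> dict[str, list[str]]:
    """Map each tag to the sorted list of Markdown files that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for rel_dir in SCANNED_DIRS:
        base = root / rel_dir
        if not base.is_dir():
            continue  # skip directories this workspace does not have
        for path in base.rglob("*.md"):
            text = path.read_text(encoding="utf-8", errors="replace")
            for tag in extract(text):
                index[tag].add(path.relative_to(root).as_posix())
    return {tag: sorted(files) for tag, files in sorted(index.items())}


def save_index(index: dict[str, list[str]], out_path: Path) -> None:
    """Write the index as JSON (an assumed format, chosen for readability)."""
    out_path.write_text(json.dumps(index, indent=2), encoding="utf-8")


def lookup(index_path: Path, tag: str) -> list[str]:
    """Exact tag lookup: one dictionary access after loading the index file."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    return index.get(tag.lstrip("#").lower(), [])
```

Files outside the four scanned directories never enter the index, which is one concrete reason this path misses content that only keyword grep will see.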
Strengths
Instant lookup
Near-zero latency for tagged entities
Exact matches
No false positives for entity names
Manual curation
You control what gets tagged and how
No API costs
Pure filesystem operation
Limitations
- Only finds explicitly tagged content
- Requires manual tagging discipline
- Misses untagged but relevant files
Path 3: Keyword Grep
How It Works
Simple string matching across all files.
Strengths
Zero false negatives
If the exact string exists in a file, grep finds it
Finds new files
Catches content not yet synced to Supabase
Exact phrases
Finds literal matches like “Protocol 139”
No dependencies
Works offline, no API or database needed
Limitations
- No semantic understanding
- High false positive rate for common words
- Case sensitivity issues (use the `-i` flag for case-insensitive matching)
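On the command line this path is typically `grep -rniF "Protocol 139" .`, where `-r` recurses into directories, `-n` prints line numbers, `-i` ignores case, and `-F` treats the pattern as a fixed string rather than a regular expression. The Python sketch below performs the same literal scan so it can sit alongside the other two paths in one script. `keyword_grep` is a hypothetical helper, not part of the workspace tooling; because it reads files directly, with no index or database, it also sees files created seconds ago.

```python
from pathlib import Path


def keyword_grep(root: Path, needle: str, ignore_case: bool = True, suffixes=None):
    """Literal substring search under root, similar to `grep -rnF` (plus -i).

    Returns (relative_path, line_number, line_text) tuples. If `suffixes` is
    given, e.g. (".md", ".txt"), only files with those extensions are scanned.
    """
    target = needle.lower() if ignore_case else needle
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if suffixes is not None and path.suffix not in suffixes:
            continue
        # errors="replace" keeps binary or oddly encoded files from raising
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            haystack = line.lower() if ignore_case else line
            if target in haystack:
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

Defaulting to `ignore_case=True` mirrors the advice to pass `-i`; set it to `False` when capitalization matters.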
When to Use Each Path
- Conceptual Questions
- Entity Lookup
- Exact Phrases
- Complex Analysis
For conceptual questions, use Vector Search (primary).

Examples:
- “What did we discuss about X?”
- “Show me files related to leadership”
- “Find documents about decentralized systems”
The Search Protocol (§0.7.1)
Per Core Identity, every query triggers semantic context retrieval.
Performance Comparison
Before Triple-Path (Vector Only)
| Scenario | Result |
|---|---|
| Search entity name | ❌ Missed related protocol |
| Search archetype | ❌ Missed profile file |
| Search “decentralized” | ✅ Found semantically |
| New unsynced file | ❌ Not in Supabase yet |
After Triple-Path
| Scenario | Result |
|---|---|
| Search entity name | ✅ Found via TAG_INDEX |
| Search archetype | ✅ Found via TAG_INDEX |
| Search “decentralized” | ✅ Still works (Vector) |
| New unsynced file | ✅ Found via grep |
Best Practices
Tag important entities
Add `#tags` to protocols, workflows, and key documents for instant lookup.
Sync regularly
Run `supabase_sync.py` to keep your vector embeddings current.
Regenerate TAG_INDEX
Run `generate_tag_index.py` after adding new files or tags.
Use all three paths
For important searches, don’t rely on just one method.
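One way to use all three paths together is to union their results and record which path found each file. This is a minimal sketch under stated assumptions: each path has already returned a list of relative file paths, and files confirmed by more than one path are ranked first. That ranking heuristic, the name `merge_paths`, and the example file names other than `protocols/139-decentralized-command.md` are illustrative, not the documented Search Protocol.

```python
def merge_paths(
    vector_hits: list[str], tag_hits: list[str], grep_hits: list[str]
) -> list[tuple[str, list[str]]]:
    """Union results from all three paths, recording which path(s) found each file.

    Hypothetical ranking: files found by more paths come first; ties keep the
    order in which they were first seen (vector, then tag, then grep).
    """
    sources: dict[str, list[str]] = {}
    for label, hits in (("vector", vector_hits), ("tag", tag_hits), ("grep", grep_hits)):
        for path in hits:
            found_by = sources.setdefault(path, [])
            if label not in found_by:
                found_by.append(label)
    # sorted() is stable even with reverse=True, so ties keep first-seen order
    return sorted(sources.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    merged = merge_paths(
        vector_hits=["leadership/overview.md"],  # hypothetical file
        tag_hits=["protocols/139-decentralized-command.md"],
        grep_hits=["protocols/139-decentralized-command.md", "inbox/new-note.md"],
    )
    for path, found_by in merged:
        print(path, found_by)
    # protocols/139-decentralized-command.md prints first: found by tag and grep
```

Keeping the per-file source labels makes it easy to see when only one path found something, for example a new file that grep caught before the next Supabase sync.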
Next Steps
Multi-Model Strategy
Learn how to route different tasks to different AI models
Importing Data
Bring existing knowledge into your Athena workspace