Semantic Search

Athena uses Triple-Path Retrieval to reduce the chance that relevant context is missed. Each search method covers blind spots the others leave.

The Architecture

                               USER QUERY
                                   │
                                   ▼
               ┌───────────────────┴───────────────────┐
               │        TRIPLE-PATH RETRIEVAL          │
               └───────────────────┬───────────────────┘
                                   │
         ┌─────────────────────────┼─────────────────────────┐
         │                         │                         │
         ▼                         ▼                         ▼
 ┌───────────────┐        ┌───────────────┐        ┌───────────────┐
 │    PATH 1     │        │    PATH 2     │        │    PATH 3     │
 │               │        │               │        │               │
 │  🔮 VECTOR    │        │  🏷️ TAG      │        │  🔎 KEYWORD   │
 │   SEARCH      │        │   INDEX       │        │    GREP       │
 │               │        │               │        │               │
 │  (Semantic)   │        │  (Hashtags)   │        │  (Exact)      │
 └───────┬───────┘        └───────┬───────┘        └───────┬───────┘
         │                         │                         │
         ▼                         ▼                         ▼
  "decentralized"          "#leadership"           "Protocol 139"
  → finds related           → finds tagged         → finds exact
    concepts                   entities               matches
         │                         │                         │
         └─────────────────────────┼─────────────────────────┘
                                   │
                                   ▼
                         ┌─────────────────┐
                         │  MERGED CONTEXT │
                         └─────────────────┘

Why Three Paths?

Each retrieval method has strengths and blind spots:

Path	Catches	Misses
Vector	Synonyms, paraphrases, concepts	Exact names, entities
TAG_INDEX	Explicitly tagged entities	Untagged content
Keyword Grep	Exact string matches	Semantic variations

Example: Searching for an Entity

Query: “Find information about Protocol 139”

Vector search returns:
- Documents about “decentralized leadership” (semantically related)
- Files discussing “command structure” (conceptually similar)

TAG_INDEX returns:
- #leadership → protocols/139-decentralized-command.md (exact entity match)
- Files explicitly tagged with #protocol-139

Keyword grep returns:
- Any file containing the literal string “Protocol 139”
- Recent uncommitted files not yet in Supabase

Result: The combination finds the protocol file directly (TAG_INDEX), related concepts (Vector), and any recent mentions (Grep).
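The merge step in this example could be sketched as follows. This is a minimal illustration, not Athena's actual merge logic: the function name `merge_paths`, the result shape (a list of file paths per path), the tag-first ordering, and the files `docs/command-structure.md` and `notes/scratch.md` are all assumptions made for the sketch.

```python
def merge_paths(vector_hits, tag_hits, grep_hits):
    """Merge per-path results, deduplicating by file path.

    Returns a dict mapping file path -> list of paths that found it.
    Order follows first appearance (tag hits, then vector, then grep);
    that priority is an assumption of this sketch.
    """
    merged = {}
    for source, hits in (("tag", tag_hits), ("vector", vector_hits), ("grep", grep_hits)):
        for path in hits:
            sources = merged.setdefault(path, [])
            if source not in sources:
                sources.append(source)
    return merged


# Hypothetical results for the query "Protocol 139".
vector_hits = ["docs/command-structure.md", "protocols/139-decentralized-command.md"]
tag_hits = ["protocols/139-decentralized-command.md"]
grep_hits = ["protocols/139-decentralized-command.md", "notes/scratch.md"]

merged = merge_paths(vector_hits, tag_hits, grep_hits)
for path, sources in merged.items():
    print(f"{path}  <- {', '.join(sources)}")
```

Recording which paths found each file makes it easy to see when a result was only reachable through one method.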

Path 1: Vector Semantic Search

How It Works

Query embedding

Your search query is converted to a 3072-dimension embedding using the Gemini API.

Similarity search

Cosine similarity search runs across 11 Supabase tables containing your workspace content.

Ranked results

Returns top matches ranked by semantic similarity score.
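To make the ranking step concrete, here is a small sketch of cosine-similarity scoring in plain Python. It only illustrates the math: the real search presumably runs inside Supabase, and the function names, file names, and 3-dimensional toy vectors (standing in for 3072-dimension embeddings) are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank(query_vec, docs, limit=5):
    """Return up to `limit` (score, doc_id) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy vectors; real embeddings would come from the embedding API.
docs = {
    "health-monitoring.md": [0.9, 0.1, 0.0],
    "fitness-tracking.md": [0.8, 0.3, 0.1],
    "tax-filing.md": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]

top = rank(query, docs, limit=2)
for score, doc_id in top:
    print(f"{score:.3f}  {doc_id}")
```

Because cosine similarity depends only on direction, documents with similar meaning score close to 1 even when they share no exact words with the query.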

Usage

python3 scripts/supabase_search.py "<query>" --limit 5

Strengths

Conceptual matching

Finds “fitness tracking” when you search for “health monitoring”

Synonym handling

Understands “automobile” and “car” are related

Paraphrase detection

Matches different phrasings of the same idea

Context awareness

Ranks by relevance to your query context

Limitations

May miss exact entity names (“John Smith” vs “J. Smith”)
Requires content to be synced to Supabase
Can’t find content created since the last sync, such as a file written seconds ago

Path 2: TAG_INDEX Lookup

How It Works

Tag extraction

The generate_tag_index.py script scans all workspace files for:

Inline #hashtags in Markdown
YAML frontmatter tags
Protocol IDs and entity names

Reverse index creation

Creates a lookup table: #tag → [file1, file2, ...]

Instant lookup

When you search for a tagged entity, grep the TAG_INDEX for immediate results.
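The extraction and reverse-index steps above might look roughly like this. It is a sketch under stated assumptions, not the contents of generate_tag_index.py: it handles inline hashtags only (no YAML frontmatter or protocol IDs), takes file contents from an in-memory dict, and the regex and function name are illustrative.

```python
import re
from collections import defaultdict

# Matches inline hashtags such as #leadership or #protocol-139.
# A "#" followed by a space (a Markdown heading) does not match,
# and neither does a "#" inside a word or URL fragment.
HASHTAG = re.compile(r"(?<![\w#])#([A-Za-z][\w-]*)")


def build_tag_index(files):
    """Build a reverse index from {path: text} to {tag: sorted list of paths}."""
    index = defaultdict(set)
    for path, text in files.items():
        for tag in HASHTAG.findall(text):
            index["#" + tag.lower()].add(path)
    return {tag: sorted(paths) for tag, paths in sorted(index.items())}


# Hypothetical file contents used only for illustration.
files = {
    "protocols/139-decentralized-command.md": "# Protocol 139\nTags: #leadership",
    "docs/VECTORRAG.md": "Overview of retrieval. #vectorrag",
    "docs/SEMANTIC_SEARCH.md": "## Paths\nSee also #VectorRAG and #leadership.",
}

index = build_tag_index(files)
for tag, paths in index.items():
    print(f"| {tag} | {', '.join(paths)} |")
```

Lowercasing tags at index time is a design choice of the sketch; it means `#VectorRAG` and `#vectorrag` land in the same row.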

Example Output

| #leadership | protocols/139-decentralized-command.md |
| #archetype  | user_profile/Archetype_Example.md |
| #vectorrag  | docs/VECTORRAG.md, docs/SEMANTIC_SEARCH.md |

Usage

grep -i "<entity>" .context/TAG_INDEX.md
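If you want the same lookup from Python rather than the shell, a rough equivalent is sketched below. It assumes the index uses the two-column pipe layout shown in the example output; the function name `lookup` is hypothetical.

```python
def lookup(index_text, entity):
    """Return (tag, [files]) rows whose line contains `entity`, ignoring case."""
    needle = entity.lower()
    rows = []
    for line in index_text.splitlines():
        if needle not in line.lower():
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # skip lines that are not two-column rows
        tag, files = cells
        rows.append((tag, [f.strip() for f in files.split(",")]))
    return rows


# Rows copied from the example output above.
index_text = """\
| #leadership | protocols/139-decentralized-command.md |
| #archetype  | user_profile/Archetype_Example.md |
| #vectorrag  | docs/VECTORRAG.md, docs/SEMANTIC_SEARCH.md |
"""

print(lookup(index_text, "VectorRAG"))
print(lookup(index_text, "139"))
```

Like grep, this matches anywhere on the line, so searching for "139" finds the #leadership row through its file path.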

Generating the Index

python3 scripts/generate_tag_index.py

Current stats (as of Feb 2026):

1000+ tags indexed
4 directories scanned (.context/, .agent/, examples/protocols/, user_profile/)
Extraction methods: YAML frontmatter + inline hashtags

Strengths

Instant lookup

Near-zero latency for tagged entities (a local file lookup)

Exact matches

No false positives for entity names

Manual curation

You control what gets tagged and how

No API costs

Pure filesystem operation

Limitations

Only finds explicitly tagged content
Requires manual tagging discipline
Misses untagged but relevant files

Path 3: Keyword Grep

How It Works

Simple string matching across all files:

grep -ri "<keyword>" .context/ .agent/
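For callers that prefer to stay in Python, the command above can be approximated as follows. This is a sketch, not part of Athena's scripts: `grep_ri` is a hypothetical helper, it does plain substring matching (no regular expressions), and it silently skips files that are not valid UTF-8.

```python
from pathlib import Path


def grep_ri(keyword, roots):
    """Case-insensitive substring search, roughly like `grep -ri`.

    Returns (path, line_number, line) tuples for every matching line.
    """
    needle = keyword.lower()
    matches = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip missing directories instead of failing
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for number, line in enumerate(text.splitlines(), start=1):
                if needle in line.lower():
                    matches.append((str(path), number, line))
    return matches


if __name__ == "__main__":
    for path, number, line in grep_ri("Protocol 139", [".context", ".agent"]):
        print(f"{path}:{number}:{line}")
```

Since it reads straight from disk, this path sees files the moment they are saved, which is why it catches content that has not been synced yet.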

Strengths

Zero false negatives

If the string exists in the searched directories, grep finds it

Finds new files

Catches content not yet synced to Supabase

Exact phrases

Finds literal matches like “Protocol 139”

No dependencies

Works offline, no API or database needed

Limitations

No semantic understanding
High false positive rate for common words
Case-sensitive by default (use the -i flag)

When to Use Each Path

Conceptual Questions

Use: Vector Search (primary)

Examples:

“What did we discuss about X?”
“Show me files related to leadership”
“Find documents about decentralized systems”

Why: Semantic matching finds conceptually related content even with different wording.

The Search Protocol (§0.7.1)

Per Core Identity, every query triggers semantic context retrieval:

Vector Search (Always)

python3 scripts/supabase_search.py "<query>" --limit 5

Runs first to capture semantic context.

Entity Lookup (Conditional)

grep -i "<entity_name>" .context/TAG_INDEX.md

Runs only if named entities are detected in the query.

Fallback Grep (As Needed)

grep -ri "<keyword>" .context/ .agent/

Runs if the methods above return sparse results.
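The control flow of this protocol can be sketched as a small orchestration function. Everything here is an assumption for illustration: the name `retrieve`, the injected callables, the way entities are detected, and the `sparse_threshold` cutoff are hypothetical, since the protocol does not specify them.

```python
def retrieve(query, vector_search, tag_lookup, keyword_grep,
             detect_entities, sparse_threshold=3):
    """Run the three steps in order and return deduplicated results.

    The four callables are injected so the sketch stays independent of
    how each path is implemented. `sparse_threshold` is a hypothetical
    cutoff for what counts as "sparse results".
    """
    results = list(vector_search(query))        # step 1: always

    for entity in detect_entities(query):       # step 2: only if entities found
        results.extend(tag_lookup(entity))

    if len(set(results)) < sparse_threshold:    # step 3: fallback
        results.extend(keyword_grep(query))

    return list(dict.fromkeys(results))         # dedupe, keep first-seen order


# Stub implementations with hypothetical results.
hits = retrieve(
    "Find information about Protocol 139",
    vector_search=lambda q: ["docs/command-structure.md"],
    tag_lookup=lambda e: ["protocols/139-decentralized-command.md"],
    keyword_grep=lambda q: ["notes/scratch.md", "docs/command-structure.md"],
    detect_entities=lambda q: ["Protocol 139"] if "Protocol 139" in q else [],
)
print(hits)
```

In a real setup the stubs would wrap the three commands shown above; keeping them injectable makes the ordering logic easy to test on its own.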

Performance Comparison

Before Triple-Path (Vector Only)

Scenario	Result
Search entity name	❌ Missed related protocol
Search archetype	❌ Missed profile file
Search “decentralized”	✅ Found semantically
New unsynced file	❌ Not in Supabase yet

After Triple-Path

Scenario	Result
Search entity name	✅ Found via TAG_INDEX
Search archetype	✅ Found via TAG_INDEX
Search “decentralized”	✅ Still works (Vector)
New unsynced file	✅ Found via grep

Best Practices

Tag important entities

Add #tags to protocols, workflows, and key documents for instant lookup.

Sync regularly

Run supabase_sync.py to keep your vector embeddings current.

Regenerate TAG_INDEX

Run generate_tag_index.py after adding new files or tags.

Use all three paths

For important searches, don’t rely on just one method.

Next Steps

Multi-Model Strategy

Learn how to route different tasks to different AI models

Importing Data

Bring existing knowledge into your Athena workspace
