Overview
Athena’s Vector RAG system provides semantic search across your entire knowledge base. It generates embeddings with gemini-embedding-001 (3,072 dimensions), stores them in Supabase pgvector, and uses Reciprocal Rank Fusion (RRF) to merge the results of multiple search strategies into a single ranking.
Architecture
Search Strategies
Athena combines multiple retrieval strategies using RRF:
- Canonical Memory: Direct lookup in CANONICAL.md
- Tag Index: Hashtag-based retrieval from TAG_INDEX.md
- Vector Similarity: Cosine similarity in pgvector
- GraphRAG: Entity-relationship traversal (deprecated)
- SQLite FTS: Full-text search for keywords
- Filename Matching: Fuzzy matching on file paths
Vector Storage
Collections
Athena indexes 11 knowledge domains in Supabase:

| Collection | Content Type | RPC Function |
|---|---|---|
| sessions | Session logs | search_sessions |
| case_studies | Documented patterns | search_case_studies |
| protocols | Skill protocols | search_protocols |
| capabilities | Bionic Triple Crown | search_capabilities |
| playbooks | Runbooks and workflows | search_playbooks |
| references | External frameworks | search_references |
| frameworks | Decision models | search_frameworks |
| workflows | Slash commands | search_workflows |
| entities | Knowledge graph nodes | search_entities |
| user_profile | User context | search_user_profile |
| system_docs | Architecture docs | search_system_docs |
Embedding Generation
Embeddings are generated using Google’s Gemini API with persistent disk caching:
- MD5 hash of input text as cache key
- JSON-backed persistent cache in .agent/state/embedding_cache.json
- Thread-safe atomic writes with background saving
- Prevents redundant API calls for repeated queries
The embedding cache uses atomic file operations to prevent corruption during concurrent access.
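The cache lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in vectors.py: the function names `cache_key`, `load_cache`, and `cached_embedding` are hypothetical, and `embed_fn` is a stand-in for whatever function actually calls the Gemini API.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(text: str) -> str:
    """MD5 hex digest of the input text, used only as a cache key."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def load_cache(path: Path) -> dict[str, list[float]]:
    """Load the JSON cache, or start empty if the file does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def cached_embedding(
    text: str,
    cache: dict[str, list[float]],
    embed_fn: Callable[[str], list[float]],
) -> list[float]:
    """Return the cached vector for `text`, calling `embed_fn` only on a miss."""
    key = cache_key(text)
    if key not in cache:
        # embed_fn is a hypothetical stand-in for the real embedding API call.
        cache[key] = embed_fn(text)
    return cache[key]
```

With this shape, a repeated query hashes to the same key and is served from the dictionary, so the API is called once per distinct input text.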
Thread Safety
Version 1.2 introduced thread-safe optimizations for parallel search.
Thread-Local Clients
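One common way to give each thread its own client is `threading.local()`. The sketch below assumes that pattern; `get_client` and the `factory` argument are hypothetical names, and the factory would be whatever constructs a Supabase or HTTP client in practice.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_client(factory: Callable[[], T]) -> T:
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = factory()  # hypothetical factory, e.g. one that builds a Supabase client
        _local.client = client
    return client
```

Because a client is never shared between threads, parallel searches do not contend for the same underlying connection.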
Atomic Cache Operations
- Lock-protected mutations to prevent race conditions
- Atomic swap pattern with tempfile.mkstemp() + os.replace()
- Background daemon threads for non-blocking I/O
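The lock-plus-atomic-swap pattern listed above might look like the following. It is a sketch under assumptions: the helper name `atomic_write_json` and the module-level lock are illustrative and may not match how Athena structures its cache code.

```python
import json
import os
import tempfile
import threading

_cache_lock = threading.Lock()


def atomic_write_json(path: str, data: dict) -> None:
    """Write `data` to `path` so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    with _cache_lock:
        # Create the temp file next to the target so os.replace stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(data, handle)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp_path, path)  # atomic rename over the old file
        except BaseException:
            # Remove the temp file if anything failed before the swap.
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
```

The old file stays intact until the new one is fully written, so a crash mid-write leaves either the previous cache or the new one, never a truncated mix.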
Search Implementation
Basic Vector Search
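As an in-memory illustration of the cosine-similarity ranking that pgvector performs inside the database, the sketch below scores stored vectors against a query vector and returns the best matches. It is not Athena's search code; `cosine_similarity` and `top_k` are hypothetical helpers, and real queries run against 3,072-dimensional embeddings in Supabase rather than Python lists.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for zero vectors in this sketch
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float],
    rows: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Rank stored vectors by similarity to the query, highest first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```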
Collection-Specific Wrappers
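Since every collection in the table above maps to an RPC function named `search_<collection>`, per-collection wrappers can be generated from one generic function. The following is a sketch under assumptions: the RPC parameter names `query_embedding` and `match_count` are hypothetical (the real functions may take different arguments), and `client` is assumed to expose `rpc(name, params).execute()` returning an object with a `.data` attribute, as supabase-py clients do.

```python
from functools import partial
from typing import Any

COLLECTIONS = [
    "sessions", "case_studies", "protocols", "capabilities", "playbooks",
    "references", "frameworks", "workflows", "entities", "user_profile",
    "system_docs",
]


def search_collection(
    collection: str,
    client: Any,
    query_embedding: list[float],
    limit: int = 5,
) -> Any:
    """Call the collection's RPC function and return the matched rows."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    # Parameter names below are assumptions, not the documented RPC signature.
    params = {"query_embedding": query_embedding, "match_count": limit}
    return client.rpc(f"search_{collection}", params).execute().data


# One wrapper per collection, each with the collection name bound up front.
WRAPPERS = {name: partial(search_collection, name) for name in COLLECTIONS}
search_sessions = WRAPPERS["sessions"]
search_protocols = WRAPPERS["protocols"]
```

A call such as `search_sessions(client, embedding, limit=3)` then targets the `search_sessions` RPC without repeating the collection name at each call site.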
Hybrid Search with RRF
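The smart_search tool combines all strategies using Reciprocal Rank Fusion. Each document d receives a fused score equal to the sum, over every retrieval system i that returned it, of 1 / (k + rank_i(d)), where k = 60 (standard constant) and rank_i(d) is the rank of document d in retrieval system i.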
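A minimal sketch of Reciprocal Rank Fusion, using the standard scoring rule in which each system contributes 1 / (k + rank) for every document it returns. The function name `reciprocal_rank_fusion` and the input shape (one ordered list of document IDs per strategy) are assumptions for illustration; the actual logic in search.py may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each system has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = reciprocal_rank_fusion([
    ["a", "b", "c"],  # e.g. vector similarity results
    ["a", "c", "d"],  # e.g. full-text search results
])
```

Here `a` ranks first because both lists place it at the top, and `c` overtakes `b` because it appears in both lists while `b` appears in only one.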
Data Residency Options
| Mode | Where Data Lives | Best For |
|---|---|---|
| Cloud | Supabase (your project) | Cross-device access, collaboration |
| Local | Your machine only | Sensitive data, air-gapped environments |
| Hybrid | Local files + cloud embeddings | Best of both worlds |
Local Mode
Local mode is intended for sensitive data that shouldn’t leave your machine.
Database Schema
Each collection table uses the same schema.
Performance Optimizations
1. Embedding Cache
- Hit Rate: ~80% for repeated queries
- Speedup: 100x faster (no API call)
- Storage: JSON file, typically less than 5MB
2. Thread-Local Clients
- Prevents connection pool exhaustion
- Enables safe parallel search across collections
- Fixes httpx.ReadError in concurrent loops
3. Background Cache Writes
- Non-blocking I/O via daemon threads
- Atomic swap prevents corruption
- Dirty flag reduces unnecessary writes
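The three points above (daemon-thread writes, atomic swap, dirty flag) can be combined in a small class like the one below. This is a sketch under assumptions, not Athena's implementation: the class name `BackgroundCache` and its methods are hypothetical.

```python
import json
import os
import tempfile
import threading


class BackgroundCache:
    """In-memory dict that is persisted only when it has unsaved changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._data: dict = {}
        self._dirty = False
        self._lock = threading.Lock()        # guards _data and _dirty
        self._write_lock = threading.Lock()  # serializes disk writes

    def set(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = value
            self._dirty = True

    def flush(self) -> bool:
        """Write to disk if dirty; return True when a write happened."""
        with self._write_lock:
            with self._lock:
                if not self._dirty:
                    return False  # nothing changed, skip the write
                snapshot = dict(self._data)
                self._dirty = False
            directory = os.path.dirname(os.path.abspath(self.path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(snapshot, handle)
            os.replace(tmp_path, self.path)  # atomic swap
        return True

    def flush_in_background(self) -> threading.Thread:
        """Run flush() on a daemon thread so the caller is not blocked."""
        thread = threading.Thread(target=self.flush, daemon=True)
        thread.start()
        return thread
```

Taking the snapshot while holding the write lock keeps concurrent flushes from writing an older snapshot over a newer one.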
Implementation Reference
See src/athena/memory/vectors.py:119 for the get_embedding implementation and src/athena/tools/search.py for the hybrid search logic.
Next Steps
- MCP Server: Expose search capabilities via MCP tools
- Governance: Learn about the Triple-Lock protocol and integrity checks