
Overview

Athena’s memory system combines structured state (Memory Bank) with semantic recall (VectorRAG) to create a compound learning loop where each session starts smarter than the last.
Think of it as the difference between:
  • Vector DB: “Where did I see something about X?” (search)
  • Memory Bank: “What am I working on right now?” (state)
You need both.

The Two-Layer Architecture

Memory Bank: The 4 Pillars

Core Files

| File | Purpose | Update Frequency |
| --- | --- | --- |
| activeContext.md | Current focus, active tasks, recent decisions | Every session |
| userContext.md | User profile, preferences, constraints | When preferences change |
| productContext.md | Product philosophy, goals, positioning | When strategy changes |
| systemPatterns.md | Architecture decisions, patterns, tech debt | When architecture evolves |

How It Works

1. Session Start: the boot script loads all 4 Memory Bank files into context (~10K tokens)
2. During Session: the AI references Memory Bank state for continuity
3. Session End: the shutdown script updates activeContext.md with session outcomes
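In code, the boot step might look like the following minimal sketch. The `MEMORY_BANK` list mirrors the Core Files table above; the `boot` function, the directory layout, and the 4-characters-per-token heuristic are illustrative assumptions, not the actual boot script:

```python
from pathlib import Path

# Illustrative: the four Core Files from the table above.
MEMORY_BANK = [
    "activeContext.md",
    "userContext.md",
    "productContext.md",
    "systemPatterns.md",
]

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def boot(bank_dir: str) -> str:
    """Concatenate the Memory Bank files into one context blob."""
    parts = []
    for name in MEMORY_BANK:
        path = Path(bank_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    context = "\n\n".join(parts)
    print(f"Boot context: ~{estimate_tokens(context)} tokens")
    return context
```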

Token Efficiency: The 15K Hard Cap

Boot tokens are budgeted with a strict ceiling:
| Slot | Budget | Growth Rate |
| --- | --- | --- |
| userContext.md | ~3K | Near-zero (identity is stable) |
| productContext.md | ~2K | Near-zero (mission is stable) |
| activeContext.md | ~5K | Rolling (compacts automatically) |
| Boot script output | ~2K | Fixed |
| System instructions | ~3K | Fixed |
| Total | ~15K max | |
When the total exceeds the 15K hard cap, activeContext.md auto-compacts: older session summaries are merged into shorter entries until the total is back under the ~10K target.
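The compaction rule can be sketched like this. The function name `maybe_compact`, the oldest-first entry list, and the keep-only-the-headline summarization are hypothetical stand-ins for whatever the real script does:

```python
HARD_CAP = 15_000   # auto-compact triggers here
TARGET = 10_000     # compact back down to this

def estimate_tokens(text: str) -> int:
    return len(text) // 4   # rough heuristic: ~4 chars per token

def maybe_compact(entries: list[str], fixed_budget: int) -> list[str]:
    """If fixed slots + activeContext exceed the hard cap, merge the
    oldest entries into one-line stubs until back under the target."""
    def total(es: list[str]) -> int:
        return fixed_budget + sum(estimate_tokens(e) for e in es)

    if total(entries) <= HARD_CAP:
        return list(entries)            # under the cap: leave as-is
    es = list(entries)
    i = 0                               # entries are oldest-first
    while total(es) > TARGET and i < len(es):
        # Keep only the headline of the oldest entry as a stub.
        es[i] = es[i].splitlines()[0] if es[i].strip() else ""
        i += 1
    return es
```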

The Operating Band

0K ██████████░░░░░ 15K
   ↑ ~10K target    ↑ hard cap (auto-compact triggers here)
Assuming 200K effective context length (the industry standard for SOTA models in 2026):
| Mode | Boot Cost | Workspace Left |
| --- | --- | --- |
| /start (default) | ~10K | 190K (95% free) |
| /think | ~15K | 185K |
| /ultrathink | ~40K | 160K |

Progressive Distillation

Most “memory” solutions dump growing chat history into context. Athena keeps boot cost flat through progressive distillation:
Live conversation (100% fidelity)
  → Session log (~15% — key insights only)
    → activeContext.md entry (~5% — compressed summary)
      → Eventually compacted out (~0.1% — absorbed into userContext.md)
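A toy model of the chain above, with truncation standing in for real summarization. The stage names and fidelity ratios follow the diagram; `distill` and `run_pipeline` are invented for this sketch:

```python
def distill(text: str, keep_ratio: float) -> str:
    """Toy stand-in: keep the first keep_ratio share of lines.
    A real pipeline would summarize, not truncate."""
    lines = text.splitlines()
    keep = max(1, int(len(lines) * keep_ratio))
    return "\n".join(lines[:keep])

# Stage names and fidelity ratios from the diagram above.
PIPELINE = [
    ("session log", 0.15),
    ("activeContext entry", 0.05),
    ("userContext trace", 0.001),
]

def run_pipeline(conversation: str) -> dict[str, str]:
    # Each ratio is relative to the live conversation (100% fidelity).
    return {name: distill(conversation, ratio) for name, ratio in PIPELINE}
```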

VectorRAG: Semantic Memory

Architecture

Technology Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Vector Database | Supabase + pgvector | Cloud-native, persistent storage |
| Embeddings | Google text-embedding-004 | 3072-dimension semantic vectors |
| Similarity | Cosine distance (`<=>`) | Meaning-based matching |
| Sync | Python scripts | Automated indexing pipeline |

The 11 Searchable Domains

  • sessions (~468): Daily interaction logs
  • case_studies (~75): Pattern analysis documents
  • entities (~100 chunks): External data imports
Total Indexed Documents: ~850+

How Semantic Search Works

Similarity Scoring

| Similarity Score | Interpretation |
| --- | --- |
| > 0.7 | Highly relevant |
| 0.5 - 0.7 | Moderately relevant |
| 0.3 - 0.5 | Loosely related |
| < 0.3 | Likely noise |
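A minimal scoring helper that mirrors the bands above (the function names are illustrative). One caveat worth encoding in a comment: pgvector's `<=>` operator returns cosine *distance*, so similarity is `1 - distance`:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Note: pgvector's <=> returns cosine distance; similarity = 1 - distance.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def interpret(score: float) -> str:
    """Map a similarity score to the bands in the table above."""
    if score > 0.7:
        return "highly relevant"
    if score >= 0.5:
        return "moderately relevant"
    if score >= 0.3:
        return "loosely related"
    return "likely noise"
```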

Memory Bank vs. VectorRAG

| Need | Use |
| --- | --- |
| "What am I working on?" | Memory Bank (activeContext.md) |
| "What did I say about X 3 weeks ago?" | Vector search (smart_search.py) |
| "What's my risk tolerance?" | Memory Bank (userContext.md) |
| "Find sessions about authentication" | Vector search |
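The state-vs-search split can be illustrated with a deliberately crude keyword router. The `STATE_HINTS` list and `route` function are invented for this sketch; real routing would be smarter:

```python
# Invented for this sketch: phrases that signal a question about
# current state rather than historical recall.
STATE_HINTS = ("working on", "right now", "current", "preference", "risk tolerance")

def route(query: str) -> str:
    """Present-tense state questions -> Memory Bank;
    recall/search questions -> vector search."""
    q = query.lower()
    if any(hint in q for hint in STATE_HINTS):
        return "memory_bank"
    return "vector_search"
```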

Context Hydration

Problem: Learnings written to files (e.g., User_Profile_Core.md) become passive documentation. The AI doesn’t read them unless explicitly prompted, causing the same mistakes to repeat.
Solution: Active Injection. Force-feed critical constraints into the terminal during boot.

Key scripts:
  • boot_knowledge.py: Extracts and prints constraints
  • index_workspace.py: Rebuilds TAG_INDEX.md and PROTOCOL_SUMMARIES.md on shutdown
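A hypothetical sketch of the extraction step: scan a profile file for bullets tagged with a `CONSTRAINT:` marker (an assumed convention, not necessarily the real one) and print them so they land in the model's context:

```python
import re

# Assumed convention: constraints are bullets beginning "CONSTRAINT:".
CONSTRAINT_RE = re.compile(r"^\s*[-*]\s*CONSTRAINT:\s*(.+)$", re.MULTILINE)

def extract_constraints(markdown: str) -> list[str]:
    return [m.group(1).strip() for m in CONSTRAINT_RE.finditer(markdown)]

def inject(markdown: str) -> None:
    """Print constraints to the terminal at boot so they enter the
    model's context instead of sitting passively in a file."""
    for c in extract_constraints(markdown):
        print(f"[HARD CONSTRAINT] {c}")
```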

Real-World Example

Scenario: Business Deal Advice

Query: “Should I accept this commission-based partnership where the agent takes no risk?”
Output: “Commission-based partnerships can be effective for motivation. Ensure you have clear contracts. Pros include low fixed costs, while cons include potential short-term focus by the agent.”

Verdict: Safe but generic. A balanced pros/cons list, but it lacks strategic conviction.

Autonomic Behaviors

| Protocol | Trigger | Action |
| --- | --- | --- |
| Quicksave | Every user exchange | quicksave.py → checkpoint to session log |
| Intent Persistence | Significant logical change | TASK_LOG.md → document the “WHY” behind code changes |
| Auto-Documentation | Pattern detected | File to appropriate location |
| Orphan Detection | On /end | orphan_detector.py → link or alert |

Cost Analysis

| Resource | Free Tier | Paid Tier |
| --- | --- | --- |
| Supabase | 500MB DB, 2GB bandwidth | $25/mo for 8GB |
| Gemini Embeddings | 1,500 req/day | N/A (no cost beyond free) |
| Total | $0/month | ~$25/month at scale |
At ~730 documents, we’re well within free tier limits. Embeddings are generated once per document, so ongoing costs are minimal.

Next Steps

  • Architecture: Understand the overall system design
  • Workflows: Learn about session management and automation
