Skip to main content
“I got tired of paying for amnesia.”

Why This Matters

If you’re using AI for anything beyond one-off questions, you’ve probably hit the same wall: every session starts from zero. Yes, ChatGPT has memory now. So does Claude. But their memory is platform-locked. If you switch models, you lose everything. If the platform changes their memory policy, you lose everything. Athena is different: portable, platform-agnostic memory. Your context lives in Markdown files you own. You can take it to any model, any platform, any time. That’s the moat.

The Problem

Every new chat session was a cold start. I was pasting a ~50k-token “identity + context” prompt just to get consistent answers. The best insights from previous sessions were trapped in old transcripts I’d never find again.
Pain PointWhat It Cost Me
No memoryRepeating the same context every session
Lost decisionsCouldn’t remember why I’d decided X in Session 19
Context limits50k tokens of manual paste just to “remind” the AI who I was
Platform lock-inSwitching models meant losing all accumulated context

The Process (The Schlep)

Here’s what I actually did. No shortcuts.
Key insight: I didn’t build this alone. The entire system was co-developed with AI — Claude and Gemini working alongside me in real-time. Every protocol, every architecture decision, every refactor was a collaborative iteration. That’s what makes this approach powerful: the AI helps build the system that makes the AI more useful.

Phase 1: Tool Selection (Week 1)

  • Evaluated agentic IDEs (Cursor, Continue, Aider, Antigravity) — chose Antigravity for native Gemini integration and long context window
  • Set up a Supabase project with pgvector for vector embeddings
  • Configured .env with API keys

Phase 2: Architecture (Weeks 2-4)

  • Designed the directory structure with AI (.framework/ for laws, .context/ for memories, .agent/ for scripts)
  • Built the core loop together: /start (boot) → Work → /end (commit)
  • Created the first 10 protocols — reusable decision frameworks extracted from our collaborative thinking

Phase 3: Data Feeding (Ongoing)

  • Fed it personal knowledge: decision logs, case studies, business frameworks, session transcripts
  • Tagged and indexed files for retrieval (TAG_INDEX.md)
  • Built supabase_sync.py to push Markdown to vector embeddings (or keep local for sensitive data)

Phase 4: Continuous Iteration (500+ Sessions)

Session RangeWhat Changed
1-50Basic boot/end cycle, first protocols
50-150Semantic search added, hybrid RAG
150-300Cross-encoder reranking, RRF fusion
300-400SDK refactor (athena package), typing, tests
400-500+Trilateral feedback, governance audit, external red-teaming
The pattern: Every friction became a protocol. Every failure became a case study. The AI helped document its own evolution.

What the Schlep Looked Like

├── 1,100+ sessions logged (human + AI collaboration)
├── 120+ protocols extracted
├── 49 automation scripts written
├── 3 major refactors (monolith → SDK)
├── 2 external red-team audits
└── Countless errors, dead ends, and "why isn't this working" nights

The Result

Quantitative (What Changed)

MetricBeforeAfter
Context injection~50k tokens (manual paste)~12.5K tokens (core boot)
Boot time~2 minutes~1–2 minutes
Sessions logged1,100+

Qualitative (What It Means)

PillarOutcome
AgencyI stopped recreating context and started compounding it. Every session builds on the last.
PortabilityMy memory isn’t trapped in ChatGPT or Claude. It’s mine. I can take it anywhere.
Learning120+ protocols extracted from my own decisions — patterns I can reuse and refine.

Proof It Works

In Session 400, Athena recalled a trading risk limit I’d set in Session 19 — months earlier — and flagged it before I repeated an old mistake.
├── Query: "position sizing rules"
├── Retrieved: protocols/trading/risk_limits.md (similarity: 0.89)
├── Created: 2025-03-14 | Last accessed: 2025-12-28
└── Injected: "Max daily loss: 2% of account. Hard stop."
A generic chat assistant would have missed it. Athena didn’t.

What I Learned

InsightPrinciple
Co-development is the unlockBuilding with AI, not just using AI, creates compounding returns.
Portable memory beats platform memoryOwn your context. Don’t rent it from OpenAI or Anthropic.
Retrieval is end-to-endSimple RAG fails on broad queries. RRF fusion + reranking solved quality/latency tradeoff.
Protocols beat promptsReusable decision frameworks outlast one-shot prompt engineering.
Ship at 70%Perfectionism kills velocity. Iterate in production.

The Bionic Unit (v7.5 Philosophy)

Definition: Human cognition + AI reasoning, integrated as one collaborative workflow.
This isn’t about “using AI as a tool.” It’s about co-development — building with the AI, not just through it.
ComponentRole
HumanIntent, judgment, domain expertise, final decision
AIPattern recall, synthesis, execution speed, adversarial checks
TogetherCompound decision-making that neither could achieve alone
The AI helps build the system that makes the AI more useful. That’s the recursive loop.

The Triple-Lock (Law #6)

Pattern: Search → Save → Speak.
Every AI response should be grounded in three steps:
  1. Search: Retrieve relevant context from memory (semantic search, past sessions, protocols).
  2. Save: Log intent/summary before responding (audit trail, disaster recovery).
  3. Speak: Deliver the response.
┌─────────────────────────────────────────────────┐
│  STEP 1: Semantic Search (context retrieval)    │
│  STEP 2: Quicksave (log intent)                 │
│  STEP 3: Output response                        │
│                                                 │
│  ✅ Search → Save → Speak                        │
└─────────────────────────────────────────────────┘
This pattern ensures:
  • Context awareness: Responses are informed by history.
  • Recoverability: Work is checkpointed before it can be lost.
  • Transparency: The “work” is visible before the “answer.”

About the Author

Built by Winston Koh — 10+ years in financial services, now building AI systems. GitHub | LinkedIn

Build docs developers (and LLMs) love