Learning Architecture
Conversations happen
The agent responds to messages using its current context (SOUL.md, skills, lessons.md).
Self-corrections are captured
When a user corrects the agent, it records the correction in the
self_corrections field. This is written to the family’s lessons.md immediately — you see it in context on the next message.

Review process analyzes behavior
review_loop.py reads conversation logs and checks agent behavior against skill guidance. It produces findings and lesson recommendations.

From SOUL.md: “The LESSONS sections in your context come from that review process. They aren’t rules handed down — they’re corrections earned from real conversations.”
lessons.md Format
Lessons are stored as timestamped bullet points with optional category tags.

Lesson Categories
The system recognizes three categories (optional tags):

- [behavioral]
- [factual]
- [operational]
[behavioral] covers how to communicate, when to ask, and how to handle corrections. Example: “One question per response, always. Never stack multiple questions.”
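The file format itself isn’t reproduced in this document; combining the timestamped-bullet description with the category tags, an entry plausibly looks like this (the timestamp style and tag placement are assumptions):

```
- [2025-01-12 14:03] [behavioral] One question per response, always. Never stack multiple questions.
- [2025-01-13 09:41] [operational] Keep SMS responses under 320 chars (2 segments) when possible.
```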
Eviction Strategy
When lessons exceed the max (default: 20 for review output, 30 for family files), the system uses category-aware eviction:

- Oldest entries are removed first
- Each category is guaranteed at least 1 slot
- Slots are distributed proportionally to how many entries each category has
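A minimal sketch of that eviction rule — the function and entry shapes are assumptions, not the project’s actual code:

```python
from collections import defaultdict

def evict(entries, max_entries):
    """Category-aware eviction. `entries` is a chronological list of
    (category, text) tuples with unique texts; the oldest entries are
    dropped first, but every category keeps at least one slot."""
    if len(entries) <= max_entries:
        return list(entries)

    by_cat = defaultdict(list)
    for cat, text in entries:            # chronological order preserved per category
        by_cat[cat].append(text)

    # Every category gets 1 guaranteed slot; the remainder is shared
    # proportionally to each category's share of all entries.
    cats = list(by_cat)
    slots = {c: 1 for c in cats}
    spare = max_entries - len(cats)
    for c in cats:
        slots[c] += int(spare * len(by_cat[c]) / len(entries))

    # Hand rounding leftovers to the largest categories.
    leftover = max_entries - sum(slots.values())
    for c in sorted(cats, key=lambda c: len(by_cat[c]), reverse=True)[:leftover]:
        slots[c] += 1

    # Keep the newest slots[c] entries per category (oldest evicted first).
    keep = {c: set(texts[-slots[c]:]) for c, texts in by_cat.items()}
    return [(c, t) for c, t in entries if t in keep[c]]
```

Guaranteeing one slot per category is what keeps a lone factual lesson alive even when behavioral lessons dominate the file.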
Self-Corrections
When the agent is corrected in conversation, it uses the self_corrections field in its JSON response:
The correction is written to families/kano/lessons.md immediately. The agent sees it in context on the next message.
Why this matters: The agent doesn’t have to wait for a review cycle to learn from direct corrections. Self-corrections create an instant feedback loop.
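The response schema isn’t reproduced here; the flow implies something like the following, where every field except self_corrections is an assumed placeholder and the field is shown as a list (the actual shape may differ):

```json
{
  "reply": "You're right — I'll ask one question at a time.",
  "self_corrections": [
    "[behavioral] One question per response, always."
  ]
}
```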
Review Loop: Mechanical Tier
review_loop.py performs rule-based analysis on recent conversations:
What It Checks
Multi-question violations
Detects responses with more than one question mark.
Skill violated: social.md says “One question at a time, always.”
Lesson generated: “One question per response, always. Never stack multiple questions in a single message.”
Forbidden phrases
Detects phrases like:
- “before I can proceed”
- “before I save”
- “before I can help”
- “I need to know”
User feedback patterns
Detects corrections in user messages:
- “that’s wrong” / “that’s not right” / “incorrect”
- “don’t do that” / “stop doing” / “never do”
- “I told you” / “I already said”
Response length
Flags responses over 500 characters. Recommendation: “Keep SMS responses under 320 chars (2 segments) when possible.”
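All four checks are mechanical enough to sketch in a few lines. The phrase lists and thresholds come from the descriptions above; the function names are invented here, and the real review_loop.py may differ:

```python
FORBIDDEN = [
    "before i can proceed",
    "before i save",
    "before i can help",
    "i need to know",
]
CORRECTION_PATTERNS = [
    "that's wrong", "that's not right", "incorrect",
    "don't do that", "stop doing", "never do",
    "i told you", "i already said",
]

def _norm(text):
    # Lowercase and straighten curly apostrophes so phrase matching is robust.
    return text.lower().replace("\u2019", "'")

def check_response(text):
    """Mechanical checks on one agent (OUTBOUND) message; returns finding labels."""
    findings = []
    if text.count("?") > 1:                       # multi-question violation
        findings.append("multi-question")
    if any(p in _norm(text) for p in FORBIDDEN):  # gatekeeping phrases
        findings.append("forbidden-phrase")
    if len(text) > 500:                           # over the SMS length flag
        findings.append("too-long")
    return findings

def check_user_message(text):
    """Detect correction patterns in a user (INBOUND) message."""
    return [p for p in CORRECTION_PATTERNS if p in _norm(text)]
```

Substring matching keeps the checks cheap and deterministic; that simplicity is exactly why the contextual tier below is still needed.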
Usage
Without --stage, every run writes lessons to real files. --stage writes to a scratch pad (staging/reviews/) instead. Nothing touches production until explicitly promoted.

Review Loop: Contextual Tier
The mechanical tier catches rule violations. The contextual tier needs a human (or Opus) reading the full transcript:

What Mechanical Analysis Misses
- Agent calling Degitu both “grandmother” and “aunt” in the same response
- Missing member context that caused the confusion
- Agent contradicting itself across messages
- Flows that have no protocol
- Process gaps nobody thought to codify
Workflow
Review the transcript
Read the exchanges. Look for:
- Contradictions
- Missing context that caused errors
- Patterns the agent should follow but doesn’t
Save interesting reviews
staging/saved/ — survives resets, becomes material for deeper analysis.

Staging Workflow
Without staging, every review_loop run writes lessons to real files. Testing = mutating production data. Staging is a scratch pad — nothing touches production until you explicitly promote it.
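A sketch of the branch this implies — the directory names come from the text, but the function name and the staging filename are assumptions:

```python
from pathlib import Path

def lessons_target(family: str, stage: bool) -> Path:
    """Pick where review output lands. --stage redirects to the scratch
    pad (cleared on reset); otherwise lessons hit the family's real file."""
    if stage:
        return Path("staging/reviews") / f"{family}.md"   # scratch pad, safe to discard
    return Path("families") / family / "lessons.md"       # production file
```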
Three Piles
- reviews/
- saved/
- proposals/
reviews/ is disposable test output. It accumulates with each --stage run and is cleared on reset. You don’t care about most of these.

Testing Protocol
Test loop (repeat as needed)
Resist the shiny object. During testing you WILL notice gaps — missing member fields, flows with no protocol, features that seem obvious. DO NOT stop testing to build them. Save the observation to saved/ and keep going. Real interactions reveal what the abstraction needs to be.

Graduation Pipeline
Lessons in lessons.md are temporary — they’re meant to graduate to permanent locations:
- Factual lessons → family.md or members/{name}.md
- Behavioral lessons → skills/social.md or skills/scheduling.md
- Operational lessons → capabilities.md
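That routing is a straight category-to-destination map; a hypothetical helper (the mapping is from this document, the code is not):

```python
GRADUATION_TARGETS = {
    "factual": ["family.md", "members/{name}.md"],
    "behavioral": ["skills/social.md", "skills/scheduling.md"],
    "operational": ["capabilities.md"],
}

def graduation_targets(category):
    """Permanent files a lesson of this category should graduate into."""
    return GRADUATION_TARGETS.get(category, [])
```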
Workflow
Max Entries
- Global lessons.md: 50 entries (configurable in learning/__init__.py)
- Family lessons.md: 30 entries (set in append_lessons() calls)
- Review output: 20 entries (prevents prompt bloat)
Example: Real Correction Flow
System writes to lessons.md
sms_handler.py processes the response and appends the correction to families/kano/lessons.md.

Signal Sources
The review loop ingests signals from multiple sources:

- Conversation logs (runtime/conversations/{phone}/*.log) — INBOUND→OUTBOUND pairs
- PHI audit logs — blocked responses, unknown numbers
- Pending approvals — stale approvals >18h old
- Poller stdout — errors and warnings from the message polling loop
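Only the INBOUND→OUTBOUND pairing is stated above, not the log format. Assuming lines are tagged with INBOUND:/OUTBOUND: prefixes (an assumption), extracting exchange pairs might look like:

```python
def pair_exchanges(lines):
    """Group log lines into (inbound, outbound) exchange pairs.
    Assumes lines shaped like 'INBOUND: text' / 'OUTBOUND: text'."""
    pairs, pending = [], None
    for line in lines:
        if line.startswith("INBOUND:"):
            pending = line[len("INBOUND:"):].strip()       # latest unanswered message
        elif line.startswith("OUTBOUND:") and pending is not None:
            pairs.append((pending, line[len("OUTBOUND:"):].strip()))
            pending = None                                 # pair consumed
    return pairs
```

A trailing INBOUND with no reply is deliberately dropped — the review loop needs agent behavior to grade, so unanswered messages carry no finding on their own.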