Athena is model-agnostic: your memory, protocols, and governance persist across any LLM. This means you can use different models for different tasks and get the best of each.

Why Multi-Model?

Not all tasks require frontier model capabilities. By routing intelligently, you can:
  • Reduce costs by roughly 50% while maintaining quality where it matters
  • Increase speed by using fast models for mechanical work
  • Cross-validate important decisions across multiple models
  • Leverage strengths of different model architectures

Frontier Models (Deep Reasoning)

| Model | Strengths | Best Used For |
| --- | --- | --- |
| Claude Opus 4.6 | Deep reasoning, code quality, nuanced analysis | Coding, architecture, verification |
| Gemini 3.1 Pro | Broad knowledge, fast synthesis, strong planning | General work, research, planning |
| GPT-5.3 | Alternative perspective, creative tasks | Trilateral tiebreaker, creative work |

Fast Models (Mechanical Work)

| Model | Strengths | Best Used For |
| --- | --- | --- |
| Gemini 3 Flash | Speed, low cost | Session management (/start, /end), quick lookups |

Cost Considerations

Athena is free and open source. You only pay for your AI subscription.

Recommended plans:
  • Claude Pro / Google AI Pro: ~$20/mo (full access to frontier models)
  • Claude Max / Google AI Ultra: $200–250/mo (extended limits for power users)

Why Invest in Frontier Models?

Athena’s protocols (governance, reasoning depth, structured workflows) are designed for models that can follow complex multi-step instructions. Smaller/free models may struggle to follow them consistently. This is a long-term investment, not a cost. Frontier models dramatically increase your output quality and consistency.

The Routing Table

Route tasks based on complexity and risk:
| Task Type | Recommended Tier | Why |
| --- | --- | --- |
| Session Management (/start, /end, /save) | ⚡ Fast (Gemini Flash) | Mechanical execution, low reasoning needed |
| Coding & Implementation | 🔥 Frontier (Claude Opus, Gemini Pro) | Code quality scales directly with model capability |
| Planning & Architecture | 🔥 Frontier | Design decisions compound, so invest the best reasoning here |
| General Chat & Q&A | 🧠 Strong (Gemini Pro) | Good enough for most queries |
| Research & Deep Analysis | 🔥 Frontier | Synthesis quality degrades with weaker models |
| Creative & Brainstorming | 🧠 Strong or 🔥 Frontier | Use Strong for volume, Frontier for refinement |
| Verification & Code Review | 🔥 Frontier (different model) | Use a different model than the author for a fresh perspective |
| Quick Lookups & Formatting | ⚡ Fast | Don’t waste Frontier tokens on simple tasks |
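The routing table can be expressed as a small lookup. This is a minimal sketch, not part of Athena itself: the `Tier` enum, the task-type keys, and the `route` function are hypothetical names chosen for illustration, and the fallback to Strong for unknown tasks is an assumption based on the "Default to Strong" guidance.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"
    STRONG = "strong"
    FRONTIER = "frontier"


# Tiers copied from the routing table; the string keys are illustrative only.
ROUTING_TABLE = {
    "session_management": Tier.FAST,
    "coding": Tier.FRONTIER,
    "planning": Tier.FRONTIER,
    "general_chat": Tier.STRONG,
    "research": Tier.FRONTIER,
    "creative": Tier.STRONG,       # escalate to Frontier by hand for refinement
    "verification": Tier.FRONTIER,  # and pick a different model than the author
    "quick_lookup": Tier.FAST,
}


def route(task_type: str, default: Tier = Tier.STRONG) -> Tier:
    """Return the recommended tier, falling back to `default` for unknown tasks."""
    return ROUTING_TABLE.get(task_type, default)
```

Calling `route("coding")` returns `Tier.FRONTIER`, while a task type missing from the table falls back to `Tier.STRONG`.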

The Trilateral Feedback Loop

When two models disagree on a significant decision, bring in a third:
Model A (Gemini 3.1 Pro)  →  Opinion 1
Model B (Claude Opus)     →  Opinion 2
                               ↓
                         Conflict detected?
                               ↓
Model C (GPT-5.3, Llama)  →  Tiebreaker / Synthesis
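The flow in the diagram can be sketched as a short orchestration function. Everything here is an assumption made for illustration: each model is represented by a plain callable that takes a prompt and returns text, `opinions_conflict` is a deliberately naive string comparison, and the tiebreaker prompt wording is invented. A real setup would call each provider's own client and use a more careful conflict check.

```python
from typing import Callable

# Hypothetical stand-in for a model client: prompt in, answer out.
Model = Callable[[str], str]


def opinions_conflict(a: str, b: str) -> bool:
    """Naive conflict check: answers differ after trimming and lowercasing."""
    return a.strip().lower() != b.strip().lower()


def trilateral(question: str, model_a: Model, model_b: Model,
               tiebreaker: Model) -> dict:
    """Ask two models; consult the third only when their answers conflict."""
    opinion_1 = model_a(question)
    opinion_2 = model_b(question)
    if not opinions_conflict(opinion_1, opinion_2):
        return {"decision": opinion_1, "tiebreaker_used": False}
    prompt = (
        f"Question: {question}\n"
        f"Opinion 1: {opinion_1}\n"
        f"Opinion 2: {opinion_2}\n"
        "Pick one opinion or synthesize a resolution, and explain why."
    )
    return {"decision": tiebreaker(prompt), "tiebreaker_used": True}
```

Because the third model is only invoked on disagreement, the extra tokens are spent only when the first two opinions actually diverge.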

When to Trigger

Architecture decisions

Choices with long-term consequences that are expensive to reverse

Risk assessments

When models disagree on severity or probability

Strategy choices

Both options seem equally valid but lead to different outcomes

High-stakes decisions

Any decision where the cost of being wrong is high

When NOT to Trigger

  • Style preferences (just pick one)
  • Low-stakes choices (not worth the tokens)
  • When one model’s answer is clearly more grounded
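The trigger rules above can be condensed into a single predicate. This is a sketch under stated assumptions: the `Decision` fields are hypothetical flags that a person (or an upstream step) would set, and "high stakes" is used as shorthand for the architecture, risk, strategy, and high-cost cases listed under When to Trigger.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    models_disagree: bool
    high_stakes: bool                     # expensive to reverse or costly if wrong
    style_only: bool = False              # pure style preference
    one_answer_clearly_grounded: bool = False


def should_trigger_trilateral(d: Decision) -> bool:
    """Return True only for high-stakes disagreements worth a third opinion."""
    if not d.models_disagree:
        return False                      # nothing to break a tie on
    if d.style_only or d.one_answer_clearly_grounded:
        return False                      # just pick one
    return d.high_stakes                  # low-stakes choices are not worth the tokens
```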

Cost Optimization Strategy

Key insight: Most of your session is NOT frontier-level work.
| Session Phase | % of Tokens | Model Tier | Cost Impact |
| --- | --- | --- | --- |
| /start boot | ~5% | ⚡ Fast | Minimal |
| Exploration & chat | ~40% | 🧠 Strong | Moderate |
| Core reasoning & coding | ~40% | 🔥 Frontier | Highest |
| /end shutdown | ~5% | ⚡ Fast | Minimal |
| Verification | ~10% | 🔥 Frontier (alt) | Moderate |
By routing only core reasoning, coding, and verification (about 50% of tokens) to Frontier models, you can cut effective costs by roughly 50% while maintaining output quality where it matters.
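To see how the token split translates into cost, the sketch below weights each tier's token share by a relative price. The token shares come from the session-phase breakdown (Fast 5% + 5%, Strong 40%, Frontier 40% + 10%). The relative prices are placeholder assumptions, not real rates, so substitute your own provider pricing.

```python
# Token shares taken from the session-phase breakdown.
TOKEN_SHARE = {"fast": 0.10, "strong": 0.40, "frontier": 0.50}

# ASSUMPTION: per-token price of each tier relative to Frontier (= 1.0).
# These ratios are placeholders for illustration only.
RELATIVE_PRICE = {"fast": 0.02, "strong": 0.10, "frontier": 1.00}


def relative_cost(shares: dict, prices: dict) -> float:
    """Blended cost as a fraction of an all-Frontier session."""
    return sum(shares[tier] * prices[tier] for tier in shares)


routed = relative_cost(TOKEN_SHARE, RELATIVE_PRICE)
savings = 1.0 - routed
print(f"Routed cost: {routed:.1%} of all-Frontier, savings: {savings:.0%}")
```

With these placeholder prices the blended cost is about 54% of an all-Frontier session, a saving of roughly 46%. The closer the cheaper tiers get to free, the closer the saving gets to the 50% Frontier token share.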

Model Switching in Practice

In Multi-Model IDEs

1. Start with a Fast model
   Use Gemini Flash or similar for /start boot scripts and session initialization.

2. Switch to Frontier for complex work
   When you hit coding, architecture, or deep analysis tasks, switch to Claude Opus or Gemini 3.1 Pro.

3. Drop back to Strong/Fast for routine tasks
   For formatting, file operations, and simple Q&A, use lower-tier models.

4. End with a Fast model
   Run /end shutdown scripts with a fast model to save tokens.

Cross-IDE Validation

For the trilateral loop, use different IDEs entirely:
# First opinion
antigravity --model gemini-3.1-pro
Athena’s Markdown-based memory means all three IDEs can read the same context.

Anti-Patterns

| ❌ Don’t | ✅ Do Instead |
| --- | --- |
| Use Frontier for /start and /end | Use Fast: it’s mechanical work |
| Use Fast for architecture decisions | Use Frontier: design compounds |
| Use one model for everything | Route by task type |
| Skip verification entirely | Use a different model to review critical code |
| Run trilateral loop on every question | Reserve it for high-stakes disagreements |
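Several of these anti-patterns can be checked mechanically against a planned session. The sketch below is illustrative only: the step format (dicts with `task`, `tier`, and `model` keys), the task names, and `find_anti_patterns` are hypothetical, and it covers the first four rows of the table. Overuse of the trilateral loop is left out because it depends on judgment about stakes.

```python
def find_anti_patterns(steps: list[dict]) -> list[str]:
    """Return a warning for each anti-pattern found in a planned session."""
    warnings = []
    for step in steps:
        if step["task"] == "session_management" and step["tier"] == "frontier":
            warnings.append("Frontier used for /start or /end")
        if step["task"] in {"planning", "architecture"} and step["tier"] == "fast":
            warnings.append("Fast used for architecture decisions")
    # More than one step but a single model means nothing is being routed.
    if len(steps) > 1 and len({step["model"] for step in steps}) == 1:
        warnings.append("One model used for everything")
    if not any(step["task"] == "verification" for step in steps):
        warnings.append("Verification skipped")
    return warnings


plan = [
    {"task": "session_management", "tier": "fast", "model": "gemini-flash"},
    {"task": "planning", "tier": "frontier", "model": "claude-opus"},
    {"task": "coding", "tier": "frontier", "model": "claude-opus"},
    {"task": "verification", "tier": "frontier", "model": "gemini-pro"},
]
print(find_anti_patterns(plan))  # an empty list means no anti-patterns found
```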

Quick Reference Card

/start, /end, /save       →  ⚡ Fast (Gemini Flash)
Coding, web dev, apps      →  🔥 Frontier (Claude Opus / Gemini 3.1 Pro)
Planning, architecture     →  🔥 Frontier (never Fast)
General chat, Q&A          →  🧠 Strong (Gemini 3.1 Pro), toggle Frontier for depth
Research, deep analysis    →  🔥 Frontier
Verification, code review  →  🔥 Frontier (DIFFERENT model than author)
Conflict resolution        →  🌐 Trilateral Loop (3rd model as tiebreaker)
Quick lookups, formatting  →  ⚡ Fast

Example Session Workflow

1. Session Start (Fast Model)

   # Model: Gemini Flash
   /start

   Loads Core Identity, userContext, productContext, activeContext (~2K tokens)

2. Planning Phase (Frontier Model)

   # Switch to: Claude Opus
   /plan "Build user authentication system"

   Deep reasoning for architecture decisions, applies Protocol 123 (Einstein Protocol)

3. Implementation (Frontier Model)

   # Continue with: Claude Opus
   "Implement JWT authentication with refresh tokens"

   High-quality code generation with security considerations

4. Quick Formatting (Fast Model)

   # Switch to: Gemini Flash
   "Format this code with prettier"

   Mechanical task, no reasoning needed

5. Verification (Different Frontier Model)

   # Switch to: Gemini 3.1 Pro
   "Review this authentication code for security issues"

   Fresh perspective catches issues the author model missed

6. Session End (Fast Model)

   # Switch to: Gemini Flash
   /end

   Synthesizes session, commits changes, updates logs (~600 tokens)

Platform-Specific Tips

Antigravity / Multi-Model IDEs

Most modern agentic IDEs let you switch models mid-session via dropdown or command.

Claude Code

Use .clauderc to define model presets:
{
  "modelPresets": {
    "fast": "claude-3-haiku-20240307",
    "strong": "claude-3-5-sonnet-20241022",
    "frontier": "claude-opus-4.6"
  }
}
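If you want to reuse the same presets from your own scripts, the JSON shape above is easy to read with the standard library. This is a sketch that assumes only the `modelPresets` layout shown above. `resolve_model` is a hypothetical helper, and nothing here depends on how any particular tool loads the file.

```python
import json


def resolve_model(config_text: str, tier: str) -> str:
    """Look up the model ID for a tier in a `modelPresets` JSON document."""
    presets = json.loads(config_text).get("modelPresets", {})
    if tier not in presets:
        raise KeyError(f"No preset for tier {tier!r}; available: {sorted(presets)}")
    return presets[tier]


# Sample config using the same shape and values as the preset example.
SAMPLE = """
{
  "modelPresets": {
    "fast": "claude-3-haiku-20240307",
    "frontier": "claude-opus-4.6"
  }
}
"""

print(resolve_model(SAMPLE, "fast"))
```

Raising on an unknown tier, rather than silently falling back, makes a typo in a tier name visible immediately.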

ChatGPT / OpenAI

Switch models via the --model flag or the web interface.

Cost Savings Example

Scenario: 8-hour work day, 5 sessions

Without routing (all Frontier):
  • 5 sessions × 200K tokens avg = 1M tokens/day
  • Cost: ~$30/day (estimate)

With routing:
  • /start + /end: 10 runs (5 sessions × 2) × 3K tokens = 30K (Fast)
  • Routine work: 400K tokens (Strong)
  • Core work: 400K tokens (Frontier)
  • Verification: 100K tokens (Frontier alt)
  • Total Frontier: 500K tokens/day
  • Cost savings: ~50% while maintaining quality
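The token arithmetic in this scenario can be checked with a few lines. The figures are the ones listed above and the variable names are illustrative. No prices are involved, so this only shows how much traffic moves off Frontier models.

```python
SESSIONS_PER_DAY = 5
TOKENS_PER_SESSION = 200_000

# Without routing, every token goes to a Frontier model.
unrouted_frontier = SESSIONS_PER_DAY * TOKENS_PER_SESSION

# With routing, tokens are split into the buckets from the scenario.
routed = {
    "fast": SESSIONS_PER_DAY * 2 * 3_000,  # one /start and one /end per session
    "strong": 400_000,                      # routine work
    "frontier": 400_000,                    # core work
    "frontier_alt": 100_000,                # verification by a different model
}

routed_frontier = routed["frontier"] + routed["frontier_alt"]
frontier_reduction = 1 - routed_frontier / unrouted_frontier

print(f"Frontier tokens: {unrouted_frontier:,} -> {routed_frontier:,}")
print(f"Frontier reduction: {frontier_reduction:.0%}")
```

The routed buckets add up to 930K tokens rather than exactly 1M, which is consistent with the scenario using rounded estimates. The halving of Frontier tokens is what drives the ~50% saving.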

Best Practices

Default to Strong

Use Strong models (Gemini Pro) for general work. Only escalate to Frontier when needed.

Never Fast for architecture

Design decisions compound. Always use Frontier models for planning and architecture.

Fresh eyes for review

Use a DIFFERENT model to review code than the one that wrote it.

Track your patterns

Monitor which tasks genuinely benefit from Frontier vs Strong. Adjust routing over time.

Next Steps

Semantic Search

Learn how Athena finds context across your workspace

Best Practices

Operational discipline for running Athena sustainably
