Claude Mem supports OpenRouter as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from providers including Google, Meta, Mistral, DeepSeek, and many others — often with generous free tiers.
Free models available: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.

Why use OpenRouter?

  • 100+ models: Choose from models across multiple providers through a single API key and configuration.
  • Free tier options: Several high-quality models are completely free to use; no credits required.
  • Cost flexibility: Pay-as-you-go pricing on premium models with no commitments or minimum spend.
  • Seamless fallback: Automatically falls back to Claude if OpenRouter is unavailable due to errors or rate limits.
  • Hot-swappable: Switch providers without restarting the worker. Changes take effect on the next observation.
  • Multi-turn conversations: Full conversation history is maintained across API calls for coherent multi-turn exchanges.

Free models

OpenRouter actively supports democratizing AI access. These production-ready free models are suitable for observation extraction.
| Model | ID | Parameters | Context | Best for |
|---|---|---|---|---|
| Xiaomi MiMo-V2-Flash | xiaomi/mimo-v2-flash:free | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free | – | 1M | General purpose |
| Gemini 2.5 Flash | google/gemini-2.5-flash-preview:free | – | 1M | Latest capabilities |
| DeepSeek R1 | deepseek/deepseek-r1:free | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | meta-llama/llama-3.1-70b-instruct:free | 70B | 128K | General purpose |
| Llama 3.1 8B | meta-llama/llama-3.1-8b-instruct:free | 8B | 128K | Fast, lightweight |
| Mistral Nemo | mistralai/mistral-nemo:free | 12B | 128K | Efficient performance |
Default model: Claude Mem uses xiaomi/mimo-v2-flash:free by default — a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.

Free model considerations

  • Rate limits: Free models may have stricter rate limits than paid models.
  • Availability: Free capacity depends on provider partnerships and demand.
  • Queue times: During peak usage, requests may be queued briefly.
  • Max tokens: Most free models support up to 65,536 completion tokens.
All free models support tool use and function calling, temperature and sampling controls, stop sequences, and streaming responses.

Getting an API key

1. Create an account: Go to openrouter.ai and sign in with Google, GitHub, or email.
2. Open API Keys: Navigate to openrouter.ai/keys.
3. Create a key: Click Create Key, give it a name, and confirm.
4. Copy and store your key: Copy the key and store it somewhere secure. API keys start with sk-or-v1-.
No credit card is required to create an account or use free models. Add credits only if you want to use premium paid models.

Configuration

Settings reference

CLAUDE_MEM_PROVIDER (string, default: "claude")
AI provider for observation extraction. Set to openrouter to use the OpenRouter API.

CLAUDE_MEM_OPENROUTER_API_KEY (string)
Your OpenRouter API key. Takes precedence over the OPENROUTER_API_KEY environment variable.

CLAUDE_MEM_OPENROUTER_MODEL (string, default: "xiaomi/mimo-v2-flash:free")
Model identifier. Use the full model ID from the OpenRouter models list. Append :free for free model variants.

CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES (number, default: 20)
Maximum number of messages to include in the conversation history per request.

CLAUDE_MEM_OPENROUTER_MAX_TOKENS (number, default: 100000)
Token budget safety limit. Requests exceeding this estimate trigger context truncation.

CLAUDE_MEM_OPENROUTER_SITE_URL (string, optional)
URL sent in request headers for analytics attribution in your OpenRouter dashboard.

CLAUDE_MEM_OPENROUTER_APP_NAME (string, default: "claude-mem")
Optional app name sent in request headers for analytics in your OpenRouter dashboard.

Using the settings UI

1. Open the viewer: Go to http://localhost:37777 and click the gear icon to open Settings.
2. Select OpenRouter: Under AI Provider, select OpenRouter from the dropdown.
3. Enter your API key: Paste your OpenRouter API key into the OpenRouter API Key field.
4. Choose a model (optional): Enter a model identifier. The default xiaomi/mimo-v2-flash:free is recommended for free usage.
Settings are applied immediately — no restart required.

Manual configuration

Edit ~/.claude-mem/settings.json:
settings.json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}

Model selection guide

Recommended: xiaomi/mimo-v2-flash:free
  • Best-in-class performance on coding benchmarks (SWE-bench #1)
  • 256K context window handles large observations
  • 65K max completion tokens
  • Mixture-of-experts architecture (15B active parameters)
Alternatives:
  • google/gemini-2.0-flash-exp:free — 1M context, Google’s flagship
  • deepseek/deepseek-r1:free — Excellent reasoning capabilities
  • meta-llama/llama-3.1-70b-instruct:free — Strong general purpose

Context window management

The OpenRouter agent implements automatic context management to prevent runaway costs.

Automatic truncation

The agent uses a sliding window strategy before each request:
  1. Checks if message count exceeds CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES (default: 20)
  2. Checks if estimated tokens exceed CLAUDE_MEM_OPENROUTER_MAX_TOKENS (default: 100,000)
  3. If either limit is exceeded, keeps only the most recent messages
  4. Logs warnings with the count of dropped messages
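The sliding-window strategy above can be sketched as follows. This is an illustrative implementation, not claude-mem's actual code: the Message shape, truncateHistory, and estimateTokens names are assumptions, while the limits and the 1 token ≈ 4 characters estimate come from this page.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

const MAX_CONTEXT_MESSAGES = 20;  // CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
const MAX_TOKENS = 100_000;       // CLAUDE_MEM_OPENROUTER_MAX_TOKENS

// Conservative estimate: 1 token ≈ 4 characters.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function truncateHistory(messages: Message[]): Message[] {
  let kept = messages;
  // Limit 1: message count.
  if (kept.length > MAX_CONTEXT_MESSAGES) {
    kept = kept.slice(-MAX_CONTEXT_MESSAGES); // keep the most recent
  }
  // Limit 2: estimated token budget; drop oldest until under budget.
  while (kept.length > 1 && estimateTokens(kept) > MAX_TOKENS) {
    kept = kept.slice(1);
  }
  const dropped = messages.length - kept.length;
  if (dropped > 0) {
    console.warn(`Context truncated: dropped ${dropped} oldest message(s)`);
  }
  return kept;
}
```

With 25 queued messages and the default limit of 20, the five oldest are dropped before the request is sent.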

Token estimation and cost tracking

Token usage is estimated conservatively at 1 token ≈ 4 characters for proactive truncation. Actual usage from the API response is logged after each request:
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}

Multi-turn conversation support

The OpenRouter agent maintains full conversation history across API calls within a session:
Session created

Load pending messages (observations from queue)

For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB

Session complete
This enables coherent multi-turn exchanges, context preservation across observations, and seamless provider switching mid-session.
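The session flow above can be sketched as a loop that grows the history on every turn. All names here (processSession, Msg, the injected callApi and store callbacks) are illustrative, not claude-mem internals; the real agent calls the OpenRouter API and persists to the database and Chroma.

```typescript
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical session loop: each API call receives the FULL history so far.
function processSession(
  pending: string[],                       // observations from the queue
  callApi: (history: Msg[]) => string,     // stand-in for the OpenRouter call
  store: (observation: string) => void,    // stand-in for DB + Chroma sync
): Msg[] {
  const history: Msg[] = [];
  for (const message of pending) {
    history.push({ role: "user", content: message }); // add to history
    const reply = callApi(history);                   // full history every call
    history.push({ role: "assistant", content: reply }); // keep the reply too
    store(reply);                                     // persist the observation
  }
  return history;
}
```

Because the assistant's replies are appended to the history as well, each later turn sees every earlier exchange, which is what makes the multi-turn context coherent.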

Switching providers

1. Open Settings: Open the viewer at http://localhost:37777 and click the gear icon.
2. Change provider: Change the AI Provider dropdown to OpenRouter (or another provider).
3. Observe the change: The next observation will use the new provider. No restart is needed, and conversation history is preserved.

Fallback behavior

If OpenRouter encounters errors, Claude Mem automatically falls back to the Claude Agent SDK. The following conditions trigger fallback:
  • Rate limiting (HTTP 429)
  • Server errors (HTTP 500, 502, 503)
  • Network issues (connection refused, timeout)
  • Generic fetch failures
When fallback occurs, a warning is logged, any in-progress messages are reset to pending, and the Claude SDK takes over with the full conversation context. API key problems are handled differently:
  • Missing API key: Logs a warning and uses Claude from the start (no mid-session switch).
  • Invalid API key: Fails with an error rather than silently switching.
Fallback is transparent: Your observations continue processing without interruption. The fallback preserves all conversation context so nothing is lost.
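The fallback conditions above amount to a small classifier. This is an illustrative sketch, not claude-mem's actual implementation; note that an invalid key (HTTP 401) deliberately does not fall back, matching the behavior described above.

```typescript
// Hypothetical classifier for the fallback rules listed above.
function shouldFallbackToClaude(
  status: number | null,   // HTTP status, or null if no response arrived
  networkError: boolean,   // connection refused, timeout, generic fetch failure
): boolean {
  if (networkError) return true;                 // network issues
  if (status === 429) return true;               // rate limiting
  if (status !== null && [500, 502, 503].includes(status)) {
    return true;                                 // server errors
  }
  return false; // e.g. 401 invalid key fails loudly instead of switching
}
```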

API details

OpenRouter uses an OpenAI-compatible REST API.

Endpoint: https://openrouter.ai/api/v1/chat/completions

Headers sent with each request:
  Authorization: Bearer {apiKey}
  HTTP-Referer: https://github.com/thedotmack/claude-mem
  X-Title: claude-mem
  Content-Type: application/json
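A minimal request against this endpoint could be built as follows. The buildChatRequest helper is illustrative (not part of claude-mem), and the API key and prompt are placeholders; the URL and headers are the ones listed above.

```typescript
// Hypothetical helper that assembles a chat-completions request.
function buildChatRequest(apiKey: string, model: string, prompt: string) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "HTTP-Referer": "https://github.com/thedotmack/claude-mem",
        "X-Title": "claude-mem",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

Usage (Node 18+ with global fetch): `const { url, init } = buildChatRequest(key, "xiaomi/mimo-v2-flash:free", "Hello"); const res = await fetch(url, init);` and read `(await res.json()).choices[0].message.content`.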

Provider comparison

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---|---|---|---|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5–4,000 RPM | Varies by model |
| Fallback | N/A (primary) | Falls back to Claude | Falls back to Claude |
| Setup | Automatic | API key required | API key required |

Troubleshooting

If your API key is not being picked up, supply it in one of two ways:
  • Set CLAUDE_MEM_OPENROUTER_API_KEY in ~/.claude-mem/settings.json, or
  • Set the OPENROUTER_API_KEY environment variable
The settings file takes precedence if both are present.
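The precedence rule above can be expressed in a few lines. The resolveApiKey helper is illustrative, not claude-mem's actual code; it assumes the parsed settings file is passed in as a plain object.

```typescript
// Hypothetical resolution order: settings file first, then environment.
function resolveApiKey(
  settings: Record<string, string | undefined>,
): string | undefined {
  return (
    settings["CLAUDE_MEM_OPENROUTER_API_KEY"] ??  // ~/.claude-mem/settings.json
    process.env.OPENROUTER_API_KEY                // environment variable fallback
  );
}
```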
Free models may have rate limits during peak usage. Claude Mem automatically falls back to the Claude SDK when rate limits are hit. To reduce rate limiting:
  • Switch to a different free model (availability varies by provider partnership)
  • Add credits to your OpenRouter account to access premium models with higher limits
If OpenRouter rejects your model choice, verify the model ID is correct:
  • Check openrouter.ai/models for current availability
  • Use the :free suffix for free model variants (e.g., meta-llama/llama-3.1-70b-instruct:free)
  • Model IDs are case-sensitive
If you see warnings about high token usage (>50,000 per request):
  • Reduce CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES to a lower value
  • Reduce CLAUDE_MEM_OPENROUTER_MAX_TOKENS to set a tighter budget
  • Consider a model with a larger context window
If you see connection errors:
  • Check your internet connection
  • Verify OpenRouter service status at status.openrouter.ai
  • Claude Mem will automatically fall back to Claude when connection errors occur

Next steps

Gemini provider

Use Google’s Gemini API as an alternative free provider for observation extraction.

Configuration reference

Full settings reference for all Claude Mem configuration options.
