Why use OpenRouter?
100+ models
Choose from models across multiple providers through a single API key and configuration.
Free tier options
Several high-quality models are completely free to use — no credits required.
Cost flexibility
Pay-as-you-go pricing on premium models with no commitments or minimum spend.
Seamless fallback
Automatically falls back to Claude if OpenRouter is unavailable due to errors or rate limits.
Hot-swappable
Switch providers without restarting the worker. Changes take effect on the next observation.
Multi-turn conversations
Full conversation history is maintained across API calls for coherent multi-turn exchanges.
Free models
OpenRouter actively supports democratizing AI access. These production-ready free models are suitable for observation extraction.

| Model | ID | Parameters | Context | Best for |
|---|---|---|---|---|
| Xiaomi MiMo-V2-Flash | xiaomi/mimo-v2-flash:free | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free | — | 1M | General purpose |
| Gemini 2.5 Flash | google/gemini-2.5-flash-preview:free | — | 1M | Latest capabilities |
| DeepSeek R1 | deepseek/deepseek-r1:free | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | meta-llama/llama-3.1-70b-instruct:free | 70B | 128K | General purpose |
| Llama 3.1 8B | meta-llama/llama-3.1-8b-instruct:free | 8B | 128K | Fast, lightweight |
| Mistral Nemo | mistralai/mistral-nemo:free | 12B | 128K | Efficient performance |
Default model: Claude Mem uses xiaomi/mimo-v2-flash:free by default — a 309B-parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
Free model considerations
- Rate limits: Free models may have stricter rate limits than paid models.
- Availability: Free capacity depends on provider partnerships and demand.
- Queue times: During peak usage, requests may be queued briefly.
- Max tokens: Most free models support up to 65,536 completion tokens.
Getting an API key
Create an account
Go to openrouter.ai and sign in with Google, GitHub, or email.
Open API Keys
Navigate to openrouter.ai/keys.
Configuration
Settings reference
- Provider: AI provider for observation extraction. Set to `openrouter` to use the OpenRouter API.
- API key (`CLAUDE_MEM_OPENROUTER_API_KEY`): Your OpenRouter API key. Takes precedence over the `OPENROUTER_API_KEY` environment variable.
- Model: Model identifier. Use the full model ID from the OpenRouter models list; append `:free` for free model variants.
- Max context messages (`CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES`): Maximum number of messages to include in the conversation history per request.
- Max tokens (`CLAUDE_MEM_OPENROUTER_MAX_TOKENS`): Token budget safety limit. Requests exceeding this estimate trigger context truncation.
- Referer URL: Optional URL sent in request headers for analytics attribution in your OpenRouter dashboard.
- App name: Optional app name sent in request headers for analytics in your OpenRouter dashboard.
Using the settings UI
Open the viewer
Go to http://localhost:37777 and click the gear icon to open Settings.
Manual configuration
- Settings file
- Environment variable
Edit ~/.claude-mem/settings.json:
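A hedged sketch of the relevant settings is shown below. The `CLAUDE_MEM_OPENROUTER_*` key names appear elsewhere in this page; the provider and model key names, and the key value, are placeholder assumptions:

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-...",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free",
  "CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES": 20,
  "CLAUDE_MEM_OPENROUTER_MAX_TOKENS": 100000
}
```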
Model selection guide
- Free models (no cost)
- Paid models (higher quality)
Recommended: xiaomi/mimo-v2-flash:free
- Best-in-class performance on coding benchmarks (SWE-bench Verified #1)
- 256K context window handles large observations
- 65K max completion tokens
- Mixture-of-experts architecture (15B active parameters)

Other free options:
- google/gemini-2.0-flash-exp:free — 1M context, Google's flagship
- deepseek/deepseek-r1:free — Excellent reasoning capabilities
- meta-llama/llama-3.1-70b-instruct:free — Strong general purpose
Context window management
The OpenRouter agent implements automatic context management to prevent runaway costs.
Automatic truncation
The agent uses a sliding-window strategy before each request:
- Checks whether the message count exceeds `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` (default: 20)
- Checks whether the estimated token count exceeds `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` (default: 100,000)
- If either limit is exceeded, keeps only the most recent messages
- Logs warnings with the count of dropped messages
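The sliding-window strategy can be sketched as follows. This is an illustration of the behavior described above, not the actual Claude Mem source; the function names and message shape are assumptions:

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Conservative heuristic used for proactive truncation: ~4 characters per token.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Sliding window: keep only the most recent messages when either the
// message-count limit or the token budget is exceeded, and warn about drops.
function truncateHistory(
  messages: Message[],
  maxMessages = 20,     // CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
  maxTokens = 100_000   // CLAUDE_MEM_OPENROUTER_MAX_TOKENS
): Message[] {
  let kept = messages;
  if (kept.length > maxMessages) {
    kept = kept.slice(-maxMessages); // keep the most recent N messages
  }
  while (kept.length > 1 && estimateTokens(kept) > maxTokens) {
    kept = kept.slice(1); // drop the oldest message until under budget
  }
  const dropped = messages.length - kept.length;
  if (dropped > 0) {
    console.warn(`Context truncated: dropped ${dropped} oldest message(s)`);
  }
  return kept;
}
```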
Token estimation and cost tracking
Token usage is estimated conservatively at 1 token ≈ 4 characters for proactive truncation. Actual usage from the API response is logged after each request.
Multi-turn conversation support
The OpenRouter agent maintains full conversation history across API calls within a session.
Switching providers
- Via settings UI
- Via settings file
Open Settings
Open the viewer at http://localhost:37777 and click the gear icon.
Fallback behavior
If OpenRouter encounters errors, Claude Mem automatically falls back to the Claude Agent SDK.
Conditions that trigger fallback
- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503)
- Network issues (connection refused, timeout)
- Generic fetch failures
Conditions that do not trigger fallback
- Missing API key: Logs a warning and uses Claude from the start (no mid-session switch).
- Invalid API key: Fails with an error rather than silently switching.
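The decision described by the two lists above can be sketched as a single predicate. This is an assumed illustration, not the actual Claude Mem source:

```typescript
// Returns true when the error is transient (fall back to Claude), false when
// it is a configuration problem that should surface to the user instead.
function shouldFallbackToClaude(err: { status?: number; networkError?: boolean }): boolean {
  if (err.networkError) return true;        // connection refused, timeout, generic fetch failure
  if (err.status === 429) return true;      // rate limited
  if (err.status !== undefined && [500, 502, 503].includes(err.status)) {
    return true;                            // server errors
  }
  return false;                             // e.g. an invalid API key fails with an error
}
```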
Fallback is transparent: Your observations continue processing without interruption. The fallback preserves all conversation context so nothing is lost.
API details
OpenRouter uses an OpenAI-compatible REST API. Endpoint: https://openrouter.ai/api/v1/chat/completions
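For illustration, a minimal request body in the OpenAI-compatible chat-completions format looks like this (the model is the default named above; the message contents are placeholders, not Claude Mem's actual prompts):

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    { "role": "system", "content": "<system prompt>" },
    { "role": "user", "content": "<observation input>" }
  ]
}
```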
Provider comparison
| Feature | Claude (SDK) | Gemini | OpenRouter |
|---|---|---|---|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5–4,000 RPM | Varies by model |
| Fallback | N/A (primary) | Falls back to Claude | Falls back to Claude |
| Setup | Automatic | API key required | API key required |
Troubleshooting
"OpenRouter API key not configured"
Supply the API key in one of two ways:
- Set `CLAUDE_MEM_OPENROUTER_API_KEY` in `~/.claude-mem/settings.json`, or
- Set the `OPENROUTER_API_KEY` environment variable
Rate limiting
Free models may have rate limits during peak usage. Claude Mem automatically falls back to the Claude SDK when rate limits are hit. To reduce rate limiting:
- Switch to a different free model (availability varies by provider partnership)
- Add credits to your OpenRouter account to access premium models with higher limits
Model not found
Verify the model ID is correct:
- Check openrouter.ai/models for current availability
- Use the `:free` suffix for free model variants (e.g., `meta-llama/llama-3.1-70b-instruct:free`)
- Model IDs are case-sensitive
High token usage warnings
If you see warnings about high token usage (>50,000 per request):
- Reduce `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` to a lower value
- Reduce `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` to set a tighter budget
- Consider a model with a larger context window
Connection errors
If you see connection errors:
- Check your internet connection
- Verify OpenRouter service status at status.openrouter.ai
- Claude Mem will automatically fall back to Claude when connection errors occur
Next steps
Gemini provider
Use Google’s Gemini API as an alternative free provider for observation extraction.
Configuration reference
Full settings reference for all Claude Mem configuration options.