LongMem uses AI-powered compression to keep your memory database lean and relevant. Configure compression providers, models, rate limiting, and circuit breaker settings to optimize performance.
Configuration
Compression settings are configured in the compression section of ~/.longmem/settings.json:
{
"compression": {
"enabled": false,
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct",
"apiKey": "",
"baseURL": "https://openrouter.ai/api/v1",
"maxConcurrent": 1,
"idleThresholdSeconds": 5,
"maxPerMinute": 10,
"timeoutSeconds": 30,
"circuitBreakerThreshold": 5,
"circuitBreakerCooldownMs": 60000,
"circuitBreakerMaxCooldownMs": 300000,
"maxRetries": 3
}
}
Configuration fields
enabled
- Type: boolean
- Default: false
Enables AI-powered compression. When enabled, LongMem periodically compresses old memories to reduce database size while preserving semantic meaning.
Example:
{
"compression": {
"enabled": true
}
}
You must configure apiKey before enabling compression. The compression feature requires an AI provider to function.
provider
- Type: string
- Default: "openrouter"
- Supported providers: openrouter, openai, anthropic, local
The AI provider to use for compression. Each provider has a default base URL that is automatically configured.
Provider base URLs:
openrouter: https://openrouter.ai/api/v1
openai: https://api.openai.com/v1
anthropic: https://api.anthropic.com/v1
local: http://localhost:11434/v1
Example:
{
"compression": {
"provider": "openai"
}
}
model
- Type: string
- Default: "meta-llama/llama-3.1-8b-instruct"
The AI model to use for compression. The model name format depends on your provider.
Example for OpenRouter:
{
"compression": {
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct"
}
}
Example for OpenAI:
{
"compression": {
"provider": "openai",
"model": "gpt-4o-mini"
}
}
apiKey
- Type: string
- Default: "" (empty)
API key for authenticating with your AI provider. Required when compression is enabled (except for local providers without authentication).
Example:
{
"compression": {
"apiKey": "sk-or-v1-..."
}
}
Never commit your API key to version control. Consider using environment variables or secure credential storage.
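One way to follow that advice is to overlay the key from an environment variable when the settings are loaded, so the file on disk never contains it. A minimal sketch of the pattern (the LONGMEM_API_KEY variable name and load_settings helper are illustrative, not part of LongMem):

```python
# Sketch: parse settings JSON and overlay the API key from an environment
# variable. LONGMEM_API_KEY is an assumed name, not a LongMem convention.
import json
import os


def load_settings(raw_json: str, env_var: str = "LONGMEM_API_KEY") -> dict:
    """Parse settings JSON and, if the env var is set, use it as the API key."""
    settings = json.loads(raw_json)
    key = os.environ.get(env_var)
    if key:
        settings.setdefault("compression", {})["apiKey"] = key
    return settings
```

With this pattern, settings.json can keep "apiKey": "" and the real key lives only in your shell environment or secrets manager.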
baseURL
- Type: string | undefined
- Default: Auto-configured based on provider
Custom base URL for the API endpoint. If not specified, it is set automatically based on the provider.
Example for custom endpoint:
{
"compression": {
"provider": "openai",
"baseURL": "https://api.custom-proxy.com/v1"
}
}
maxConcurrent
- Type: number
- Default: 1
Maximum number of concurrent compression requests. Increasing this can speed up compression but consumes your API quota faster.
Example:
{
"compression": {
"maxConcurrent": 3
}
}
Start with 1 and increase gradually if compression is too slow. Be mindful of your API provider’s rate limits.
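Conceptually, a maxConcurrent-style cap is a semaphore around the API call: no more than N requests are in flight at once, however many memories are queued. A sketch of the idea (compress_one is a stand-in for the real provider call, not LongMem's API):

```python
# Sketch: bound in-flight compression requests with an asyncio semaphore.
import asyncio

MAX_CONCURRENT = 3  # mirrors "maxConcurrent": 3


async def compress_one(memory: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for the provider round trip
    return f"summary of {memory}"


async def compress_all(memories: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(m: str) -> str:
        async with sem:  # at most MAX_CONCURRENT calls run concurrently
            return await compress_one(m)

    return await asyncio.gather(*(bounded(m) for m in memories))
```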
idleThresholdSeconds
- Type: number
- Default: 5
Number of seconds the system must be idle before compression starts. This prevents compression from running during active work.
Example:
{
"compression": {
"idleThresholdSeconds": 10
}
}
maxPerMinute
- Type: number
- Default: 10
Maximum number of compression requests per minute. Use this rate limit to stay within your API provider's limits.
Example:
{
"compression": {
"maxPerMinute": 20
}
}
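A per-minute cap like this is commonly implemented as a sliding window over recent request timestamps: a new request is allowed only while fewer than the limit fall inside the last 60 seconds. An illustrative sketch of that technique (not LongMem's actual limiter):

```python
# Sketch: sliding-window rate limiter enforcing a maxPerMinute-style cap.
from collections import deque


class PerMinuteLimiter:
    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit          # mirrors "maxPerMinute"
        self.window = window        # window length in seconds
        self.timestamps: deque[float] = deque()

    def allow(self, now: float) -> bool:
        """Record and allow a request unless the window is already full."""
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```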
timeoutSeconds
- Type: number
- Default: 30
Timeout (in seconds) for each compression API request. Requests exceeding this duration are cancelled.
Example:
{
"compression": {
"timeoutSeconds": 60
}
}
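Per-request timeouts of this kind can be sketched with asyncio.wait_for, which cancels the awaited call once the deadline passes. This illustrates the behavior, not LongMem's code:

```python
# Sketch: cancel any compression call that outlives timeoutSeconds.
import asyncio


async def call_with_timeout(coro, timeout_seconds: float):
    """Await `coro`, raising asyncio.TimeoutError if it exceeds the deadline."""
    return await asyncio.wait_for(coro, timeout=timeout_seconds)
```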
circuitBreakerThreshold
- Type: number
- Default: 5
Number of consecutive failures before the circuit breaker opens and compression attempts stop.
Example:
{
"compression": {
"circuitBreakerThreshold": 3
}
}
circuitBreakerCooldownMs
- Type: number
- Default: 60000 (1 minute)
Initial cooldown period (in milliseconds) after circuit breaker opens. The system waits this long before retrying.
Example:
{
"compression": {
"circuitBreakerCooldownMs": 120000
}
}
circuitBreakerMaxCooldownMs
- Type: number
- Default: 300000 (5 minutes)
Maximum cooldown period (in milliseconds). The cooldown period doubles with each consecutive failure but won’t exceed this value.
Example:
{
"compression": {
"circuitBreakerMaxCooldownMs": 600000
}
}
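The doubling-with-a-cap schedule described above can be written as a one-line formula. A sketch using the default values (the exact backoff formula LongMem uses is not documented here, so treat this as an approximation):

```python
# Sketch: cooldown doubles per consecutive failure, capped at the maximum.
def cooldown_ms(consecutive_failures: int,
                initial_ms: int = 60_000,       # circuitBreakerCooldownMs
                max_ms: int = 300_000) -> int:  # circuitBreakerMaxCooldownMs
    """Cooldown after the Nth consecutive failure once the breaker is open."""
    return min(initial_ms * 2 ** (consecutive_failures - 1), max_ms)
```

With the defaults this yields 1, 2, then 4 minutes, and stays pinned at 5 minutes from the fourth failure onward.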
maxRetries
- Type: number
- Default: 3
Number of retry attempts for a failed compression request before giving up.
Example:
{
"compression": {
"maxRetries": 5
}
}
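A maxRetries-style loop simply re-invokes the call until it succeeds or the retry budget is exhausted, then surfaces the last error. An illustrative sketch (whether LongMem counts the initial attempt toward maxRetries is not documented; this sketch does not):

```python
# Sketch: retry a failing call up to max_retries times, then re-raise.
def with_retries(fn, max_retries: int = 3):
    last_error = None
    for _ in range(max_retries + 1):  # initial attempt + retries
        try:
            return fn()
        except Exception as e:
            last_error = e
    raise last_error
```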
Provider examples
OpenRouter (default)
{
"compression": {
"enabled": true,
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct",
"apiKey": "sk-or-v1-..."
}
}
OpenRouter provides access to many models with competitive pricing. The base URL (https://openrouter.ai/api/v1) is set automatically.
OpenAI
{
"compression": {
"enabled": true,
"provider": "openai",
"model": "gpt-4o-mini",
"apiKey": "sk-proj-...",
"maxConcurrent": 2,
"maxPerMinute": 50
}
}
OpenAI provides reliable, high-quality models. Use gpt-4o-mini for cost-effective compression.
Anthropic
{
"compression": {
"enabled": true,
"provider": "anthropic",
"model": "claude-3-haiku-20240307",
"apiKey": "sk-ant-...",
"maxConcurrent": 2
}
}
Anthropic’s Claude models excel at understanding and summarizing context. Use claude-3-haiku for fast, cost-effective compression.
Local (Ollama)
{
"compression": {
"enabled": true,
"provider": "local",
"model": "llama3.1:8b",
"apiKey": "",
"baseURL": "http://localhost:11434/v1",
"maxConcurrent": 1,
"timeoutSeconds": 120
}
}
Run compression with local models using Ollama. No API key required, and all data stays on your machine.
Local models are great for privacy but may be slower. Adjust timeoutSeconds and maxConcurrent based on your hardware.
Advanced configuration
High-throughput setup
{
"compression": {
"enabled": true,
"provider": "openai",
"model": "gpt-4o-mini",
"apiKey": "sk-proj-...",
"maxConcurrent": 5,
"maxPerMinute": 100,
"timeoutSeconds": 15,
"circuitBreakerThreshold": 10,
"maxRetries": 5
}
}
Conservative/budget setup
{
"compression": {
"enabled": true,
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct",
"apiKey": "sk-or-v1-...",
"maxConcurrent": 1,
"idleThresholdSeconds": 30,
"maxPerMinute": 5,
"timeoutSeconds": 45
}
}
How compression works
- Idle detection: After idleThresholdSeconds of inactivity, compression begins
- Batch selection: Old memories are selected for compression
- AI summarization: The model summarizes multiple memories while preserving key information
- Rate limiting: Respects maxPerMinute and maxConcurrent limits
- Circuit breaker: Stops attempts if failures exceed circuitBreakerThreshold
- Retry logic: Failed requests are retried up to maxRetries times
Best practices
- Start with defaults: The default settings work well for most users
- Monitor costs: Track your API usage and adjust maxPerMinute accordingly
- Use budget models: Models like gpt-4o-mini or llama-3.1-8b are sufficient for compression
- Adjust idle threshold: Set higher values if compression interferes with your workflow
- Enable circuit breaker: Prevents runaway API costs if errors occur
- Test locally first: Try with provider: "local" before using paid APIs
Troubleshooting
Compression not running
- Verify enabled: true
- Check that apiKey is set correctly
- Ensure the system is idle for at least idleThresholdSeconds
- Check daemon logs: ~/.longmem/logs/
Circuit breaker opened
- Check API provider status
- Verify API key is valid and has quota
- Review circuitBreakerThreshold and cooldown settings
- Check daemon logs for error details
Slow compression
- Increase maxConcurrent (respecting rate limits)
- Use a faster model
- Reduce timeoutSeconds so stuck requests fail faster
- Consider switching providers