LongMem uses AI-powered compression to keep your memory database lean and relevant. Configure compression providers, models, rate limiting, and circuit breaker settings to optimize performance.
Configuration
Compression settings are configured in the compression section of ~/.longmem/settings.json:
{
"compression": {
"enabled": false,
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct",
"apiKey": "",
"baseURL": "https://openrouter.ai/api/v1",
"maxConcurrent": 1,
"idleThresholdSeconds": 5,
"maxPerMinute": 10,
"timeoutSeconds": 30,
"circuitBreakerThreshold": 5,
"circuitBreakerCooldownMs": 60000,
"circuitBreakerMaxCooldownMs": 300000,
"maxRetries": 3
}
}
Configuration fields
enabled
- Type: boolean
- Default: false
Enables AI-powered compression. When enabled, LongMem periodically compresses old memories to reduce database size while preserving semantic meaning.
Example:
{
"compression": {
"enabled": true
}
}
You must configure apiKey before enabling compression. The compression feature requires an AI provider to function.
provider
- Type: string
- Default: "openrouter"
- Supported providers: openrouter, openai, anthropic, local
The AI provider to use for compression. Each provider has a default base URL that is automatically configured.
Provider base URLs:
openrouter: https://openrouter.ai/api/v1
openai: https://api.openai.com/v1
anthropic: https://api.anthropic.com/v1
local: http://localhost:11434/v1
Example:
{
"compression": {
"provider": "openai"
}
}
model
- Type: string
- Default: "meta-llama/llama-3.1-8b-instruct"
The AI model to use for compression. The model name format depends on your provider.
Example for OpenRouter:
{
"compression": {
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct"
}
}
Example for OpenAI:
{
"compression": {
"provider": "openai",
"model": "gpt-4o-mini"
}
}
apiKey
- Type: string
- Default: "" (empty)
API key for authenticating with your AI provider. Required when compression is enabled (except for local providers without authentication).
Example:
{
"compression": {
"apiKey": "sk-or-v1-..."
}
}
Never commit your API key to version control. Consider using environment variables or secure credential storage.
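One way to follow that advice is to overlay the key from an environment variable when the settings are loaded, so the file on disk never contains it. A minimal sketch of the pattern (the LONGMEM_API_KEY variable name and load_settings helper are illustrative, not part of LongMem):

```python
# Sketch: parse settings JSON and overlay the API key from an environment
# variable. LONGMEM_API_KEY is an assumed name, not a LongMem convention.
import json
import os


def load_settings(raw_json: str, env_var: str = "LONGMEM_API_KEY") -> dict:
    """Parse settings JSON and, if the env var is set, use it as the API key."""
    settings = json.loads(raw_json)
    key = os.environ.get(env_var)
    if key:
        settings.setdefault("compression", {})["apiKey"] = key
    return settings
```

With this pattern, settings.json can keep "apiKey": "" and the real key lives only in your shell environment or secrets manager.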
baseURL
- Type: string | undefined
- Default: Auto-configured based on provider
Custom base URL for the API endpoint. If not specified, it is set automatically based on the provider.
Example for custom endpoint:
{
"compression": {
"provider": "openai",
"baseURL": "https://api.custom-proxy.com/v1"
}
}
maxConcurrent
- Type: number
- Default: 1
Maximum number of concurrent compression requests. Increasing this can speed up compression but consumes your API quota faster.
Example:
{
"compression": {
"maxConcurrent": 3
}
}
Start with 1 and increase gradually if compression is too slow. Be mindful of your API provider’s rate limits.
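Conceptually, a maxConcurrent-style cap is a semaphore around the API call: no more than N requests are in flight at once, however many memories are queued. A sketch of the idea (compress_one is a stand-in for the real provider call, not LongMem's API):

```python
# Sketch: bound in-flight compression requests with an asyncio semaphore.
import asyncio

MAX_CONCURRENT = 3  # mirrors "maxConcurrent": 3


async def compress_one(memory: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for the provider round trip
    return f"summary of {memory}"


async def compress_all(memories: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(m: str) -> str:
        async with sem:  # at most MAX_CONCURRENT calls run concurrently
            return await compress_one(m)

    return await asyncio.gather(*(bounded(m) for m in memories))
```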
idleThresholdSeconds
- Type: number
- Default: 5
Number of seconds the system must be idle before compression starts. This prevents compression from running during active work.
Example:
{
"compression": {
"idleThresholdSeconds": 10
}
}
maxPerMinute
- Type: number
- Default: 10
Maximum number of compression requests per minute. Use this rate limit to stay within your API provider's limits.
Example:
{
"compression": {
"maxPerMinute": 20
}
}
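A per-minute cap like this is commonly implemented as a sliding window over recent request timestamps: a new request is allowed only while fewer than the limit fall inside the last 60 seconds. An illustrative sketch of that technique (not LongMem's actual limiter):

```python
# Sketch: sliding-window rate limiter enforcing a maxPerMinute-style cap.
from collections import deque


class PerMinuteLimiter:
    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit          # mirrors "maxPerMinute"
        self.window = window        # window length in seconds
        self.timestamps: deque[float] = deque()

    def allow(self, now: float) -> bool:
        """Record and allow a request unless the window is already full."""
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```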
timeoutSeconds
- Type: number
- Default: 30
Timeout (in seconds) for each compression API request. Requests exceeding this duration are cancelled.
Example:
{
"compression": {
"timeoutSeconds": 60
}
}
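Per-request timeouts of this kind can be sketched with asyncio.wait_for, which cancels the awaited call once the deadline passes. This illustrates the behavior, not LongMem's code:

```python
# Sketch: cancel any compression call that outlives timeoutSeconds.
import asyncio


async def call_with_timeout(coro, timeout_seconds: float):
    """Await `coro`, raising asyncio.TimeoutError if it exceeds the deadline."""
    return await asyncio.wait_for(coro, timeout=timeout_seconds)
```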
circuitBreakerThreshold
- Type: number
- Default: 5
Number of consecutive failures before the circuit breaker opens and compression attempts stop.
Example:
{
"compression": {
"circuitBreakerThreshold": 3
}
}
circuitBreakerCooldownMs
- Type: number
- Default: 60000 (1 minute)
Initial cooldown period (in milliseconds) after circuit breaker opens. The system waits this long before retrying.
Example:
{
"compression": {
"circuitBreakerCooldownMs": 120000
}
}
circuitBreakerMaxCooldownMs
- Type: number
- Default: 300000 (5 minutes)
Maximum cooldown period (in milliseconds). The cooldown period doubles with each consecutive failure but won’t exceed this value.
Example:
{
"compression": {
"circuitBreakerMaxCooldownMs": 600000
}
}
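The doubling-with-a-cap schedule described above can be written as a one-line formula. A sketch using the default values (the exact backoff formula LongMem uses is not documented here, so treat this as an approximation):

```python
# Sketch: cooldown doubles per consecutive failure, capped at the maximum.
def cooldown_ms(consecutive_failures: int,
                initial_ms: int = 60_000,       # circuitBreakerCooldownMs
                max_ms: int = 300_000) -> int:  # circuitBreakerMaxCooldownMs
    """Cooldown after the Nth consecutive failure once the breaker is open."""
    return min(initial_ms * 2 ** (consecutive_failures - 1), max_ms)
```

With the defaults this yields 1, 2, then 4 minutes, and stays pinned at 5 minutes from the fourth failure onward.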
maxRetries
- Type: number
- Default: 3
Number of retry attempts for a failed compression request before giving up.
Example:
{
"compression": {
"maxRetries": 5
}
}
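A maxRetries-style loop simply re-invokes the call until it succeeds or the retry budget is exhausted, then surfaces the last error. An illustrative sketch (whether LongMem counts the initial attempt toward maxRetries is not documented; this sketch does not):

```python
# Sketch: retry a failing call up to max_retries times, then re-raise.
def with_retries(fn, max_retries: int = 3):
    last_error = None
    for _ in range(max_retries + 1):  # initial attempt + retries
        try:
            return fn()
        except Exception as e:
            last_error = e
    raise last_error
```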
Provider examples
OpenRouter (default)
{
"compression": {
"enabled": true,
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct",
"apiKey": "sk-or-v1-..."
}
}
OpenRouter provides access to many models with competitive pricing. The base URL (https://openrouter.ai/api/v1) is set automatically.
OpenAI
{
"compression": {
"enabled": true,
"provider": "openai",
"model": "gpt-4o-mini",
"apiKey": "sk-proj-...",
"maxConcurrent": 2,
"maxPerMinute": 50
}
}
OpenAI provides reliable, high-quality models. Use gpt-4o-mini for cost-effective compression.
Anthropic
{
"compression": {
"enabled": true,
"provider": "anthropic",
"model": "claude-3-haiku-20240307",
"apiKey": "sk-ant-...",
"maxConcurrent": 2
}
}
Anthropic’s Claude models excel at understanding and summarizing context. Use claude-3-haiku for fast, cost-effective compression.
Local (Ollama)
{
"compression": {
"enabled": true,
"provider": "local",
"model": "llama3.1:8b",
"apiKey": "",
"baseURL": "http://localhost:11434/v1",
"maxConcurrent": 1,
"timeoutSeconds": 120
}
}
Run compression with local models using Ollama. No API key required, and all data stays on your machine.
Local models are great for privacy but may be slower. Adjust timeoutSeconds and maxConcurrent based on your hardware.
Advanced configuration
High-throughput setup
{
"compression": {
"enabled": true,
"provider": "openai",
"model": "gpt-4o-mini",
"apiKey": "sk-proj-...",
"maxConcurrent": 5,
"maxPerMinute": 100,
"timeoutSeconds": 15,
"circuitBreakerThreshold": 10,
"maxRetries": 5
}
}
Conservative/budget setup
{
"compression": {
"enabled": true,
"provider": "openrouter",
"model": "meta-llama/llama-3.1-8b-instruct",
"apiKey": "sk-or-v1-...",
"maxConcurrent": 1,
"idleThresholdSeconds": 30,
"maxPerMinute": 5,
"timeoutSeconds": 45
}
}
How compression works
- Idle detection: After idleThresholdSeconds of inactivity, compression begins
- Batch selection: Old memories are selected for compression
- AI summarization: The model summarizes multiple memories while preserving key information
- Rate limiting: Respects maxPerMinute and maxConcurrent limits
- Circuit breaker: Stops attempts if failures exceed circuitBreakerThreshold
- Retry logic: Failed requests are retried up to maxRetries times
Best practices
- Start with defaults: The default settings work well for most users
- Monitor costs: Track your API usage and adjust maxPerMinute accordingly
- Use budget models: Models like gpt-4o-mini or llama-3.1-8b are sufficient for compression
- Adjust idle threshold: Set higher values if compression interferes with your workflow
- Enable circuit breaker: Prevents runaway API costs if errors occur
- Test locally first: Try with provider: "local" before using paid APIs
Troubleshooting
Compression not running
- Verify enabled: true
- Check that apiKey is set correctly
- Ensure the system is idle for at least idleThresholdSeconds
- Check daemon logs: ~/.longmem/logs/
Circuit breaker opened
- Check API provider status
- Verify API key is valid and has quota
- Review circuitBreakerThreshold and cooldown settings
- Check daemon logs for error details
Slow compression
- Increase maxConcurrent (respecting rate limits)
- Use a faster model
- Reduce timeoutSeconds so stuck requests fail faster
- Consider switching providers