
Overview

Compression is an optional feature that uses an LLM to generate concise summaries of tool observations. These summaries improve search relevance and help your AI assistant recall past work more effectively.
LongMem works fully without compression — you just won’t get AI-powered summaries. Search will fall back to raw tool outputs.

Why compression?

Without compression:
  • Raw tool inputs/outputs are stored verbatim
  • Search uses full-text matching on potentially noisy data
  • Large observations (e.g. file diffs) are hard to summarize
With compression:
  • LLM extracts essential information (what changed, why)
  • Search ranks by semantic relevance
  • Concepts/tags enable topic-based retrieval
  • Context injection is more precise

Example compression

Input (raw tool output):
$ git diff src/server.ts
- app.get('/api/users', async (req, res) => {
+ app.get('/api/users', rateLimit({ max: 100 }), async (req, res) => {
    const users = await db.query('SELECT * FROM users');
    res.json(users);
  });
Output (compressed summary):
{
  "summary": "Added rate limiting middleware to /api/users endpoint",
  "type": "feature",
  "files": ["src/server.ts"],
  "concepts": ["rate-limiting", "api", "middleware"]
}
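The compressed record above can be modeled with a small type and validator. This is an illustrative sketch: the field names and limits come from the compression prompt later in this page, but `CompressedObservation` and `isValidObservation` are hypothetical names, not LongMem's actual API.

```typescript
// Hypothetical model of a compressed observation. Field names mirror the
// example above; the limits mirror the compression prompt (summary ≤ 100
// chars, ≤ 5 files, ≤ 5 concepts).
type ObservationType =
  | "decision" | "bugfix" | "feature" | "refactor"
  | "discovery" | "pattern" | "change" | "note";

interface CompressedObservation {
  summary: string;     // one sentence, max 100 chars
  type: ObservationType;
  files: string[];     // referenced paths, max 5
  concepts: string[];  // topic tags, max 5
}

const TYPES: ObservationType[] = [
  "decision", "bugfix", "feature", "refactor",
  "discovery", "pattern", "change", "note",
];

// Returns true when a parsed LLM response satisfies those constraints.
function isValidObservation(o: any): o is CompressedObservation {
  return (
    typeof o?.summary === "string" && o.summary.length <= 100 &&
    TYPES.includes(o?.type) &&
    Array.isArray(o?.files) && o.files.length <= 5 &&
    Array.isArray(o?.concepts) && o.concepts.length <= 5
  );
}
```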

Supported providers

LongMem uses the OpenAI-compatible API format, supporting:
// daemon/config.ts:49-54
const PROVIDERS: Record<string, string> = {
  openrouter: "https://openrouter.ai/api/v1",
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  local: "http://localhost:11434/v1",
};
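A plausible resolution order follows from this map plus the config examples later on this page: an explicit `baseURL` in settings wins, otherwise the provider name is looked up. The `resolveBaseURL` helper below is an illustration of that order, not LongMem's source.

```typescript
// Illustrative endpoint resolution. An explicit baseURL (as in the
// "local" config examples) overrides the provider map.
const PROVIDERS: Record<string, string> = {
  openrouter: "https://openrouter.ai/api/v1",
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  local: "http://localhost:11434/v1",
};

function resolveBaseURL(cfg: { provider: string; baseURL?: string }): string {
  if (cfg.baseURL) return cfg.baseURL;
  const url = PROVIDERS[cfg.provider];
  if (!url) throw new Error(`Unknown provider: ${cfg.provider}`);
  return url;
}
```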

Provider comparison

Provider         Cost             Speed  Privacy             Best for
OpenRouter       $0.10/1M tokens  Fast   Data sent to cloud  Cheap, reliable
OpenAI           $0.15/1M tokens  Fast   Data sent to cloud  High quality
Anthropic        $0.25/1M tokens  Fast   Data sent to cloud  Best summaries
Local (Ollama)   Free             Slow   100% local          Max privacy
If you enable compression with a cloud provider, tool outputs are sent to that LLM. Data is redacted twice (ingress + egress), but if you need absolute privacy, use local or disable compression.

Configuration

OpenRouter

{
  "compression": {
    "enabled": true,
    "provider": "openrouter",
    "model": "meta-llama/llama-3.1-8b-instruct",
    "apiKey": "sk-or-v1-..."
  }
}
Get an API key: openrouter.ai/keys
Recommended models:
  • meta-llama/llama-3.1-8b-instruct (cheap, fast)
  • openai/gpt-4o-mini (better summaries)
  • anthropic/claude-3-haiku (best quality)

OpenAI

{
  "compression": {
    "enabled": true,
    "provider": "openai",
    "model": "gpt-4o-mini",
    "apiKey": "sk-..."
  }
}

Anthropic

{
  "compression": {
    "enabled": true,
    "provider": "anthropic",
    "model": "claude-3-haiku-20240307",
    "apiKey": "sk-ant-..."
  }
}

Local (Ollama)

Zero cost, full privacy, slower.
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.1:8b
{
  "compression": {
    "enabled": true,
    "provider": "local",
    "model": "llama3.1:8b",
    "baseURL": "http://localhost:11434/v1"
  }
}
Local compression requires ~8 GB RAM and adds 2-5 seconds per observation. Use for small projects or when privacy is critical.

Advanced settings

Rate limiting

Prevent API quota exhaustion:
{
  "compression": {
    "maxPerMinute": 10,
    "maxConcurrent": 1
  }
}
maxPerMinute — Requests per minute (default: 10)
maxConcurrent — Parallel compression jobs (default: 1)
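One way maxPerMinute could be enforced is a sliding window over recent request timestamps. The `RateGate` class below is a minimal sketch of that idea, not LongMem's implementation.

```typescript
// Hypothetical sliding-window limiter for maxPerMinute: keep timestamps of
// requests started in the last 60 s and refuse new ones past the cap.
class RateGate {
  private timestamps: number[] = [];

  constructor(private maxPerMinute: number) {}

  // Returns true if a request may start now, recording it if so.
  tryAcquire(now: number = Date.now()): boolean {
    const windowStart = now - 60_000;
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.maxPerMinute) return false;
    this.timestamps.push(now);
    return true;
  }
}
```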

Circuit breaker

Automatic backoff when the API is down:
{
  "compression": {
    "circuitBreakerThreshold": 5,
    "circuitBreakerCooldownMs": 60000,
    "circuitBreakerMaxCooldownMs": 300000
  }
}
How it works:
// daemon/compression-worker.ts:172-186
private openCircuit(): void {
  this.circuitOpen = true;
  this.circuitOpenCount++;
  if (this.circuitTimer) clearTimeout(this.circuitTimer);
  
  const baseDelay = this.config.circuitBreakerCooldownMs;
  const maxDelay = this.config.circuitBreakerMaxCooldownMs || 300000;
  const delay = Math.min(baseDelay * Math.pow(2, this.circuitOpenCount - 1), maxDelay);
  
  this.circuitTimer = setTimeout(() => {
    this.circuitOpen = false;
    this.consecutiveFailures = 0;
    this.processQueue();
  }, delay);
}
After 5 consecutive failures:
  1. Circuit opens (compression pauses)
  2. Waits 60 seconds
  3. Retries (exponential backoff: 60s → 120s → 240s)
  4. Max cooldown: 5 minutes
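The cooldown schedule above follows directly from the delay formula in openCircuit: delay = min(base · 2^(openCount − 1), max). Extracted as a standalone function for clarity:

```typescript
// Circuit-breaker cooldown from the openCircuit snippet above: the delay
// doubles each time the circuit opens, capped at maxMs.
function circuitDelay(
  openCount: number,
  baseMs: number = 60_000,
  maxMs: number = 300_000,
): number {
  return Math.min(baseMs * Math.pow(2, openCount - 1), maxMs);
}
```

With the defaults this reproduces the 60s → 120s → 240s schedule, then holds at the 5-minute cap.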

Retry logic

{
  "compression": {
    "maxRetries": 3,
    "timeoutSeconds": 30
  }
}
maxRetries — Attempts before marking a job as failed (default: 3)
timeoutSeconds — HTTP timeout per request (default: 30)
Error handling:
// daemon/compression-worker.ts:139-156
private handleError(job: { id: number; attempts: number }, error: unknown): void {
  const errorMsg = error instanceof Error ? error.message : String(error);
  const status = (error as any)?.status;

  if (status === 401 || status === 403) {
    // Auth error — don't retry
    updateCompressionJob(job.id, "failed", `Auth error: ${errorMsg}`);
    this.consecutiveFailures += this.config.circuitBreakerThreshold; // Trip immediately
  } else if (job.attempts >= this.config.maxRetries) {
    updateCompressionJob(job.id, "failed", `Max retries: ${errorMsg}`);
  } else {
    updateCompressionJob(job.id, "pending", errorMsg);
  }

  this.consecutiveFailures++;
  if (this.consecutiveFailures >= this.config.circuitBreakerThreshold) {
    this.openCircuit();
  }
}
  • 401/403 errors → Circuit trips immediately (bad API key)
  • Timeout/network errors → Retry up to maxRetries
  • Max retries exceeded → Job marked as failed

Idle detection

Compression only runs when you stop typing:
{
  "compression": {
    "idleThresholdSeconds": 5
  }
}
Every API call resets the idle timer. After 5 seconds of inactivity, the compression worker processes the queue.
// daemon/server.ts:131-134
const idleDetector = new IdleDetector(
  config.compression.idleThresholdSeconds * 1000,
  () => worker.processQueue()
);
This prevents blocking your workflow during active coding sessions.
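The IdleDetector's behavior, as implied by that call site, can be reconstructed as a simple resettable timer. This is a sketch, not the daemon's source; `recordActivity` is an assumed method name.

```typescript
// Reconstruction of the idle detector implied by the snippet above: each
// recorded activity resets a timer, and onIdle fires only after thresholdMs
// of silence.
class IdleDetector {
  private timer?: ReturnType<typeof setTimeout>;

  constructor(
    private thresholdMs: number,
    private onIdle: () => void,
  ) {}

  // Call on every API request to push the idle deadline back.
  recordActivity(): void {
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(this.onIdle, this.thresholdMs);
  }
}
```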

Compression prompt

The LLM receives this system prompt:
// daemon/compression-sdk.ts:4-12
const COMPRESS_PROMPT = `You are a memory compression engine. Analyze tool usage and extract essential, reusable knowledge.

Given tool execution data, output a JSON object with:
- summary: One clear sentence about what happened (max 100 chars)
- type: One of: decision, bugfix, feature, refactor, discovery, pattern, change, note
- files: Array of file paths referenced (max 5)
- concepts: Array of key concepts/tags (max 5)

Be extremely concise. Focus on WHAT was learned, not HOW.`;
User message format:
Tool: Edit
Input: {"file_path": "src/server.ts", "oldString": "...", "newString": "..."}
Output: File updated successfully. Changes:
- Added rate limiting middleware
- Imported express-rate-limit
LLM response:
{
  "summary": "Added rate limiting to /api/users endpoint",
  "type": "feature",
  "files": ["src/server.ts"],
  "concepts": ["rate-limiting", "middleware", "api"]
}
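LLMs do not always respect the limits stated in a prompt, so a response like the one above is worth normalizing before storage. The `clampObservation` helper below is a defensive sketch (hypothetical name, not LongMem's code) that enforces the prompt's limits on whatever comes back.

```typescript
// Hypothetical post-processing: enforce the prompt's limits even when the
// model ignores them (summary ≤ 100 chars, ≤ 5 files, ≤ 5 concepts), and
// default a missing type to "note".
function clampObservation(raw: string): {
  summary: string;
  type: string;
  files: string[];
  concepts: string[];
} {
  const o = JSON.parse(raw);
  return {
    summary: String(o.summary ?? "").slice(0, 100),
    type: String(o.type ?? "note"),
    files: (Array.isArray(o.files) ? o.files : []).slice(0, 5).map(String),
    concepts: (Array.isArray(o.concepts) ? o.concepts : []).slice(0, 5).map(String),
  };
}
```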

Privacy guarantees

Double redaction

Data is sanitized twice:
  1. Ingress gate (storage) — daemon/routes.ts:98-105
  2. Egress gate (compression) — daemon/compression-worker.ts:73-82
Both passes apply:
  • Secret pattern matching
  • Custom pattern redaction
  • High-risk pattern detection
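To make the idea of a redaction pass concrete, here is a minimal sketch. The two patterns are examples only; LongMem's actual rule set (secret patterns, custom patterns, high-risk patterns) is broader.

```typescript
// Illustrative redaction pass: mask common API-key shapes before storage
// (ingress) or before sending to the LLM (egress). Example patterns only.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9_-]{8,}/g, // OpenAI/OpenRouter-style keys
  /AKIA[0-9A-Z]{16}/g,     // AWS access key IDs
];

function redact(text: string): string {
  return SECRET_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, "[REDACTED]"),
    text,
  );
}
```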

Kill switch

If a high-risk pattern survives redaction, the job is quarantined:
// daemon/compression-worker.ts:85-88
if (this.privacyMode !== "none" && containsHighRiskPattern(egressOutput)) {
  updateCompressionJob(job.id, "quarantined", "high_risk_pattern_detected_post_redaction");
  continue;
}
Quarantined jobs are logged but never sent to the LLM.

Path exclusion

Files matching excludePaths skip compression entirely:
// daemon/compression-worker.ts:95-98
if (egressOutput === "[EXCLUDED: path matched denylist]") {
  updateCompressionJob(job.id, "skipped", "path_excluded");
  continue;
}

Monitoring compression

Check compression status:
longmem status
Output:
longmem daemon: running
  PID:        12345
  Port:       38741
  Uptime:     3600s
  Pending:    0 compression jobs
  Circuit:    closed
  Idle:       15s
Pending — Number of observations waiting for compression
Circuit — OPEN (paused) if the circuit breaker tripped, closed otherwise
Idle — Seconds since last activity

Debug compression failures

longmem logs -n 100 | grep -i compression
Common errors:
  • Auth error: 401 → Invalid API key
  • Compression timeout → Increase timeoutSeconds
  • Max retries → Network issues or rate limits
  • quarantined → High-risk pattern detected (check privacy settings)

Cost estimation

Typical usage

  • Observations per day: 50-200
  • Tokens per compression: 200-500
  • Total tokens per day: 10K-100K
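Daily cost is just observations × tokens × price per token. A quick sketch, using the per-million rates from the provider comparison table as a blended input/output rate (a simplification):

```typescript
// Rough cost estimate: observations/day × tokens/observation × $/token.
function dailyCostUSD(
  observations: number,
  tokensPerObservation: number,
  pricePerMillionTokens: number,
): number {
  return (observations * tokensPerObservation * pricePerMillionTokens) / 1_000_000;
}
```

For example, 100 observations at 500 tokens each on OpenRouter's $0.10/1M rate comes to about half a cent per day, the same order of magnitude as the table below.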

Provider costs

Provider     Model            Cost/day (100 obs)  Cost/month
OpenRouter   llama-3.1-8b     $0.01               $0.30
OpenRouter   gpt-4o-mini      $0.02               $0.60
OpenAI       gpt-4o-mini      $0.03               $0.90
Anthropic    claude-3-haiku   $0.05               $1.50
Local        llama3.1:8b      Free                Free
For most users, compression costs <$1/month with OpenRouter.

Disabling compression

To disable completely:
{
  "compression": {
    "enabled": false
  }
}
Or let the circuit breaker pause it for you:
# The circuit breaker auto-opens on errors
longmem status  # Check if circuit is open
Compression will resume automatically after the cooldown period.

Example configurations

Minimal (free)

{
  "compression": {
    "enabled": false
  }
}

Budget-conscious

{
  "compression": {
    "enabled": true,
    "provider": "openrouter",
    "model": "meta-llama/llama-3.1-8b-instruct",
    "apiKey": "sk-or-v1-...",
    "maxPerMinute": 5,
    "idleThresholdSeconds": 10
  }
}

High-volume

{
  "compression": {
    "enabled": true,
    "provider": "openrouter",
    "model": "openai/gpt-4o-mini",
    "apiKey": "sk-or-v1-...",
    "maxConcurrent": 3,
    "maxPerMinute": 30,
    "idleThresholdSeconds": 3
  }
}

Privacy-first

{
  "compression": {
    "enabled": true,
    "provider": "local",
    "model": "llama3.1:8b",
    "baseURL": "http://localhost:11434/v1",
    "maxConcurrent": 1,
    "timeoutSeconds": 60
  }
}

Enterprise (Anthropic)

{
  "compression": {
    "enabled": true,
    "provider": "anthropic",
    "model": "claude-3-haiku-20240307",
    "apiKey": "sk-ant-...",
    "maxConcurrent": 2,
    "maxPerMinute": 20,
    "circuitBreakerThreshold": 3,
    "maxRetries": 5
  }
}

Troubleshooting

Compression not starting

  1. Check if enabled: cat ~/.longmem/settings.json | grep enabled
  2. Verify API key is set
  3. Check idle threshold: longmem status → Idle should be >5s
  4. Look for errors: longmem logs -n 50

Circuit breaker stuck open

# Check status
longmem status

# View recent errors
longmem logs -n 100 | grep -i circuit

# Common fixes:
# 1. Invalid API key → Update settings.json
# 2. Rate limit → Reduce maxPerMinute
# 3. Provider outage → Wait for cooldown (auto-retries)

Summaries are low quality

Try a better model:
  • OpenRouter: Switch to openai/gpt-4o-mini
  • Anthropic: Use claude-3-haiku
  • Local: Pull a larger model (e.g., llama3.1:70b)

High API costs

{
  "compression": {
    "maxPerMinute": 3,
    "idleThresholdSeconds": 15
  }
}
Or switch to a cheaper model (per the cost table above, llama-3.1-8b runs at roughly half the cost of gpt-4o-mini).

Next steps

Privacy modes

Configure secret redaction

Configuration

Full settings.json reference
