# Architecture overview

LongMem is a local-first persistent memory system for AI coding assistants. It runs as a lightweight daemon that captures your coding activity, stores it in SQLite, and makes it searchable for future sessions.

## Core components
- HTTP Server (`daemon/server.ts`) — REST API for ingestion and retrieval
- Privacy Layer — Redacts secrets before storage or compression
- Compression Worker — Optional AI-powered summarization
- Idle Detector — Triggers compression when you stop typing

## Data flow

### 1. Capture phase

When you interact with your AI assistant, LongMem hooks capture:

- User prompts — What you asked the AI to do
- Tool calls — Commands executed (file edits, bash, searches)
- Tool outputs — Results from those operations
- File references — Paths mentioned in tool inputs

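
The four captured fields above can be sketched as a single observation record. The interface below is illustrative only — the field names and wire format are assumptions, not LongMem's actual schema:

```typescript
// Hypothetical shape of one captured observation (names are illustrative).
interface Observation {
  sessionId: string;  // groups observations into a coding session
  tool: string;       // e.g. "bash", "edit", "search"
  input: string;      // tool call arguments (redacted before storage)
  output: string;     // tool result (redacted before storage)
  files: string[];    // file paths mentioned in the tool input
  timestamp: number;  // Unix epoch milliseconds
}

const example: Observation = {
  sessionId: "sess-2024-01-15",
  tool: "bash",
  input: "npm test",
  output: "42 passing",
  files: [],
  timestamp: Date.now(),
};
```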
### 2. Privacy gate

Before storage, all data passes through privacy filters (see Privacy modes). If a path matches the `excludePaths` list, only metadata is stored — the content is replaced with `[EXCLUDED: path matched denylist]`.
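
A minimal sketch of that exclusion check, using the denylist entries mentioned later in this page (glob handling is simplified here; the real privacy layer is more thorough):

```typescript
// Assumed denylist; mirrors the examples under "Security model".
const excludePaths = [".env", "*.key", "credentials.json"];

// Returns true when the file's basename matches a denylist entry.
function isExcluded(filePath: string): boolean {
  const base = filePath.split("/").pop() ?? filePath;
  return excludePaths.some((pattern) =>
    pattern.startsWith("*")
      ? base.endsWith(pattern.slice(1)) // suffix match for "*.key"
      : base === pattern                // exact basename match
  );
}

// The gate: excluded content is replaced, everything else passes through.
function gate(filePath: string, content: string): string {
  return isExcluded(filePath)
    ? "[EXCLUDED: path matched denylist]"
    : content;
}
```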
### 3. Storage

Observations are written to SQLite (`~/.longmem/memory.db`):

- `sessions` table — One row per coding session
- `observations` table — Tool executions with redacted input/output
- `concepts` table — Extracted tags for semantic search
- `compression_jobs` table — Queue for AI summaries

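
A rough sketch of what that schema might look like. The table names come from the list above; the column names are guesses for illustration — inspect `~/.longmem/memory.db` with the `sqlite3` CLI for the authoritative DDL:

```typescript
// Hypothetical DDL for the four tables (column names assumed).
const schema = `
  CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    started_at INTEGER NOT NULL
  );
  CREATE TABLE IF NOT EXISTS observations (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id         TEXT REFERENCES sessions(id),
    tool               TEXT NOT NULL,
    input              TEXT, -- redacted before insert
    output             TEXT, -- redacted before insert
    compressed_summary TEXT  -- filled in by the compression worker
  );
  CREATE TABLE IF NOT EXISTS concepts (
    observation_id INTEGER REFERENCES observations(id),
    tag            TEXT NOT NULL
  );
  CREATE TABLE IF NOT EXISTS compression_jobs (
    observation_id INTEGER REFERENCES observations(id),
    status         TEXT DEFAULT 'pending'
  );
`;
```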
### 4. Compression (optional)

If enabled, the compression worker:

- Waits for idle time (default: 5 seconds)
- Fetches pending observations from the queue
- Re-redacts data before sending to the LLM (egress gate)
- Generates a concise summary using AI
- Stores the summary back in the `observations` table

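
The worker steps above can be sketched as a loop. The `fetchJobs`, `redact`, `summarize`, and `store` helpers are placeholders for LongMem's real implementations, injected here so the flow is visible:

```typescript
type Job = { id: number; text: string };

// One pass of the compression worker: fetch, re-redact, summarize, store.
async function compressPending(
  fetchJobs: () => Promise<Job[]>,
  redact: (text: string) => string,              // egress gate: runs again here
  summarize: (text: string) => Promise<string>,  // the LLM call
  store: (id: number, summary: string) => Promise<void>,
): Promise<number> {
  const jobs = await fetchJobs();
  for (const job of jobs) {
    const safe = redact(job.text);   // re-redact before anything leaves the machine
    const summary = await summarize(safe);
    await store(job.id, summary);    // lands in compressed_summary
  }
  return jobs.length;
}
```

Note that redaction runs a second time here even though stored data was already redacted — that is the egress gate described above.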
## Idle detection

The daemon uses an idle detector to trigger compression only when you stop working:

- Every API call resets the idle timer
- After `idleThresholdSeconds` (default: 5s), compression starts
- This prevents blocking your workflow during active coding

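
The timer logic reduces to a small amount of state. This is a deterministic sketch (timestamps passed in explicitly rather than read from a real clock) of the behavior described above, not LongMem's actual class:

```typescript
// Sketch: compression becomes eligible once no API call has arrived
// for idleThresholdSeconds.
class IdleDetector {
  private lastActivity: number;

  constructor(
    private idleThresholdSeconds = 5, // matches the documented default
    now: number = Date.now(),
  ) {
    this.lastActivity = now;
  }

  // Called on every API request; resets the idle timer.
  touch(now: number = Date.now()): void {
    this.lastActivity = now;
  }

  // True once the threshold has elapsed with no activity.
  isIdle(now: number = Date.now()): boolean {
    return now - this.lastActivity >= this.idleThresholdSeconds * 1000;
  }
}
```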
## Auto-context injection

When you start a new session or change topics, LongMem can automatically inject relevant context from past work.

## API endpoints

The daemon exposes a REST API on `localhost:38741` (configurable):

| Endpoint | Method | Purpose |
|---|---|---|
| `/observe` | POST | Ingest a tool observation |
| `/prompt` | POST | Record a user prompt (with optional context) |
| `/search` | GET | Search observations by query |
| `/context` | GET | Get formatted context block |
| `/status` | GET | Daemon health + compression stats |
| `/export` | GET | Export memory as JSON or Markdown |
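
A small client sketch against this API. The endpoints and port come from the table above; the `q` query-parameter name is an assumption for illustration:

```typescript
const BASE = "http://localhost:38741";

// Build a daemon URL with query parameters.
function endpoint(path: string, params: Record<string, string> = {}): string {
  const url = new URL(path, BASE);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Example: search past observations (assumes a "q" parameter).
async function search(query: string): Promise<unknown> {
  const res = await fetch(endpoint("/search", { q: query }));
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return res.json();
}
```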
## Security model

- Local-only by default — No data leaves your machine unless compression is enabled
- Optional auth token — Set `daemon.authToken` to require Bearer authentication
- Privacy-first — Secrets are redacted before storage AND before compression
- Path-based exclusion — Never store content from `.env`, `*.key`, `credentials.json`, etc.

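
The optional Bearer check amounts to a simple comparison. A sketch, assuming `daemon.authToken` reaches the server as a plain string (header parsing simplified):

```typescript
// Returns true when the request may proceed.
// With no token configured, auth is disabled (the local-only default).
function isAuthorized(
  authHeader: string | undefined,
  configuredToken: string | undefined,
): boolean {
  if (!configuredToken) return true; // auth disabled by default
  return authHeader === `Bearer ${configuredToken}`;
}
```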
LongMem never sends data to the cloud unless you explicitly enable compression with a remote provider. Even then, data is re-redacted before egress.
## Performance characteristics

- Startup time: <100ms
- Memory footprint: ~20-40 MB (idle)
- Disk usage: ~1-5 MB per day of active coding
- Query latency: <50ms for FTS search, <10ms for recency
## What gets compressed?

Compression is metadata-only: the raw tool inputs/outputs remain in the database for export and debugging. Summaries are stored in `compressed_summary` and used for search ranking.
Example compression output:
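
The record below is invented for illustration — actual summaries depend on your session and LLM provider, and the field names are assumptions:

```typescript
// Purely illustrative compressed observation (all values hypothetical).
const compressed = {
  observation_id: 1042,
  tool: "edit",
  compressed_summary:
    "Refactored auth middleware and fixed a token-expiry bug in the session handler.",
  concepts: ["auth", "middleware", "bugfix"],
};
```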
## Next steps

- Privacy modes — Configure secret redaction and data privacy
- Compression — Enable AI-powered memory summaries