Overview
The CEMS Observer Daemon (cems-observer) is a background process that watches IDE sessions and automatically extracts high-level observations about your workflow. It runs on your local machine and periodically sends session transcripts to the CEMS server for analysis.
Key Features:
- Multi-tool support: Claude Code, Cursor, Codex CLI, Goose
- Incremental learning: observations accumulate during sessions
- Signal-based lifecycle: hooks notify daemon of events (compact, stop)
- Staleness detection: auto-finalizes idle sessions
- Singleton enforcement: only one daemon runs at a time
Architecture
Components
Adapters: Each adapter knows how to discover and extract sessions for a specific tool:
| Adapter | Tool | Source | Format |
|---|---|---|---|
| ClaudeAdapter | Claude Code | ~/.claude/projects/*/transcript.jsonl | JSONL |
| CursorAdapter | Cursor | ~/.cursor/*/transcript.jsonl | JSONL |
| CodexAdapter | Codex CLI | ~/.codex/transcripts/*.jsonl | JSONL |
| GooseAdapter | Goose | ~/.config/goose/sessions/*.db | SQLite |
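One way to picture the adapter split is a small shared interface that the daemon polls on every cycle. The sketch below is an assumption, not the CEMS source. `SessionInfo`, `Adapter`, `discover_sessions`, `extract_text`, `FixedAdapter`, and `poll_all` are all hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Protocol, Tuple


@dataclass
class SessionInfo:
    """Hypothetical record for one discovered session."""
    session_id: str
    tool: str      # "claude", "cursor", "codex", or "goose"
    source: Path   # transcript file or SQLite database


class Adapter(Protocol):
    """Hypothetical interface shared by all adapters (not the real CEMS API)."""
    tool: str

    def discover_sessions(self, max_age_hours: float = 2) -> List[SessionInfo]:
        """Return sessions with activity within max_age_hours."""
        ...

    def extract_text(self, session: SessionInfo, watermark: int) -> Tuple[str, int]:
        """Return (new text, new watermark) read after the given watermark."""
        ...


class FixedAdapter:
    """Toy adapter that always reports the same sessions (useful for tests)."""

    def __init__(self, tool: str, sessions: List[SessionInfo]) -> None:
        self.tool = tool
        self._sessions = sessions

    def discover_sessions(self, max_age_hours: float = 2) -> List[SessionInfo]:
        return list(self._sessions)

    def extract_text(self, session: SessionInfo, watermark: int) -> Tuple[str, int]:
        return "", watermark


def poll_all(adapters: List[Adapter]) -> List[SessionInfo]:
    """One discovery pass: ask every adapter for its active sessions."""
    found: List[SessionInfo] = []
    for adapter in adapters:
        found.extend(adapter.discover_sessions(max_age_hours=2))
    return found
```

Because the daemon only depends on the interface, adding a new tool means adding one adapter class. The polling loop stays unchanged.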
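State: Each session's state is persisted to ~/.cems/observer/{session_id}.json.
Signals: Hooks notify the daemon by writing ~/.cems/observer/signals/{session_id}.json. Two signals are supported:
- compact - Finalize current epoch, bump epoch number, continue watching
- stop - Finalize current epoch, mark session done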
How It Works
1. Discovery
Every 30 seconds, the daemon:
- Polls each adapter for active sessions (max_age_hours=2)
- Filters to sessions with recent activity
- Loads or creates state for each session
How each adapter finds active sessions:
- Claude/Cursor/Codex: Find JSONL files modified within 2 hours
- Goose: Query SQLite for sessions with recent messages
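The JSONL discovery rule ("modified within 2 hours") amounts to a glob plus an mtime comparison. This is a minimal sketch. `find_recent_files` is a hypothetical helper, and the glob patterns in the comments are taken from the adapter table.

```python
import time
from pathlib import Path
from typing import List


def find_recent_files(root: Path, pattern: str, max_age_hours: float = 2) -> List[Path]:
    """Return files under root matching pattern modified within max_age_hours."""
    cutoff = time.time() - max_age_hours * 3600
    recent: List[Path] = []
    for path in root.glob(pattern):
        try:
            if path.is_file() and path.stat().st_mtime >= cutoff:
                recent.append(path)
        except FileNotFoundError:
            continue  # removed between glob() and stat(); ignore it
    return sorted(recent)


# Example calls (patterns from the adapter table):
# find_recent_files(Path.home() / ".claude" / "projects", "*/transcript.jsonl")
# find_recent_files(Path.home() / ".codex" / "transcripts", "*.jsonl")
```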
2. Processing
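For each discovered session, the daemon applies steps 3-6 below: growth detection, incremental observations, signal handling, and staleness detection.
3. Growth Detection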
Two-Phase Threshold:
- Phase 1: Cheap pre-filter
  - Check raw byte delta
  - Must exceed MIN_RAW_DELTA_BYTES = 10,000 bytes
  - Fast file stat, no content read
- Phase 2: Real extracted-text gate
  - Extract text content from new bytes
  - Must exceed MIN_EXTRACTED_CHARS = 3,000 chars (~750 tokens)
  - Ensures meaningful observations
Why two phases:
- Phase 1 filters out tiny changes (typing, cursor movements)
- Phase 2 ensures enough content for LLM analysis
- Avoids wasteful API calls on incremental edits
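The two phases compose naturally if extraction is passed in as a callable. That way the expensive read only happens after the cheap size check passes. The constant names and values come from this page. The function names are hypothetical, and "must exceed" is read as a strict greater-than.

```python
from typing import Callable, Tuple

MIN_RAW_DELTA_BYTES = 10_000  # Phase 1: raw file growth required
MIN_EXTRACTED_CHARS = 3_000   # Phase 2: extracted text required (~750 tokens)


def passes_phase1(current_size: int, last_observed_bytes: int) -> bool:
    """Cheap check using only the file size from a stat call."""
    return current_size - last_observed_bytes > MIN_RAW_DELTA_BYTES


def passes_phase2(extracted_text: str) -> bool:
    """Real gate: is there enough conversation text for the LLM?"""
    return len(extracted_text) > MIN_EXTRACTED_CHARS


def should_observe(current_size: int, last_observed_bytes: int,
                   extract: Callable[[], str]) -> Tuple[bool, str]:
    """Run the phases in order; extract() is only called if Phase 1 passes."""
    if not passes_phase1(current_size, last_observed_bytes):
        return False, ""
    text = extract()
    return passes_phase2(text), text
```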
4. Incremental Observations
When the growth threshold is met:
- Enrich session metadata (project, git branch, cwd)
- Extract new text content since last observation
- Build project context string
- Send to server via /api/session/summarize; the server uses Gemini 2.5 Flash to extract observations
- Update state: bytes, timestamp, observation_count
Example observations:
- “User deploys via Coolify”
- “Project uses PostgreSQL with pgvector”
- “Testing: RSpec for Ruby, pytest for Python”
- “Architecture: microservices with message queue”
- “Workflow: PR review required before merge”
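The send-and-update steps above could look roughly like the sketch below. Only the endpoint path /api/session/summarize and the "finalize" mode come from this page. The JSON field names, the "incremental" mode value, and Bearer-token auth are assumptions, and both `build_summarize_request` and `record_observation` are hypothetical helpers. The request is built but not sent, so the sketch runs offline.

```python
import json
import time
import urllib.request
from typing import Any, Dict


def build_summarize_request(api_url: str, api_key: str, state: Dict[str, Any],
                            new_text: str, project_context: str,
                            mode: str = "incremental") -> urllib.request.Request:
    """Build (without sending) a POST to /api/session/summarize.

    Assumed: payload field names, the "incremental" mode value, Bearer auth."""
    payload = {
        "session_id": state["session_id"],
        "source_ref": state.get("source_ref"),
        "epoch": state.get("epoch", 0),
        "mode": mode,
        "project_context": project_context,
        "content": new_text,
    }
    return urllib.request.Request(
        api_url.rstrip("/") + "/api/session/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def record_observation(state: Dict[str, Any], new_offset: int) -> None:
    """After a successful send: advance the watermark and counters."""
    state["last_observed_bytes"] = new_offset
    state["last_observed_at"] = time.time()
    state["observation_count"] = state.get("observation_count", 0) + 1


# Sending would be: urllib.request.urlopen(req, timeout=30)
```

Updating state only after a successful send means a failed request is retried from the same byte offset on the next cycle.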
5. Signal Handling
Compact Signal: Sent by hooks during session compaction (when the IDE condenses the conversation context):
- Finalize current epoch (if observations exist)
- Bump epoch number: state.epoch += 1
- Continue watching for new activity
Epoch lifecycle:
- Epoch 0: Initial session observations
- Compact → Finalize epoch 0 doc
- Epoch 1: Post-compact observations
- Compact → Finalize epoch 1 doc
- And so on for each later compact
Stop Signal: Sent by hooks when the session stops:
- Finalize current epoch (if observations exist)
- Mark session done: state.is_done = True
- Stop watching this session
Finalization steps:
- Extract any remaining new content
- Send to server with mode: "finalize"
- Server generates comprehensive final summary
- Update last_finalized_at timestamp
- Clear signal file
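The compact/stop handling above can be sketched as one function over the state dict. Several details are assumptions: the signal file is taken to contain `{"type": "compact"}` or `{"type": "stop"}`, and `finalize` is a hypothetical callback standing in for the mode "finalize" server call. The page does not say whether observation_count resets per epoch, so this sketch leaves it unchanged.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def handle_signal(state: Dict[str, Any], signals_dir: Path,
                  finalize: Callable[[Dict[str, Any]], None]) -> Optional[str]:
    """Apply a pending compact/stop signal for this session, if any.

    Assumed signal file shape: {"type": "compact"} or {"type": "stop"}."""
    signal_file = signals_dir / f"{state['session_id']}.json"
    if not signal_file.exists():
        return None
    kind = json.loads(signal_file.read_text()).get("type")
    if kind not in ("compact", "stop"):
        return None
    # Finalize the current epoch only if it produced observations.
    if state.get("observation_count", 0) > 0:
        finalize(state)
        state["last_finalized_at"] = time.time()
    if kind == "compact":
        state["epoch"] = state.get("epoch", 0) + 1  # keep watching, new epoch
    else:
        state["is_done"] = True                      # stop watching
    signal_file.unlink()                             # clear the signal
    return kind
```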
6. Staleness Detection
If no file growth is seen for STALE_THRESHOLD = 300 seconds (5 minutes):
- Check: time.time() - state.last_growth_seen_at > 300
- Trigger auto-finalization
- Mark session done
Why this matters:
- Handles sessions without hooks (e.g., crashed IDE)
- Ensures observations are eventually finalized
- Prevents zombie sessions from clogging the queue
Auto-finalization only runs when observation_count > 0 (avoids finalizing empty sessions).
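The staleness rule can be written as a single check. STALE_THRESHOLD and the observation_count > 0 condition come from this page. The function name and the `finalize` callback are hypothetical. This sketch reads the observation_count rule literally and leaves empty stale sessions untouched; the real daemon may instead mark them done without finalizing.

```python
import time
from typing import Any, Callable, Dict, Optional

STALE_THRESHOLD = 300  # seconds without growth (5 minutes)


def check_staleness(state: Dict[str, Any],
                    finalize: Callable[[Dict[str, Any]], None],
                    now: Optional[float] = None) -> bool:
    """Auto-finalize and retire an idle session; return True if it did."""
    if state.get("is_done"):
        return False
    now = time.time() if now is None else now
    last_growth = state.get("last_growth_seen_at")
    if last_growth is None or now - last_growth <= STALE_THRESHOLD:
        return False
    if state.get("observation_count", 0) == 0:
        return False  # nothing to finalize for an empty session
    finalize(state)
    state["is_done"] = True
    return True
```

Passing `now` explicitly keeps the check deterministic in tests. The daemon itself would call it without that argument.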
Installation & Setup
The observer is automatically installed with CEMS, or it can be installed with uv directly. Installation provides three commands:
- cems - Main CLI
- cems-server - Server component
- cems-observer - Observer daemon
Running the Daemon
Manual Start
Run cems-observer directly. Options:
- --once - Run one cycle and exit (for testing)
- --verbose, -v - Enable debug logging
Automatic Start
The observer is typically started automatically by hooks. For Claude Code, this is the cems_session_start.py hook.
Start Logic:
- Check if observer is already running (via PID file)
- If not, spawn daemon in background
- Write PID to ~/.cems/observer/daemon.pid
- Continue hook execution
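The hook-side start logic could look like the sketch below. It assumes POSIX `os.kill` semantics and a PID file holding a single integer. `is_observer_running` and `ensure_observer` are hypothetical names. A stale PID file whose number was reused by an unrelated process would fool this check, which is one reason the daemon also relies on the lock file described next.

```python
import os
import subprocess
from pathlib import Path
from typing import Sequence

PID_FILE = Path.home() / ".cems" / "observer" / "daemon.pid"


def is_observer_running(pid_file: Path = PID_FILE) -> bool:
    """True if pid_file names a live process (POSIX semantics)."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid <= 0:
        return False  # 0 or negative would address process groups
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # exists but is owned by another user
    return True


def ensure_observer(pid_file: Path = PID_FILE,
                    command: Sequence[str] = ("cems-observer",)) -> bool:
    """Spawn the daemon in the background if needed; return True if spawned."""
    if is_observer_running(pid_file):
        return False
    proc = subprocess.Popen(list(command), stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                            start_new_session=True)  # detach from the hook
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    pid_file.write_text(str(proc.pid))
    return True
```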
Singleton Enforcement
Only one observer daemon can run at a time:
- Daemon acquires exclusive file lock on ~/.cems/observer/daemon.lock
- If lock fails, another daemon is already running → exit
- Lock is held for the lifetime of the process
- On exit, lock is released automatically
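A non-blocking exclusive lock gives exactly the "acquire or exit" behavior listed above. This sketch assumes the daemon uses BSD-style `flock` via Python's Unix-only `fcntl` module. The page only says "exclusive file lock", and `acquire_singleton_lock` is a hypothetical name.

```python
import fcntl
from pathlib import Path
from typing import IO, Optional

LOCK_PATH = Path.home() / ".cems" / "observer" / "daemon.lock"


def acquire_singleton_lock(lock_path: Path = LOCK_PATH) -> Optional[IO[str]]:
    """Take an exclusive, non-blocking lock; return None if already held.

    Keep the returned file object open for the life of the process: the OS
    releases the lock when it is closed or the process exits."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None  # another daemon owns the lock, so the caller should exit
    return handle
```

Because the kernel drops the lock when the process dies, a crashed daemon never leaves a lock behind. This is what makes "released automatically" hold even without a clean shutdown.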
Configuration
Credentials
The observer reads credentials from:
- ~/.cems/credentials (preferred)
- Environment variables (fallback)
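The file-first, environment-fallback order can be sketched as follows. The credentials file format is not documented here, so this sketch assumes simple KEY=VALUE lines. CEMS_API_URL appears elsewhere on this page, while CEMS_API_KEY and `load_credentials` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Dict, Optional

ENV_KEYS = ("CEMS_API_URL", "CEMS_API_KEY")  # CEMS_API_KEY is a hypothetical name


def load_credentials(path: Optional[Path] = None) -> Dict[str, str]:
    """Read KEY=VALUE lines from the credentials file, then fill gaps from env."""
    path = path or Path.home() / ".cems" / "credentials"
    creds: Dict[str, str] = {}
    if path.is_file():
        for raw in path.read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            creds[key.strip()] = value.strip().strip("\"'")
    # Environment variables are only a fallback for keys the file lacks.
    for key in ENV_KEYS:
        if key not in creds and os.environ.get(key):
            creds[key] = os.environ[key]
    return creds
```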
Thresholds
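Configured in observer/daemon.py:
- POLL_INTERVAL = 30 seconds
- MIN_RAW_DELTA_BYTES = 10,000 bytes
- MIN_EXTRACTED_CHARS = 3,000 chars
- STALE_THRESHOLD = 300 seconds
Tuning trade-offs:
- Lower MIN_EXTRACTED_CHARS → more frequent observations (higher API costs)
- Higher MIN_EXTRACTED_CHARS → fewer observations (may miss details)
- Lower POLL_INTERVAL → more responsive (higher CPU usage)
- Higher POLL_INTERVAL → less responsive (lower CPU usage)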
Storage
State Directory: ~/.cems/observer/
Older versions stored state in a different location:
- Old: ~/.claude/observer/
- New: ~/.cems/observer/
Monitoring
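Check Daemon Status
Check whether a daemon is running via the PID file at ~/.cems/observer/daemon.pid.
Test One Cycle
Run once and exit: cems-observer --once
Verbose Mode
Enable debug logging: cems-observer --verbose
Stop Daemon
Stop the daemon with a graceful shutdown; its lock on ~/.cems/observer/daemon.lock is released automatically on exit.
State Management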
State File Format
~/.cems/observer/{session_id}.json:
Fields:
- session_id - UUID from IDE
- tool - claude, cursor, codex, or goose
- project_id - org/repo format
- source_ref - “project:org/repo” for memory tagging
- last_observed_bytes - File offset for next extraction
- last_observed_at - Timestamp of last observation
- observation_count - Number of observations sent
- session_started - Session start timestamp
- epoch - Epoch number (bumped on compact)
- last_finalized_at - Timestamp of last finalization
- last_growth_seen_at - Staleness tracking
- is_done - Session complete flag
- last_observed_message_id - SQLite adapter watermark
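The field list maps directly onto a dataclass with a JSON round-trip. In this sketch, timestamps are assumed to be Unix floats, and the write-then-rename save is a suggested pattern rather than confirmed CEMS behavior. `SessionState`, `save`, and `load` are hypothetical names. The load path also shows why a corrupted file leads to re-sending from byte 0: it falls back to a fresh state.

```python
import json
from dataclasses import asdict, dataclass, fields
from pathlib import Path
from typing import Optional


@dataclass
class SessionState:
    """Mirror of the documented state fields (timestamps assumed Unix floats)."""
    session_id: str
    tool: str
    project_id: Optional[str] = None
    source_ref: Optional[str] = None
    last_observed_bytes: int = 0
    last_observed_at: Optional[float] = None
    observation_count: int = 0
    session_started: Optional[float] = None
    epoch: int = 0
    last_finalized_at: Optional[float] = None
    last_growth_seen_at: Optional[float] = None
    is_done: bool = False
    last_observed_message_id: Optional[int] = None

    def save(self, state_dir: Path) -> None:
        """Write via a temp file and rename so readers never see partial JSON."""
        path = state_dir / f"{self.session_id}.json"
        tmp = path.parent / (path.name + ".tmp")
        tmp.write_text(json.dumps(asdict(self), indent=2))
        tmp.replace(path)

    @classmethod
    def load(cls, state_dir: Path, session_id: str, tool: str) -> "SessionState":
        """Load state; a missing or corrupted file yields a fresh state at byte 0."""
        path = state_dir / f"{session_id}.json"
        try:
            data = json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            return cls(session_id=session_id, tool=tool)
        if not isinstance(data, dict):
            return cls(session_id=session_id, tool=tool)
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in data.items() if k in known}
        kwargs.setdefault("session_id", session_id)
        kwargs.setdefault("tool", tool)
        return cls(**kwargs)
```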
State Cleanup
Old state files are automatically cleaned up:
- Schedule: Every 100 poll cycles (~50 minutes at the 30-second poll interval)
- Criteria: Files older than 7 days
Epoch Model
The observer uses an epoch-based document model:
- Epoch 0: Initial session observations (Document ID: session:abc123)
- Epoch 1: Post-compact observations (Document ID: session:abc123:e1)
- Epoch 2: Next epoch observations (Document ID: session:abc123:e2)
Benefits:
- Allows finalization mid-session (during compacts)
- Keeps observation documents focused and coherent
- Prevents unbounded document growth
- Each epoch gets its own summary and memory tags
Example document IDs for one session: session:abc123def456, session:abc123def456:e1, session:abc123def456:e2
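The document ID scheme is a pure function of the session ID and epoch number. This is a minimal sketch that reproduces the IDs listed above. `epoch_document_id` is a hypothetical name.

```python
def epoch_document_id(session_id: str, epoch: int) -> str:
    """Epoch 0 uses the bare session ID; later epochs append ':e<epoch>'."""
    if epoch < 0:
        raise ValueError("epoch must be >= 0")
    base = f"session:{session_id}"
    return base if epoch == 0 else f"{base}:e{epoch}"


for n in range(3):
    print(epoch_document_id("abc123def456", n))
# session:abc123def456
# session:abc123def456:e1
# session:abc123def456:e2
```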
Adapter Details
Claude Code Adapter
Source: ~/.claude/projects/*/transcript.jsonl
Discovery:
- Scan ~/.claude/projects/
- Find directories with transcript.jsonl
- Filter by modification time (< 2 hours)
Extraction:
- Read JSONL from last observed byte offset
- Parse each line as JSON
- Extract user messages and assistant responses
- Format as conversation
Metadata:
- project_id from git remote in project directory
- git_branch from git status
- cwd from project directory path
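The extraction steps can be sketched as reading from the stored byte offset and keeping only complete lines. A half-written last line is then picked up on the next cycle instead of being lost. The transcript line shape is an assumption for illustration: `{"type": "user"|"assistant", "message": {"content": ...}}`, where content is a string or a list of `{"type": "text", "text": ...}` blocks. `extract_since` and `_flatten` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Tuple


def _flatten(content: Any) -> str:
    """Content may be a plain string or a list of typed blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(block.get("text", "") for block in content
                         if isinstance(block, dict) and block.get("type") == "text")
    return ""


def extract_since(path: Path, offset: int) -> Tuple[str, int]:
    """Return (conversation text, new offset) for complete lines after offset."""
    with path.open("rb") as fh:
        fh.seek(offset)
        data = fh.read()
    end = data.rfind(b"\n") + 1  # leave a trailing partial line for next cycle
    turns = []
    for raw in data[:end].splitlines():
        try:
            entry = json.loads(raw)
        except ValueError:
            continue  # skip blank, malformed, or non-UTF-8 lines
        if not isinstance(entry, dict) or entry.get("type") not in ("user", "assistant"):
            continue
        message = entry.get("message")
        text = _flatten(message.get("content") if isinstance(message, dict) else None)
        if text.strip():
            turns.append(f"{entry['type'].upper()}: {text.strip()}")
    return "\n\n".join(turns), offset + end
```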
Cursor Adapter
Source: ~/.cursor/*/transcript.jsonl
Discovery:
- Scan ~/.cursor/
- Find directories with transcript.jsonl
- Filter by modification time
Extraction:
- Same as Claude Code adapter
Codex Adapter
Source: ~/.codex/transcripts/*.jsonl
Discovery:
- Scan ~/.codex/transcripts/
- Find *.jsonl files
- Filter by modification time
Extraction:
- Same as Claude Code adapter
Goose Adapter
Source: ~/.config/goose/sessions/*.db
Discovery:
- Scan ~/.config/goose/sessions/
- Find *.db SQLite files
- Query for sessions with recent messages
Extraction:
- Query: SELECT * FROM messages WHERE id > ? ORDER BY id
- Use last_observed_message_id as watermark
- Extract message content
- Format as conversation
The adapter stores last_observed_message_id in the session state to avoid re-reading messages.
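The watermark query above translates to a few lines of `sqlite3`. The page's query uses `SELECT *`. This sketch names `id`, `role`, and `content` columns, which are assumptions about the Goose schema. It also opens the database read-only through an SQLite URI so the daemon cannot modify Goose's data. `open_readonly` and `fetch_new_messages` are hypothetical names.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Tuple


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open the session database read-only so the daemon never writes to it."""
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)


def fetch_new_messages(conn: sqlite3.Connection,
                       last_id: Optional[int]) -> Tuple[str, Optional[int]]:
    """Return (conversation text, new watermark) for rows with id > last_id."""
    rows = conn.execute(
        "SELECT id, role, content FROM messages WHERE id > ? ORDER BY id",
        (last_id if last_id is not None else 0,),
    ).fetchall()
    text = "\n\n".join(f"{role.upper()}: {content}" for _, role, content in rows)
    return text, (rows[-1][0] if rows else last_id)
```

Unlike byte offsets, an integer message ID stays valid even if SQLite rewrites or compacts the database file.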
Error Handling
Consecutive Failures
If the daemon encounters 10 consecutive failures:
- Log warning: “N consecutive failures, backing off”
- Switch to backoff interval: 300s (5 minutes)
- Continue polling at reduced rate
- Reset counter on successful cycle
Common causes:
- Server unavailable
- Network issues
- API rate limiting
- Authentication failures
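The backoff rule reduces to a counter and a choice between two sleep intervals. The numbers come from this page. The class name, the logger name, and the choice to log only when the threshold is first reached are assumptions.

```python
import logging

POLL_INTERVAL = 30            # seconds between normal cycles
BACKOFF_INTERVAL = 300        # seconds between cycles while backing off
MAX_CONSECUTIVE_FAILURES = 10

log = logging.getLogger("cems-observer")  # hypothetical logger name


class FailureTracker:
    """Counts consecutive failed cycles and picks the next sleep interval."""

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def record(self, success: bool) -> None:
        if success:
            self.consecutive_failures = 0  # any good cycle resets the counter
            return
        self.consecutive_failures += 1
        if self.consecutive_failures == MAX_CONSECUTIVE_FAILURES:
            log.warning("%d consecutive failures, backing off",
                        self.consecutive_failures)

    def next_sleep(self) -> int:
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            return BACKOFF_INTERVAL
        return POLL_INTERVAL
```

A main loop would call `tracker.record(ok)` after each cycle and then `time.sleep(tracker.next_sleep())`.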
Auth Failures
HTTP 401/403:
- Verify credentials in ~/.cems/credentials
- Test with cems health
- Restart daemon
Server Unavailable
Symptoms:
- Connection timeout errors
- “Connection refused” messages
Fixes:
- Check server status
- Verify CEMS_API_URL is correct
- Check network connectivity
- Daemon will retry automatically
Corrupted State
Symptoms:
- JSON decode errors in logs
- Observations re-sent from byte 0
Performance
Resource Usage
CPU: <1% on average (30s polling interval)
Memory: ~50MB resident set size
Disk I/O:
- State reads: ~10-20 per cycle
- State writes: ~2-5 per cycle
- Transcript reads: varies by session activity
Network:
- Observation API calls: 1-5 per cycle (when sessions are active)
- Payload size: 3KB - 50KB per observation
Optimization
Two-Phase Threshold: The Phase 1 pre-filter stops 90%+ of cycles before the expensive read-and-extract step:
- File stat: ~1ms
- File read + extraction: ~10-100ms
Troubleshooting
Observer Not Running
Check whether a daemon is running via the PID file at ~/.cems/observer/daemon.pid.
No Observations Being Created
Possible causes:
- File growth below threshold (< 10KB raw, < 3K chars extracted)
- Session already marked done
- Signal file stuck (old compact/stop signal)
Observations Re-sent from Start
Cause: Corrupted or missing state file.
Duplicate Daemons
Symptom: Multiple observer processes running
Fix: The daemon auto-kills stale processes on startup, so restarting it resolves this.
Related Pages
- Skills & Commands - Skills and slash commands reference
- CLI Usage - Command-line interface reference
- Memory Management - Memory lifecycle and best practices