Maintenance Jobs
CEMS has four types of scheduled maintenance:
- Consolidation: Nightly at 3 AM - Merges semantic duplicates
- Reflection: Nightly at 3:30 AM - Condenses observations per project
- Summarization: Weekly, Sunday at 4 AM - Compresses old memories
- Re-indexing: Monthly, on the 1st at 5 AM - Rebuilds embeddings
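For readers who think in cron terms, the four schedules can be written as standard five-field cron expressions. This is only an illustrative sketch: the dictionary, the job keys, and the use of cron syntax are assumptions, not CEMS's documented configuration format, and the server's time zone is not stated here.

```python
# Hypothetical mapping of job name -> five-field cron expression
# (minute hour day-of-month month day-of-week). Names are illustrative only.
SCHEDULES = {
    "consolidation": "0 3 * * *",   # nightly at 03:00
    "reflection": "30 3 * * *",     # nightly at 03:30, after consolidation
    "summarization": "0 4 * * 0",   # Sundays at 04:00
    "reindex": "0 5 1 * *",         # 1st of each month at 05:00
}
```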
Consolidation Job
Schedule: Nightly at 3 AM
Purpose: Find and merge semantically duplicate memories using a three-tier approach.
How It Works
Consolidation uses a tiered deduplication system:
- Tier 1 (>= 0.98 similarity): Auto-merge near-identical memories without an LLM call
- Tier 2 (0.80 to < 0.98 similarity): LLM classifies the pair as duplicate/related/conflicting/distinct
- Tier 3 (< 0.80 similarity): Skip - too different
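The tier boundaries can be sketched as a small routing function. This is a minimal illustration, not CEMS's actual code: the names `Tier`, `classify_tier`, and the two constants are hypothetical, and a similarity of exactly 0.98 is assumed to fall in Tier 1.

```python
from enum import Enum


class Tier(Enum):
    AUTO_MERGE = 1    # Tier 1: merge without an LLM call
    LLM_CLASSIFY = 2  # Tier 2: ask the LLM to classify the pair
    SKIP = 3          # Tier 3: too different, do nothing


# Thresholds taken from the tier list; constant names are assumptions.
AUTO_MERGE_THRESHOLD = 0.98
LLM_THRESHOLD = 0.80


def classify_tier(similarity: float) -> Tier:
    """Route a pair of memories to a tier based on their similarity score."""
    if similarity >= AUTO_MERGE_THRESHOLD:
        return Tier.AUTO_MERGE
    if similarity >= LLM_THRESHOLD:
        return Tier.LLM_CLASSIFY
    return Tier.SKIP
```

Checking the higher threshold first keeps the ranges non-overlapping, so each pair lands in exactly one tier.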
What Gets Merged
The consolidation job:
- Processes the last 7 days of memories by default (5000 document limit)
- Pre-embeds all documents in batches instead of making one API round-trip per document
- Uses vector search to find similar chunks
- Merges duplicates using LLM content synthesis
- Detects and logs conflicts between contradictory memories
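The batch pre-embedding step might look roughly like the sketch below. It assumes an embedding client that accepts a list of texts per call; `pre_embed`, `batched`, `embed_batch`, and the batch size of 64 are hypothetical names and values, not part of CEMS.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pre_embed(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 64,  # assumed value, for illustration only
) -> List[List[float]]:
    """Embed all texts with one call per batch rather than one call per text."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

With N documents this makes about N / batch_size calls to the embedding service instead of N.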
Manual Trigger
Observation Reflection
Schedule: Nightly at 3:30 AM (after consolidation)
Purpose: Consolidate overlapping observations per project. Inspired by Mastra’s Reflector Agent, this job condenses redundant observations.
How It Works
- Fetch all observations for each project (category="observation")
- Group by source_ref (project identifier)
- Skip if < 10 observations - not worth consolidating yet
- Send to LLM for re-synthesis into condensed set
- Replace originals - store consolidated observations, soft-delete originals
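The grouping and threshold steps above can be sketched as follows. This is an illustration under assumptions: memories are modeled as plain dictionaries with `category` and `source_ref` keys, and `projects_to_reflect` and `MIN_OBSERVATIONS` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

MIN_OBSERVATIONS = 10  # projects with fewer observations are skipped


def projects_to_reflect(memories: List[dict]) -> Dict[str, List[dict]]:
    """Group observations by source_ref and keep only projects worth condensing."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for memory in memories:
        if memory.get("category") == "observation":
            groups[memory["source_ref"]].append(memory)
    # Only projects at or above the threshold would be sent to the LLM.
    return {
        ref: items
        for ref, items in groups.items()
        if len(items) >= MIN_OBSERVATIONS
    }
```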
Safety Guards
- Sanity check: Don’t replace if LLM produces more observations than original
- Atomic replacement: Only delete originals if ALL consolidated observations stored successfully
- Fallback: Keep originals on any error to prevent data loss
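One way to picture how these guards combine is the sketch below. It is not the actual implementation: `replace_observations`, `store`, and `soft_delete` are hypothetical callables, and the empty-output check is an extra assumption that the guards above do not mention.

```python
from typing import Callable, List


def replace_observations(
    originals: List[str],
    consolidated: List[str],
    store: Callable[[str], None],
    soft_delete: Callable[[str], None],
) -> bool:
    """Replace originals with consolidated observations; return True if replaced."""
    # Sanity check: the LLM must not produce more observations than it was given.
    # Rejecting empty output is an added assumption, not stated in the guards.
    if not consolidated or len(consolidated) > len(originals):
        return False
    try:
        for text in consolidated:
            store(text)
    except Exception:
        # Fallback: any storage error leaves the originals untouched.
        # Cleaning up partially stored items is left out of this sketch.
        return False
    # Only reached when every consolidated observation was stored.
    for text in originals:
        soft_delete(text)
    return True
```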
Summarization Job
Schedule: Weekly on Sunday at 4 AM
Purpose: Compress old memories and prune stale ones.
Two-Phase Process
Phase 1: Compress by Category
- Find memories 30+ days old
- Group by category
- Generate LLM summary for categories with 3+ old memories
- Store as new document with category-summary tag
Phase 2: Prune Stale Memories
- Soft-delete documents not updated in 90+ days (configurable via stale_days)
- Preserves data via soft-delete (can be restored)
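The selection logic for both steps can be sketched as two filters. This is a rough illustration under assumptions: memories are dictionaries with `category`, `created_at`, and `updated_at` fields, age is measured from `created_at`, and the function names are hypothetical. Only `stale_days` comes from the text above.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List


def categories_to_summarize(
    memories: List[dict], now: datetime, min_age_days: int = 30, min_count: int = 3
) -> Dict[str, List[dict]]:
    """Group memories at least min_age_days old by category; keep groups of min_count+."""
    cutoff = now - timedelta(days=min_age_days)
    by_category: Dict[str, List[dict]] = defaultdict(list)
    for memory in memories:
        if memory["created_at"] <= cutoff:
            by_category[memory["category"]].append(memory)
    return {c: ms for c, ms in by_category.items() if len(ms) >= min_count}


def stale_memories(memories: List[dict], now: datetime, stale_days: int = 90) -> List[dict]:
    """Return memories not updated within the last stale_days days."""
    cutoff = now - timedelta(days=stale_days)
    return [m for m in memories if m["updated_at"] <= cutoff]
```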
Re-indexing Job
Schedule: Monthly on the 1st at 5 AM
Purpose: Rebuild embeddings with the latest model and archive dead memories.
Two-Phase Process
Phase 1: Refresh Embeddings
- Fetch all documents (5000 limit)
- Re-embed each with the current embedding model
- Replace chunks in the database with fresh embeddings
- Log progress every 10 documents
Phase 2: Archive Dead Memories
- Soft-delete documents not updated in 180+ days (configurable via archive_days)
- Preserves data via soft-delete
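The refresh loop can be sketched as below. It assumes documents are dictionaries with `id` and `content` keys and that embedding and chunk replacement are supplied as callables; `reindex`, `embed`, and `replace_chunks` are hypothetical names, while the 5000 limit and the 10-document logging interval come from the text above.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("reindex_sketch")


def reindex(
    documents: List[dict],
    embed: Callable[[str], List[float]],
    replace_chunks: Callable[[str, List[float]], None],
    limit: int = 5000,
    log_every: int = 10,
) -> int:
    """Re-embed up to `limit` documents and return how many were processed."""
    processed = 0
    for doc in documents[:limit]:
        # Compute a fresh embedding with the current model, then swap it in.
        replace_chunks(doc["id"], embed(doc["content"]))
        processed += 1
        if processed % log_every == 0:
            logger.info("Re-indexed %d documents", processed)
    return processed
```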
Configuration
Schedule configuration via environment variables:
Deduplication Thresholds
Manual Triggers
Run maintenance jobs on-demand:
CLI
API
MCP Tool
Monitoring
Check scheduler status:
Best Practices
When should I run manual maintenance?
Run manual maintenance when:
- You’ve imported a large batch of memories
- You notice duplicate memories in search results
- After changing embedding models (run re-indexing)
- When testing deduplication thresholds
Will maintenance delete my data?
No. All maintenance jobs use soft-delete by default:
- Memories are marked as deleted, not removed from database
- You can restore soft-deleted memories if needed
- Hard delete requires the explicit --hard flag
How long does maintenance take?
Depends on memory count:
- Consolidation: ~1-2 min for 1000 memories
- Reflection: ~30 sec per project with 10+ observations
- Summarization: ~1-2 min for 500 old memories
- Re-indexing: ~5-10 min for 5000 memories (embedding API calls)
Can I disable automatic maintenance?
Yes, but it is not recommended. To disable:
You’ll need to run maintenance manually via CLI/API.
What happens if maintenance fails?
- Errors are logged but don’t crash the scheduler
- Partial failures are safe (atomic operations)
- Failed jobs will retry on next scheduled run
- Check logs in Docker: docker compose logs cems-server
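The "log but don't crash" behavior can be pictured with a small wrapper around each job. This is a generic sketch, not the CEMS scheduler's code: `run_job_safely` and the logger name are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("maintenance_sketch")


def run_job_safely(name: str, job: Callable[[], None]) -> bool:
    """Run one maintenance job; log any failure instead of propagating it."""
    try:
        job()
    except Exception:
        # logger.exception records the traceback; the scheduler keeps running
        # and the job gets another chance at its next scheduled time.
        logger.exception("Maintenance job %s failed", name)
        return False
    return True
```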
Next Steps
- Retrieval Tuning: Optimize search parameters and modes
- Troubleshooting: Debug common maintenance issues