Overview
The memory system provides:

- Fact Extraction: automatically extracts key facts from conversations using LLM analysis
- Persistent Storage: stores facts in JSON format with confidence scores and timestamps
- Context Injection: intelligently injects relevant facts into agent system prompts
- Debounced Updates: batches updates to reduce LLM calls and improve performance
Configuration
Memory is configured in the `memory` section of `config.yaml`:
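A sketch of the `memory` section. The key names below are inferred from the option descriptions that follow, and the values are illustrative, so check them against the actual schema:

```yaml
# config.yaml — memory section (key names and values are assumptions)
memory:
  enabled: true                    # enable extraction and injection globally
  storage_path: ""                 # "" → {DEER_FLOW_HOME}/memory.json
  debounce_seconds: 60             # quiet period before processing queued updates
  model: null                      # null → default model (first in models list)
  max_facts: 200                   # oldest facts are pruned beyond this limit
  fact_confidence_threshold: 0.7   # facts scored below this are discarded
  inject_into_context: true        # inject facts into agent system prompts
  max_injection_tokens: 2000       # cap on injected memory size
```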
Configuration Options
Whether to enable the memory system globally. Set to `false` to disable memory extraction and injection.

storage_path

Path to store memory data.

Path Resolution:
- Empty string (`""`) → `{DEER_FLOW_HOME}/memory.json` (default)
- Relative path → `{DEER_FLOW_HOME}/{storage_path}`
- Absolute path → Used as-is

`DEER_FLOW_HOME` is:
- the `DEER_FLOW_HOME` environment variable, or
- `.deer-flow/` in the backend directory (dev mode), or
- `~/.deer-flow/` (default)
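The resolution rules above can be sketched in Python. This is an illustrative reimplementation, not deer-flow's actual code, and the function name is invented:

```python
import os
from pathlib import Path
from typing import Optional

def resolve_storage_path(storage_path: str, backend_dir: Optional[Path] = None) -> Path:
    """Resolve the memory storage path per the rules above (illustrative sketch)."""
    # DEER_FLOW_HOME: env var, else .deer-flow/ in the backend dir (dev mode), else ~/.deer-flow/
    env_home = os.environ.get("DEER_FLOW_HOME")
    if env_home:
        home = Path(env_home)
    elif backend_dir is not None:
        home = backend_dir / ".deer-flow"
    else:
        home = Path.home() / ".deer-flow"

    if not storage_path:  # "" → default file under DEER_FLOW_HOME
        return home / "memory.json"
    path = Path(storage_path)
    # Relative paths resolve under DEER_FLOW_HOME; absolute paths are used as-is
    return path if path.is_absolute() else home / path
```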
Migration Note: If you previously set `storage_path: .deer-flow/memory.json`, it will now resolve to `{DEER_FLOW_HOME}/.deer-flow/memory.json`. Use an absolute path to preserve the old location.

debounce_seconds

Seconds to wait before processing queued memory updates.

How it works:
- Memory updates are queued during conversation
- After `debounce_seconds` of inactivity, updates are batched and processed
- This reduces LLM calls and API costs
- Lower values (10-30s) → More frequent updates, higher costs
- Higher values (60-300s) → Less frequent updates, lower costs
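The queue-then-batch behavior can be sketched with a standard debounce pattern. This is a generic illustration of the technique, not deer-flow's implementation:

```python
import threading

class DebouncedQueue:
    """Queue updates and flush them as one batch after a quiet period (illustrative sketch)."""

    def __init__(self, debounce_seconds: float, flush):
        self.debounce_seconds = debounce_seconds
        self.flush = flush        # called once with the whole batch
        self._pending = []
        self._timer = None
        self._lock = threading.Lock()

    def add(self, update):
        with self._lock:
            self._pending.append(update)
            if self._timer:       # new activity: restart the quiet-period timer
                self._timer.cancel()
            self._timer = threading.Timer(self.debounce_seconds, self._flush)
            self._timer.start()

    def _flush(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self.flush(batch)     # one LLM call for the whole batch
```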
Model to use for memory extraction and updates.

- `null` → Uses the default model (first in the `models` list)
- Specify a model name → Uses that configured model

Use a lightweight model such as `gpt-4o-mini` for memory operations.

max_facts

Maximum number of facts to store in memory.

When the limit is reached:
- Oldest facts (by timestamp) are removed first
- Or lowest confidence facts if timestamps are equal
fact_confidence_threshold

Minimum confidence score (0.0-1.0) required to store a fact. Facts with confidence below this threshold are discarded.

Tuning:
- Higher values (0.8-1.0) → Only high-confidence facts stored
- Lower values (0.5-0.7) → More facts stored, potentially less accurate
Whether to inject memory facts into agent system prompts. Set to `false` to store facts without injecting them (passive mode).

max_injection_tokens

Maximum tokens to use for memory injection in system prompts. Facts are prioritized by confidence and recency, then truncated to fit this limit.

Tuning:
- Lower values (500-1000) → Only highest priority facts injected
- Higher values (2000-4000) → More comprehensive context
Storage Format
Memory is stored as JSON with the following structure:
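A sketch of `memory.json`. The per-fact fields are documented below; the top-level `facts` wrapper and the sample values are assumptions:

```json
{
  "facts": [
    {
      "content": "The user prefers Python for scripting.",
      "confidence": 0.95,
      "timestamp": "2024-05-01T12:34:56Z",
      "source": "conversation"
    }
  ]
}
```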
Fact Fields
- content: The extracted fact as a natural language statement
- confidence: Confidence score (0.0-1.0) assigned by the LLM
- timestamp: ISO 8601 timestamp when the fact was extracted
- source: Source of the fact (typically "conversation")
How Memory Works
Conversation Analysis
As the user interacts with the agent, conversation messages are analyzed for extractable facts.
Fact Extraction
The memory system uses an LLM to extract key facts:
- User preferences and habits
- Project information
- Technical context
- Personal details (when relevant)
Confidence Scoring
Each extracted fact is assigned a confidence score:
- 0.9-1.0: Explicit statements (“I prefer X”)
- 0.7-0.9: Strong inference (“I always use X”)
- 0.5-0.7: Weak inference (“I might use X”)
- Below 0.5: Discarded (below threshold)
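The scoring bands above feed directly into the threshold check; a minimal sketch, with an illustrative threshold and invented sample facts:

```python
def filter_facts(facts, threshold=0.7):
    """Keep only facts whose LLM-assigned confidence meets the threshold."""
    return [f for f in facts if f["confidence"] >= threshold]

facts = [
    {"content": "I prefer X", "confidence": 0.95},     # explicit statement
    {"content": "I might use X", "confidence": 0.45},  # weak inference: discarded
]
kept = filter_facts(facts)
```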
Debounced Storage
Facts are queued and stored after `debounce_seconds` of inactivity to batch updates.

Fact Pruning
If `max_facts` is exceeded:
- Sort facts by timestamp (oldest first)
- Remove oldest facts until within limit
- Optionally consider confidence scores
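The pruning steps above can be sketched as follows (an illustrative implementation, relying on ISO 8601 timestamps sorting lexicographically):

```python
def prune_facts(facts, max_facts):
    """Drop the oldest facts (lowest confidence first on timestamp ties) until within limit."""
    if len(facts) <= max_facts:
        return list(facts)
    # Sort so the facts to drop come first: oldest timestamps, then lowest confidence.
    # ISO 8601 timestamps compare correctly as plain strings.
    ordered = sorted(facts, key=lambda f: (f["timestamp"], f["confidence"]))
    return ordered[len(facts) - max_facts:]
```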
Memory Injection Format
When memory is injected into the agent’s system prompt, the stored facts are rendered into the prompt text. The exact injection format is determined by the agent’s prompt template.
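One plausible rendering, purely illustrative (the facts and wording here are invented; the template controls the real format):

```text
## Known facts about the user
- Prefers Python for scripting (confidence: 0.95)
- Works on the deer-flow project (confidence: 0.90)
```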
Configuration Examples
Minimal Memory (Cost-Optimized)
For minimal API usage:

Comprehensive Memory
For maximum context retention:

Memory Without Injection
Store facts but don’t inject them (for analysis only):

Custom Storage Location
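The variants above can be sketched as commented fragments. All key names are assumptions, and the values are chosen from the tuning ranges given earlier:

```yaml
# Minimal memory (cost-optimized)
memory:
  debounce_seconds: 300          # batch aggressively
  max_facts: 50
  fact_confidence_threshold: 0.9 # only high-confidence facts

# Comprehensive memory (maximum context retention)
# memory:
#   debounce_seconds: 30
#   max_facts: 500
#   fact_confidence_threshold: 0.5
#   max_injection_tokens: 4000

# Memory without injection (passive mode, for analysis only)
# memory:
#   inject_into_context: false   # assumed key name

# Custom storage location (absolute paths are used as-is)
# memory:
#   storage_path: /var/lib/deer-flow/memory.json
```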
Per-User Memory
For multi-tenant setups, use environment variables:

Programmatic Access
Access memory configuration in Python:

Update Configuration at Runtime
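A sketch of reading and updating the memory settings at runtime. This uses a plain dict stand-in; deer-flow's actual configuration API may differ, and the key names are assumptions:

```python
# Illustrative only: deer-flow's actual configuration API may differ.
config = {
    "memory": {
        "enabled": True,          # assumed key name
        "debounce_seconds": 60,   # assumed key name
        "max_facts": 200,         # assumed key name
    },
}

# Read the memory settings
memory_cfg = config["memory"]

# Update at runtime, e.g. tighten the debounce for an interactive session
memory_cfg["debounce_seconds"] = 30
```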
Best Practices
Use a Lightweight Model
Memory operations don’t need powerful models. Use a cost-effective model:
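For example, assuming the memory section accepts a `model` key (an assumption, as is the exact model identifier):

```yaml
memory:
  model: gpt-4o-mini   # cost-effective model for extraction and updates
```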
Tune Debounce for Your Use Case
- Interactive applications: 30-60 seconds
- Long-running tasks: 120-300 seconds
- Cost-sensitive: Higher values
Set Appropriate Fact Limits
- Personal assistant: 100-200 facts
- Project-specific agent: 200-500 facts
- Multi-user system: Separate memory files per user
Monitor Storage Size
Regularly check the memory file size. If it is too large, reduce `max_facts` or increase `fact_confidence_threshold`.

Backup Memory Data
Memory files contain valuable context. Back them up regularly:
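For example, assuming the default storage location described above:

```shell
MEMORY_FILE="${DEER_FLOW_HOME:-$HOME/.deer-flow}/memory.json"

if [ -f "$MEMORY_FILE" ]; then
  du -h "$MEMORY_FILE"                  # current size
  cp "$MEMORY_FILE" "$MEMORY_FILE.bak"  # simple same-directory backup
else
  echo "no memory file at $MEMORY_FILE"
fi
```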
Memory Lifecycle
Troubleshooting
Memory not persisting
Check storage path and permissions:
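For example, assuming the default resolution rules described earlier:

```shell
MEMORY_FILE="${DEER_FLOW_HOME:-$HOME/.deer-flow}/memory.json"

# Does the file exist, and is its directory writable by the current user?
ls -l "$MEMORY_FILE" 2>/dev/null || echo "missing: $MEMORY_FILE"
if [ -w "$(dirname "$MEMORY_FILE")" ]; then
  echo "directory is writable"
else
  echo "directory is missing or not writable"
fi
```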
Too many/few facts extracted
Adjust `fact_confidence_threshold`: raise it (toward 0.8-1.0) if too many low-quality facts are extracted, or lower it (toward 0.5-0.7) if too few facts are stored.

Memory updates too frequent/infrequent
Tune `debounce_seconds`: lower it for more frequent updates, raise it to batch updates less often.

Context injection too large
Reduce `max_injection_tokens` so that only the highest-priority facts are injected.

Next Steps
Environment Variables
Configure environment variables
Agent Customization
Customize agent behavior