Overview
The Memory class manages conversation history and ensures messages fit within the LLM’s context window.
Basic Usage
Create memory with default settings:
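The basic flow can be sketched with a minimal stand-in. The names below (`tokenLimit`, `add`, `get`) mirror terms used elsewhere in this document, but this is an illustrative sketch, not the library’s exact API:

```typescript
// Illustrative stand-in for a conversation memory.
// The real class trims to tokenLimit; this sketch only shows the shape.
type ChatMessage = { role: "user" | "assistant" | "system"; content: string };

class Memory {
  private messages: ChatMessage[] = [];
  constructor(public tokenLimit: number = 4096) {}

  // Append a message to the conversation history.
  add(message: ChatMessage): void {
    this.messages.push(message);
  }

  // Return the stored history.
  get(): ChatMessage[] {
    return [...this.messages];
  }
}

const memory = new Memory();
memory.add({ role: "user", content: "Hello!" });
memory.add({ role: "assistant", content: "Hi, how can I help?" });
```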
Memory Adapters
Memory supports different message formats through adapters:
LlamaIndex Format (Default)
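For plain text, a LlamaIndex-format message reduces to a role plus string content. This is a simplified sketch of the shape; the library’s actual ChatMessage type also supports richer, multi-part content:

```typescript
// Simplified LlamaIndex-style chat message: a role and text content.
type MessageRole = "user" | "assistant" | "system";
interface ChatMessage {
  role: MessageRole;
  content: string;
}

const message: ChatMessage = { role: "user", content: "Summarize our conversation." };
```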
Vercel AI SDK Format
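The Vercel AI SDK represents UI messages with a parts array rather than a single content string. The conversion can be sketched as below; field names are illustrative, so consult the SDK’s actual types before relying on them:

```typescript
// Sketch: converting between a LlamaIndex-style message and a
// Vercel AI SDK-style UI message (field names are illustrative).
type LlamaMessage = { role: string; content: string };
type VercelUIMessage = {
  id: string;
  role: string;
  parts: { type: "text"; text: string }[];
};

function toVercel(msg: LlamaMessage, id: string): VercelUIMessage {
  return { id, role: msg.role, parts: [{ type: "text", text: msg.content }] };
}

function fromVercel(msg: VercelUIMessage): LlamaMessage {
  return {
    role: msg.role,
    content: msg.parts.filter((p) => p.type === "text").map((p) => p.text).join(""),
  };
}
```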
Custom Adapters
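A custom adapter is essentially a two-way mapping between the memory’s internal message type and your application’s format. The interface below is a sketch; the library’s actual adapter contract may differ:

```typescript
// Sketch of a custom adapter: a pair of functions mapping between the
// memory's internal message type and an app-specific format.
type InternalMessage = { role: string; content: string };
type AppMessage = { author: string; text: string };

interface MessageAdapter<T> {
  fromMemory(msg: InternalMessage): T;
  toMemory(msg: T): InternalMessage;
}

const appAdapter: MessageAdapter<AppMessage> = {
  fromMemory: (m) => ({ author: m.role, text: m.content }),
  toMemory: (m) => ({ role: m.author, content: m.text }),
};
```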
Context Window Management
Memory automatically manages token limits:
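The core mechanism can be sketched as follows: estimate each message’s token cost and drop the oldest messages until the rest fit the limit. The chars/4 estimate is a rough heuristic, not the library’s tokenizer:

```typescript
// Sketch: keep the newest messages whose estimated tokens fit the limit.
type Msg = { role: string; content: string };

// Rough heuristic: ~4 characters per token.
const estimateTokens = (m: Msg): number => Math.ceil(m.content.length / 4);

function fitToLimit(messages: Msg[], tokenLimit: number): Msg[] {
  const kept: Msg[] = [];
  let total = 0;
  // Walk from newest to oldest, keeping what fits.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (total + cost > tokenLimit) break;
    total += cost;
    kept.unshift(messages[i]);
  }
  return kept;
}
```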
Dynamic Token Limits
Token limits adapt to the LLM’s context window:
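The idea can be sketched as deriving the memory budget from the model’s context window, leaving headroom for the response. The 0.7 default here echoes the ~70% guideline under Best Practices below; it is not a library constant:

```typescript
// Sketch: derive a memory token budget from the model's context window,
// reserving headroom for the model's response.
function dynamicTokenLimit(contextWindow: number, ratio: number = 0.7): number {
  return Math.floor(contextWindow * ratio);
}

// e.g. a 128k-context model gets a budget of roughly 90k tokens
const budgetFor128k = dynamicTokenLimit(128_000);
```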
Memory Blocks
Memory blocks provide specialized long-term memory storage:
Vector Memory Block
Stores conversations in a vector store for semantic retrieval:
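The retrieval flow can be sketched as: embed each stored message, then return the entries most similar to the query. The bag-of-letters embedding below is a toy; a real block uses an embedding model and a vector store:

```typescript
// Sketch of a vector memory block with a toy embedding.
type Msg = { role: string; content: string };

// Toy embedding: letter-frequency vector (a real block uses a model).
function embed(text: string): number[] {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i]++;
  }
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

class VectorMemoryBlock {
  private entries: { msg: Msg; vec: number[] }[] = [];

  put(msg: Msg): void {
    this.entries.push({ msg, vec: embed(msg.content) });
  }

  // Return the topK most similar stored messages.
  retrieve(query: string, topK: number = 2): Msg[] {
    const q = embed(query);
    return [...this.entries]
      .sort((a, b) => cosine(q, b.vec) - cosine(q, a.vec))
      .slice(0, topK)
      .map((e) => e.msg);
  }
}
```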
Fact Extraction Memory Block
Extracts and stores key facts from conversations:
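A real fact-extraction block typically calls an LLM with an extraction prompt; this stand-in uses regexes so the control flow is visible:

```typescript
// Sketch of fact extraction: scan messages for facts and keep them
// beyond the short-term window. A real block prompts an LLM instead.
class FactExtractionBlock {
  readonly facts: string[] = [];

  // Called for messages leaving the short-term window.
  process(content: string): void {
    const name = content.match(/my name is (\w+)/i);
    if (name) this.facts.push(`User's name is ${name[1]}`);
    const pref = content.match(/i prefer (\w+)/i);
    if (pref) this.facts.push(`User prefers ${pref[1]}`);
  }
}
```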
Static Memory Block
Provides fixed context (system prompts, instructions):
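A static block simply returns fixed text regardless of conversation state, so its sketch is short (the constructor shape is illustrative):

```typescript
// Sketch of a static memory block: fixed content (e.g. a system prompt)
// that is always injected, independent of conversation state.
class StaticMemoryBlock {
  constructor(private readonly content: string) {}
  get(): string {
    return this.content;
  }
}

const instructions = new StaticMemoryBlock("Always answer in English.");
```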
Custom Memory Blocks
Implement custom memory logic:
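A custom block implements a common interface so the memory can treat all blocks uniformly. The interface and method names below are illustrative, not the library’s contract:

```typescript
// Sketch: a common block interface plus a custom running-summary block.
interface MemoryBlock {
  id: string;
  priority: number;
  getContent(): string;
}

class RunningSummaryBlock implements MemoryBlock {
  id = "summary";
  priority = 1;
  private summary = "";

  // Append a sentence to the running summary.
  note(text: string): void {
    this.summary += (this.summary ? " " : "") + text;
  }

  getContent(): string {
    return this.summary;
  }
}
```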
Memory Priority System
Memory blocks are included based on priority:
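The selection logic can be sketched as: sort blocks by priority, always include priority 0 (fixed content), and include the rest while the token budget lasts. The chars/4 cost estimate is a heuristic:

```typescript
// Sketch: include blocks in priority order until the budget runs out.
type Block = { id: string; priority: number; content: string };

const cost = (b: Block): number => Math.ceil(b.content.length / 4);

function selectBlocks(blocks: Block[], budget: number): Block[] {
  const chosen: Block[] = [];
  let used = 0;
  for (const b of [...blocks].sort((x, y) => x.priority - y.priority)) {
    const c = cost(b);
    // Priority 0 blocks (fixed content) are always included.
    if (b.priority === 0 || used + c <= budget) {
      chosen.push(b);
      used += c;
    }
  }
  return chosen;
}
```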
Transient Messages
Include temporary messages without adding them to history:
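The behavior can be sketched as: transient messages are merged into the returned context but never written to the stored history (the parameter shape is illustrative):

```typescript
// Sketch: transient messages appear in the context, not in history.
type Msg = { role: string; content: string };

class MemoryWithTransient {
  private history: Msg[] = [];

  add(m: Msg): void {
    this.history.push(m);
  }

  // Merge transient messages into the returned context only.
  get(transient: Msg[] = []): Msg[] {
    return [...this.history, ...transient];
  }
}

const mem = new MemoryWithTransient();
mem.add({ role: "user", content: "Hi" });
const withHint = mem.get([{ role: "system", content: "Answer briefly." }]);
```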
Memory Snapshots
Save and restore memory state:
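The round trip can be sketched as serializing state to a string and rebuilding the memory from it later (method names are illustrative):

```typescript
// Sketch: snapshot memory state to JSON and restore it later.
type Msg = { role: string; content: string };

class SnapshottableMemory {
  messages: Msg[] = [];

  snapshot(): string {
    return JSON.stringify({ messages: this.messages });
  }

  static fromSnapshot(data: string): SnapshottableMemory {
    const mem = new SnapshottableMemory();
    mem.messages = JSON.parse(data).messages;
    return mem;
  }
}
```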
Using with Chat Engines
Integrate memory with chat engines:
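The integration can be sketched as an engine that reads context from memory before each reply and records both sides of the turn. The echo “LLM” below is a stub standing in for a real model call:

```typescript
// Sketch: a chat engine backed by a shared memory.
type Msg = { role: string; content: string };

class SimpleMemory {
  private messages: Msg[] = [];
  add(m: Msg): void {
    this.messages.push(m);
  }
  get(): Msg[] {
    return [...this.messages];
  }
}

class ChatEngine {
  constructor(private memory: SimpleMemory) {}

  chat(input: string): string {
    this.memory.add({ role: "user", content: input });
    const context = this.memory.get(); // would be sent to the LLM
    const reply = `(${context.length} messages in context) ok`; // stub reply
    this.memory.add({ role: "assistant", content: reply });
    return reply;
  }
}
```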
Clearing Memory
Reset conversation history:
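Clearing can be sketched as emptying the stored history; the method name here is illustrative, so check the library for its actual reset method:

```typescript
// Sketch: clearing resets the stored conversation history.
class ClearableMemory {
  private messages: { role: string; content: string }[] = [];

  add(m: { role: string; content: string }): void {
    this.messages.push(m);
  }
  size(): number {
    return this.messages.length;
  }
  clear(): void {
    this.messages = [];
  }
}
```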
Best Practices
Token Management:
- Set tokenLimit to ~70% of your LLM’s context window
- Adjust shortTermTokenLimitRatio based on your use case
- Monitor token usage to avoid context overflow
Memory Blocks:
- Use priority=0 for fixed content (system prompts)
- Use vector memory for long conversations
- Use fact extraction for persistent user information
- Limit the number of memory blocks (3-5 max)
Performance:
- Memory blocks are processed on every add() when the short-term limit is exceeded
- Use isLongTerm: true for blocks that should store historical messages
- Cache memory snapshots to avoid reprocessing
Multi-User Sessions:
- Use unique IDs for memory blocks per user/session
- Filter vector memories by session ID
- Clear memory between unrelated conversations
Next Steps
Chat Engines
Build conversational interfaces with memory
Evaluation
Measure the quality of your RAG responses