
Overview

Memory management enables LLM applications to maintain context across conversations, remember user preferences, and provide personalized experiences. This guide covers implementation patterns using Mem0 with a Qdrant vector store.

Core Concepts

Persistent Memory

Store and retrieve conversation history across sessions using vector databases

User-Specific Context

Maintain separate memory spaces for each user with personalized preferences

Memory Retrieval

Semantic search through past interactions to provide relevant context

Multi-LLM Support

Share memory across different language models (GPT-4, Claude, Llama)
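
These concepts map directly onto the Mem0 API: memories are namespaced by user_id, and retrieval is a semantic search within one user's space. A minimal sketch (assuming a configured Memory instance; the configuration follows in the next section):

from mem0 import Memory

memory = Memory.from_config(config)  # config shown in the next section

# Separate memory spaces: each user_id is an isolated namespace
memory.add("Prefers concise answers", user_id="alice")
memory.add("Prefers detailed answers", user_id="bob")

# Retrieval only ever sees the given user's memories
hits = memory.search("how should answers be styled?", user_id="alice", limit=3)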

Memory Architecture

Configuration with Mem0 and Qdrant

import os
from mem0 import Memory
from openai import OpenAI

# Initialize Mem0 with Qdrant
config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini"},  # LLM Mem0 uses to extract memories
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "memories",
            "host": "localhost",
            "port": 6333,
        }
    },
}

memory = Memory.from_config(config)
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
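
As a quick sanity check that the pieces are wired together (assuming Qdrant is already running locally, as set up in the next section), you can round-trip a memory:

# Add one memory, then search for it semantically
memory.add("I work mostly with PyTorch", user_id="demo")
hits = memory.search("which frameworks does the user use?", user_id="demo", limit=1)
for mem in hits.get("results", []):
    print(mem["memory"])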

Set Up the Qdrant Vector Database

1. Pull Qdrant Docker Image

docker pull qdrant/qdrant

2. Run Qdrant Container

docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

3. Verify Connection

Access the Qdrant dashboard at http://localhost:6333/dashboard
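
You can also verify the connection programmatically. A minimal sketch using the qdrant-client package (an assumption: install it first with pip install qdrant-client):

from qdrant_client import QdrantClient

# Connect to the local instance started above
client = QdrantClient(host="localhost", port=6333)

# Listing collections (even an empty list) confirms connectivity
print(client.get_collections())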

Implementation Patterns

Pattern 1: AI Research Agent with Memory

import streamlit as st
import os
from mem0 import Memory
from multion.client import MultiOn
from openai import OpenAI

st.title("AI Research Agent with Memory 📚")

# Initialize components
config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini"},
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "memories",
            "host": "localhost",
            "port": 6333,
        }
    },
}

memory = Memory.from_config(config)
multion = MultiOn(api_key=os.environ['MULTION_API_KEY'])
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

user_id = st.sidebar.text_input("Enter your Username")
search_query = st.text_input("Research paper search query")

if st.button('Search for Papers'):
    with st.spinner('Searching and Processing...'):
        # Retrieve relevant memories for context
        relevant_memories = memory.search(search_query, user_id=user_id, limit=3)
        memory_text = ' '.join(mem['memory'] for mem in relevant_memories.get('results', []))

        # Build a context-aware prompt
        prompt = f"Search for arXiv papers: {search_query}\n"
        prompt += f"User background: {memory_text}"
        
        # Execute search with context
        result = multion.browse(cmd=prompt, url="https://arxiv.org/")
        st.markdown(result)
Key Features:
  • Maintains user research interests across sessions (see the note below)
  • Contextualizes searches based on past queries
  • Personalizes results using memory retrieval
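
Note that the snippet above only reads from memory; to actually accumulate a user's research interests across sessions, you would also store each query after the search, for example:

# Record the query so future sessions know this user's interests
memory.add(f"Searched for arXiv papers about: {search_query}", user_id=user_id)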

Pattern 2: Local ChatGPT with Personal Memory

import streamlit as st
from mem0 import Memory
from litellm import completion

# Reuse the Qdrant-backed config from above; for a fully local setup,
# point Mem0's llm and embedder providers at Ollama as well
m = Memory.from_config(config)

# User-specific session management
if "messages" not in st.session_state:
    st.session_state.messages = []
if "previous_user_id" not in st.session_state:
    st.session_state.previous_user_id = None

user_id = st.text_input("Enter your Username")

# Clear history on user switch
if user_id != st.session_state.previous_user_id:
    st.session_state.messages = []
    st.session_state.previous_user_id = user_id

if prompt := st.chat_input("What is your message?"):
    # Add to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    
    # Store in memory
    m.add(prompt, user_id=user_id)
    
    # Retrieve context from memory
    memories = m.get_all(user_id=user_id)
    context = ""
    if memories and "results" in memories:
        for memory in memories["results"]:
            if "memory" in memory:
                context += f"- {memory['memory']}\n"
    
    # Generate a streaming response with context
    response = completion(
        model="ollama/llama3.1:latest",
        messages=[
            {"role": "system", "content": "You are a helpful assistant with access to past conversations."},
            {"role": "user", "content": f"Context: {context}\nCurrent message: {prompt}"}
        ],
        api_base="http://localhost:11434",
        stream=True
    )

    # Stream the reply to the UI while collecting the full text
    full_response = ""
    with st.chat_message("assistant"):
        placeholder = st.empty()
        for chunk in response:
            full_response += chunk.choices[0].delta.content or ""
            placeholder.markdown(full_response)
    st.session_state.messages.append({"role": "assistant", "content": full_response})

    # Store the response in memory
    m.add(f"Assistant: {full_response}", user_id=user_id)
Key Features:
  • Fully local implementation (no external APIs)
  • Per-user memory isolation
  • Streaming responses with context

Pattern 3: Multi-LLM with Shared Memory
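
One Mem0 instance can back several models at once: memories are keyed by user_id rather than by model, so context written while chatting with one LLM is immediately available to the others. A minimal sketch using litellm as in Pattern 2 (the model names are illustrative):

from litellm import completion
from mem0 import Memory

memory = Memory.from_config(config)  # same Qdrant-backed config as above

def chat(model, user_message, user_id):
    # Shared retrieval: every model sees the same user memories
    hits = memory.search(user_message, user_id=user_id, limit=3)
    context = "\n".join(mem["memory"] for mem in hits.get("results", []))

    reply = completion(
        model=model,
        messages=[
            {"role": "system", "content": f"Known about this user:\n{context}"},
            {"role": "user", "content": user_message},
        ],
    ).choices[0].message.content

    # Shared write: this memory is visible to all models afterwards
    memory.add(user_message, user_id=user_id)
    return reply

# The same memory store backs both calls
chat("gpt-4o-mini", "I'm planning a trip to Japan", user_id="user123")
chat("claude-3-5-sonnet-20241022", "Suggest reading for my trip", user_id="user123")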

Memory Operations

Adding Memories

Use the memory.add() function to store a new memory:

from datetime import datetime

# Add a user message to memory, with optional metadata
memory.add(
    "I'm interested in machine learning papers on transformers",
    user_id="user123",
    metadata={"timestamp": datetime.now().isoformat(), "category": "preference"}
)

Retrieving Memories

# Semantic search through memories
relevant_memories = memory.search(
    "papers about attention mechanisms",
    user_id="user123",
    limit=3
)

for mem in relevant_memories.get("results", []):
    print(f"- {mem['memory']} (relevance: {mem['score']})")

Viewing All Memories

# Get complete memory history
all_memories = memory.get_all(user_id="user123")

if "results" in all_memories:
    for memory in all_memories["results"]:
        print(f"- {memory['memory']}")

Best Practices

Store atomic pieces of information:
  • ✅ “User prefers Python over JavaScript”
  • ✅ “Interested in computer vision research”
  • ❌ “User had a long conversation about many topics”
Smaller, focused memories enable better retrieval and context building.
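
In practice this means extracting discrete facts before storing them, rather than writing whole transcripts. A small illustration (the split shown is hypothetical):

# One blob is hard to retrieve against...
# memory.add("User had a long conversation about many topics", user_id="user123")

# ...separate facts give semantic search distinct targets
for fact in [
    "User prefers Python over JavaScript",
    "Interested in computer vision research",
]:
    memory.add(fact, user_id="user123")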
Optimize context usage:
# Retrieve only relevant memories
memories = memory.search(current_query, user_id=user_id, limit=3)

# Build concise context
context = "\n".join(mem['memory'] for mem in memories.get('results', []))

# Keep the context within budget (len() counts characters, not tokens)
if len(context) > 1000:
    context = context[:1000]
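
Since len() counts characters, a cleaner option when you need a true token budget is to truncate with a tokenizer such as tiktoken (an assumption: the encoding should match your model):

import tiktoken

# Token-accurate truncation
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(context)
if len(tokens) > 1000:
    context = enc.decode(tokens[:1000])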
User data controls:
import json

# Delete all of a user's memories
memory.delete_all(user_id="user123")

# Export user data
user_data = memory.get_all(user_id="user123")
with open("user123_memories.json", "w") as f:
    json.dump(user_data, f)
Optimize vector operations:
  • Use appropriate embedding dimensions (768 for nomic-embed-text)
  • Implement pagination for large memory sets
  • Cache frequently accessed memories
  • Use batch operations when possible
# Add several memories (one add() call per memory)
memories_to_add = [
    {"text": "Memory 1", "user_id": "user123"},
    {"text": "Memory 2", "user_id": "user123"},
]

for mem in memories_to_add:
    memory.add(mem["text"], user_id=mem["user_id"])

Use Cases

Research Assistants

Remember research interests, past queries, and preferred topics

Travel Agents

Maintain travel preferences, budget constraints, and destinations

Personalized Chatbots

Build rapport through conversation history and user preferences

Learning Assistants

Track learning progress, knowledge gaps, and study patterns

Advanced Patterns

Stateful Multi-Turn Conversations

from litellm import completion
from mem0 import Memory

class StatefulAgent:
    def __init__(self, user_id, memory_config):
        self.user_id = user_id
        self.memory = Memory.from_config(memory_config)
        self.conversation_history = []
    
    def chat(self, message):
        # Add user message to long-term memory
        self.memory.add(message, user_id=self.user_id)
        
        # Retrieve relevant context
        hits = self.memory.search(message, user_id=self.user_id, limit=5)
        context = "\n".join(hit["memory"] for hit in hits.get("results", []))
        
        # Generate a response with memory context plus the running history
        # (litellm used here for concreteness; any chat-completion client works)
        messages = [{"role": "system", "content": f"Relevant memories:\n{context}"}]
        for turn in self.conversation_history:
            messages.append({"role": "user", "content": turn["user"]})
            messages.append({"role": "assistant", "content": turn["assistant"]})
        messages.append({"role": "user", "content": message})
        response = completion(model="gpt-4o-mini", messages=messages).choices[0].message.content
        
        # Store in long-term memory and short-term history
        self.memory.add(f"Assistant: {response}", user_id=self.user_id)
        self.conversation_history.append(
            {"user": message, "assistant": response}
        )
        
        return response
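
Usage is then a plain loop over turns, for example:

# Illustrative usage; config is the Qdrant-backed config from above
agent = StatefulAgent(user_id="user123", memory_config=config)
print(agent.chat("I'm researching transformer architectures"))
print(agent.chat("What did I say I was researching?"))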

Memory-Enhanced RAG

def rag_with_memory(query, vector_store, user_id):
    # Retrieve user preferences and past interactions from memory
    hits = memory.search(query, user_id=user_id, limit=3)
    user_context = "\n".join(hit["memory"] for hit in hits.get("results", []))
    
    # Standard RAG retrieval (vector_store: any LangChain-style store)
    relevant_docs = vector_store.similarity_search(query, k=5)
    doc_text = "\n".join(doc.page_content for doc in relevant_docs)
    
    # Combine document retrieval with memory context
    enhanced_prompt = f"""
    User context: {user_context}
    Relevant documents: {doc_text}
    Query: {query}
    
    Provide a personalized response based on user preferences and documents.
    """
    
    return completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": enhanced_prompt}],
    ).choices[0].message.content

Resources

Mem0 Documentation

Official Mem0 memory framework docs

Qdrant Guides

Vector database setup and optimization

Example Apps

Complete implementations with memory

Tutorial

Step-by-step memory tutorial
