
Overview

Tabby’s memory system is what makes it truly intelligent. Instead of starting from scratch in every conversation, it remembers your preferences, coding style, past interactions, and context to provide increasingly personalized assistance. The memory layer is powered by Mem0 with a hybrid storage architecture:

  • Mem0: intelligent memory extraction and retrieval
  • Supabase: vector embeddings for semantic search
  • Neo4j: knowledge graph for relationships (optional)

What is Mem0?

Mem0 is an intelligent memory layer for AI applications that automatically:
  1. Extracts facts from conversations (“User prefers dark mode”)
  2. Stores embeddings for semantic search
  3. Retrieves relevant memories based on context
  4. Updates existing memories instead of duplicating
  5. Tracks memory history and changes over time
Think of Mem0 as giving your AI a long-term memory system, similar to how humans remember context from past conversations.

Memory Lifecycle
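The lifecycle Mem0 implements (extract, store, retrieve, update, track history, as listed under “What is Mem0?”) can be sketched with a tiny in-memory stand-in. Every class and method name below is illustrative, not Mem0’s actual API; the real system uses vector similarity rather than word overlap for retrieval.

```python
# Illustrative in-memory stand-in for the Mem0 lifecycle; NOT Mem0's real API.
import hashlib
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    id: str
    memory: str
    history: list = field(default_factory=list)  # prior versions, like /memory/history

class TinyMemoryStore:
    def __init__(self):
        self.records = {}  # id -> MemoryRecord

    def _hash(self, text: str) -> str:
        # Normalized content hash: exact re-additions map to the same id
        return hashlib.sha256(text.lower().strip().encode()).hexdigest()[:12]

    def add(self, text: str) -> MemoryRecord:
        mem_id = self._hash(text)
        if mem_id in self.records:   # exact duplicate: return existing record
            return self.records[mem_id]
        rec = MemoryRecord(id=mem_id, memory=text)
        self.records[mem_id] = rec
        return rec

    def update(self, mem_id: str, new_text: str) -> MemoryRecord:
        rec = self.records[mem_id]
        rec.history.append(rec.memory)  # keep the old version for history
        rec.memory = new_text
        return rec

    def search(self, query: str):
        # Crude word-overlap relevance; the real system scores by embedding similarity
        q = set(query.lower().split())
        scored = [
            (len(q & set(r.memory.lower().split())) / len(q), r)
            for r in self.records.values()
        ]
        return [r for score, r in sorted(scored, key=lambda s: -s[0]) if score > 0]
```

The same shape (dedupe on add, preserve history on update, rank on search) is what the real backend does at scale with Supabase and Mem0.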

Architecture Components

1. Memory Backend (FastAPI)

The memory service runs as a separate FastAPI server on port 8000:
# backend/main.py
import os

from fastapi import FastAPI
from mem0 import Memory

app = FastAPI(title="Memory API")

# Connection settings loaded from .env (Neo4j variable names assumed)
supabase_connection_string = os.environ["SUPABASE_CONNECTION_STRING"]
neo4j_url = os.environ["NEO4J_URL"]
neo4j_username = os.environ["NEO4J_USERNAME"]
neo4j_password = os.environ["NEO4J_PASSWORD"]

config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4.1-nano-2025-04-14",
            "enable_vision": True,  # For image memories
        }
    },
    "vector_store": {
        "provider": "supabase",
        "config": {
            "connection_string": supabase_connection_string,
            "collection_name": "memories",
            "index_method": "hnsw",
            "index_measure": "cosine_distance"
        }
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": neo4j_url,
            "username": neo4j_username,
            "password": neo4j_password,
        }
    }
}

memory = Memory.from_config(config)
Why a Separate Service?
  • Mem0 is a Python library; the FastAPI service acts as a bridge to the JavaScript/TypeScript frontend.
  • Memory operations can be resource-intensive; running them as a separate service allows independent scaling.
  • Crashes in the memory service don’t take down the main Electron app.

2. Supabase Vector Store

Supabase stores vector embeddings of memories for semantic search.

Database Schema:
CREATE TABLE memories (
  id UUID PRIMARY KEY,
  user_id TEXT NOT NULL,
  memory TEXT NOT NULL,
  hash TEXT UNIQUE,
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW(),
  embedding VECTOR(1536)  -- OpenAI ada-002 dimensions
);

-- HNSW index for fast vector search
CREATE INDEX ON memories 
USING hnsw (embedding vector_cosine_ops);
Vector Search:
-- Find similar memories using cosine similarity
CREATE FUNCTION match_memories(
  query_embedding VECTOR(1536),
  match_threshold FLOAT,
  match_count INT,
  filter_user_id TEXT
)
RETURNS TABLE (
  id UUID,
  memory TEXT,
  similarity FLOAT
)
AS $$
  SELECT id, memory, 1 - (embedding <=> query_embedding) AS similarity
  FROM memories
  WHERE user_id = filter_user_id
    AND 1 - (embedding <=> query_embedding) > match_threshold
  ORDER BY similarity DESC
  LIMIT match_count;
$$ LANGUAGE SQL;
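The `<=>` operator in the function above is pgvector’s cosine distance, so the returned similarity is `1 - distance`. The same scoring can be reproduced in plain Python to sanity-check query results; this is a sketch for illustration, not part of the backend:

```python
import math

def cosine_similarity(a, b):
    # Mirrors `1 - (embedding <=> query_embedding)` in the SQL function:
    # pgvector's <=> is cosine distance, so similarity = 1 - distance.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_memories(query_embedding, rows, match_threshold, match_count):
    # rows: list of (id, memory, embedding) tuples, like the memories table
    scored = [
        (id_, mem, cosine_similarity(emb, query_embedding))
        for id_, mem, emb in rows
    ]
    hits = [r for r in scored if r[2] > match_threshold]
    return sorted(hits, key=lambda r: -r[2])[:match_count]
```

This brute-force version scans every row; the HNSW index described next lets Postgres skip that full scan.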
Why HNSW Index?
HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It’s much faster than brute-force vector search with minimal accuracy loss.
  • Speed: Sub-millisecond queries even with 100k+ memories
  • Accuracy: 95%+ recall compared to exact search
  • Scalability: Handles millions of vectors efficiently

3. Neo4j Knowledge Graph (Optional)

Neo4j creates a visual knowledge graph showing relationships between memories.

Graph Schema:
// Node: Memory
CREATE (m:Memory {
  id: "uuid",
  content: "User prefers TypeScript",
  type: "LONG_TERM",
  created_at: datetime()
})

// Node: Entity (extracted from memories)
CREATE (e:Entity {
  name: "TypeScript",
  type: "TECHNOLOGY"
})

// Relationship
CREATE (m)-[:MENTIONS]->(e)
An example graph links each Memory node to the Entity nodes it mentions. Visualization in Brain Panel:
import { useEffect, useState } from 'react';
import NVL from '@neo4j-nvl/react';

const MemoryGraph = ({ userId }: { userId: string }) => {
  const [graphData, setGraphData] = useState({ nodes: [], rels: [] });
  
  // Fetch Neo4j data
  useEffect(() => {
    fetch(`/api/memory/graph?userId=${userId}`)
      .then(res => res.json())
      .then(data => setGraphData(data));
  }, [userId]);
  
  return (
    <NVL 
      nodes={graphData.nodes} 
      rels={graphData.rels}
      style={{ width: '100%', height: '600px' }}
    />
  );
};

Memory Types

Tabby classifies memories into 5 types using LLM-based classification:

LONG_TERM

Permanent facts and preferences. Examples:
  • “User prefers dark mode”
  • “My name is John”
  • “I like pizza”
  • “I’m a software engineer”

SHORT_TERM

Temporary states and current activities. Examples:
  • “Currently working on auth feature”
  • “Right now debugging a bug”
  • “Need to finish this today”
  • “In a meeting”

EPISODIC

Past events with time context. Examples:
  • “Yesterday I had a meeting”
  • “Last week I deployed v2”
  • “Met John at conference”
  • “Fixed the bug on Monday”

SEMANTIC

General knowledge and facts. Examples:
  • “Python uses indentation”
  • “Capital of France is Paris”
  • “React is a JS library”
  • “HTTP 404 means Not Found”

PROCEDURAL

How-to knowledge and processes. Examples:
  • “To deploy, run npm build”
  • “First boil water, then add pasta”
  • “Steps to create PR: commit, push, open PR”

Classification System

LLM-Based Classifier:
import openai  # assumes OPENAI_API_KEY is set in the environment

class MemoryClassifier:
    CLASSIFICATION_PROMPT = """Classify the following memory into ONE type:
    
    - SHORT_TERM: Temporary states, current activities ("currently working on...")
    - LONG_TERM: Permanent preferences, identity ("I prefer...", "My name is...")
    - EPISODIC: Past events with time ("yesterday", "last week")
    - SEMANTIC: General knowledge ("Python uses...", "Capital of...")
    - PROCEDURAL: How-to instructions ("To do X, first...")
    
    Memory: {content}
    """
    
    def classify(self, content: str) -> str:
        response = openai.chat.completions.create(
            model="gpt-4.1-nano-2025-04-14",
            messages=[{
                "role": "user",
                "content": self.CLASSIFICATION_PROMPT.format(content=content)
            }],
            max_tokens=20,
            temperature=0
        )
        return response.choices[0].message.content.strip().upper()
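A cheap deterministic fallback can guard against LLM outages or malformed responses by matching the same cue phrases the prompt describes. This heuristic is purely illustrative (it is not part of Tabby); the rules and their ordering are assumptions:

```python
import re

# Keyword heuristics mirroring the cues in CLASSIFICATION_PROMPT.
# Illustrative fallback only; the real classifier is the LLM call above.
# Rule order matters: time-anchored cues are checked before identity cues.
RULES = [
    ("EPISODIC", r"\b(yesterday|last week|last month|on monday)\b"),
    ("SHORT_TERM", r"\b(currently|right now|at the moment|today)\b"),
    ("PROCEDURAL", r"\b(steps to|to [a-z]+, (first|run)|first .* then)\b"),
    ("LONG_TERM", r"\b(i prefer|my name is|i like|i am a|i'm a)\b"),
]

def classify_fallback(content: str) -> str:
    text = content.lower()
    for label, pattern in RULES:
        if re.search(pattern, text):
            return label
    return "SEMANTIC"  # default: general knowledge
```

A production fallback would need far richer rules; the point is only that the five-way taxonomy degrades gracefully to a deterministic check.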
Automatic Classification:
@app.post("/memory/add")
async def add_memory(request: AddMemoryRequest):
    metadata = request.metadata or {}
    
    # Auto-classify if enabled
    if request.auto_classify and "memory_type" not in metadata:
        content = " ".join([m.content for m in request.messages])
        memory_type = classifier.classify(content)
        metadata["memory_type"] = memory_type
    
    result = memory.add(
        messages=[{"role": m.role, "content": m.content} for m in request.messages],
        user_id=request.user_id,
        metadata=metadata
    )
    
    return {"success": True, "result": result, "classified_type": metadata.get("memory_type")}

API Endpoints

The Memory API exposes RESTful endpoints:

Add Memory

POST /memory/add

{
  "messages": [
    { "role": "user", "content": "I prefer using TypeScript" },
    { "role": "assistant", "content": "I'll remember that you prefer TypeScript" }
  ],
  "user_id": "user_123",
  "metadata": {
    "source": "chat",
    "memory_type": "LONG_TERM"  // Optional, will auto-classify if not provided
  },
  "auto_classify": true
}

Response:
{
  "success": true,
  "result": {
    "memories": [
      {
        "id": "mem_abc123",
        "memory": "User prefers using TypeScript for development",
        "user_id": "user_123",
        "metadata": { "memory_type": "LONG_TERM" }
      }
    ]
  },
  "classified_type": "LONG_TERM"
}
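A client helper can assemble and validate the request body shown above before sending it. The field names come from the endpoint; the helper function itself is hypothetical:

```python
import json

VALID_TYPES = {"LONG_TERM", "SHORT_TERM", "EPISODIC", "SEMANTIC", "PROCEDURAL"}

def build_add_memory_request(messages, user_id, memory_type=None, source="chat"):
    """Assemble the JSON body for POST /memory/add.

    messages: list of (role, content) tuples.
    Hypothetical client-side helper; validates memory_type before sending.
    """
    if memory_type is not None and memory_type not in VALID_TYPES:
        raise ValueError(f"unknown memory_type: {memory_type}")
    metadata = {"source": source}
    if memory_type:
        metadata["memory_type"] = memory_type
    body = {
        "messages": [{"role": r, "content": c} for r, c in messages],
        "user_id": user_id,
        "metadata": metadata,
        # Let the server auto-classify when no explicit type was given
        "auto_classify": memory_type is None,
    }
    return json.dumps(body)
```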

Search Memories

POST /memory/search

{
  "query": "What programming languages do I use?",
  "user_id": "user_123",
  "limit": 10,
  "memory_type": "LONG_TERM"  // Optional filter
}

Response:
{
  "success": true,
  "results": [
    {
      "id": "mem_abc123",
      "memory": "User prefers using TypeScript for development",
      "score": 0.92,
      "metadata": { "memory_type": "LONG_TERM" }
    },
    {
      "id": "mem_def456",
      "memory": "User is currently learning Python",
      "score": 0.85,
      "metadata": { "memory_type": "SHORT_TERM" }
    }
  ]
}
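On the client side, search results are typically post-filtered by score before being injected into a prompt. A sketch of that consumption step (the threshold value and the `- <memory>` line format are assumptions, matching the prompt-building example later in this page):

```python
def filter_memories(results, min_score=0.8, memory_type=None):
    """Keep only high-confidence hits, optionally restricted to one type,
    and return them as '- <memory>' lines ready for a system prompt.

    results: list of dicts shaped like the /memory/search response above.
    """
    kept = [
        r for r in results
        if r["score"] >= min_score
        and (memory_type is None
             or r["metadata"].get("memory_type") == memory_type)
    ]
    # Highest-similarity memories first
    return [f"- {r['memory']}" for r in sorted(kept, key=lambda r: -r["score"])]
```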

Get All Memories

POST /memory/get_all

{
  "user_id": "user_123",
  "memory_type": "LONG_TERM"  // Optional filter
}

Response:
{
  "success": true,
  "memories": [
    {
      "id": "mem_abc123",
      "memory": "User prefers TypeScript",
      "created_at": "2026-01-15T10:30:00Z",
      "updated_at": "2026-01-15T10:30:00Z",
      "metadata": { "memory_type": "LONG_TERM" }
    }
  ]
}

Add Image Memory

POST /memory/add_image

{
  "image_url": "data:image/png;base64,iVBORw0KGgoAAAANS...",
  "context": "Screenshot from my coding interview",
  "user_id": "user_123",
  "metadata": {
    "source": "screen_capture"
  },
  "auto_classify": true
}

Response:
{
  "success": true,
  "result": {
    "memories": [
      {
        "id": "mem_img789",
        "memory": "User was solving a binary tree problem during a coding interview",
        "metadata": { 
          "memory_type": "EPISODIC",
          "source": "screen_capture"
        }
      }
    ]
  },
  "classified_type": "EPISODIC"
}

Update Memory

PUT /memory/update

{
  "memory_id": "mem_abc123",
  "data": "User prefers TypeScript and Python"
}

Delete Memory

DELETE /memory/{memory_id}

Response:
{
  "success": true,
  "result": "Memory deleted successfully"
}

Memory History

GET /memory/history/{memory_id}

Response:
{
  "success": true,
  "history": [
    {
      "id": "hist_1",
      "memory_id": "mem_abc123",
      "old_value": "User prefers TypeScript",
      "new_value": "User prefers TypeScript and Python",
      "updated_at": "2026-01-20T14:00:00Z"
    }
  ]
}

Integration with Tabby

Automatic Context Capture

Tabby automatically captures context and stores it as memories:

1. Periodic Screenshots (Interview Mode):
// electron/src/services/context-capture.ts
setInterval(async () => {
  const screenshot = await captureScreen();
  const imageUrl = await uploadToSupabase(screenshot);
  
  // Send to memory API
  await fetch(`${MEMORY_API_URL}/memory/add_image`, {
    method: 'POST',
    body: JSON.stringify({
      image_url: imageUrl,
      context: 'Auto-captured coding session',
      user_id: currentUserId,
      metadata: { source: 'auto_capture' }
    })
  });
}, 60000); // Every minute during active coding
2. Conversation Memory:
// After AI chat interaction
const addMemory = async (messages: Message[], userId: string) => {
  const response = await fetch(`${MEMORY_API_URL}/memory/add`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: messages.map(m => ({ role: m.role, content: m.content })),
      user_id: userId,
      metadata: { source: 'chat' },
      auto_classify: true
    })
  });
  
  return response.json();
};
3. User Preference Tracking:
// When user changes settings
const trackPreference = async (preference: string, value: any) => {
  await fetch(`${MEMORY_API_URL}/memory/add`, {
    method: 'POST',
    body: JSON.stringify({
      messages: [
        { role: 'user', content: `I prefer ${preference} to be ${value}` }
      ],
      user_id: currentUserId,
      metadata: { 
        source: 'settings',
        memory_type: 'LONG_TERM'
      }
    })
  });
};

Memory Retrieval in AI Context

Enhancing AI Responses with Memories:
// nextjs-backend/src/app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages, userId } = await req.json();
  
  // 1. Search for relevant memories
  const lastUserMessage = messages[messages.length - 1].content;
  const memoryResponse = await fetch(`${MEMORY_API_URL}/memory/search`, {
    method: 'POST',
    body: JSON.stringify({
      query: lastUserMessage,
      user_id: userId,
      limit: 5
    })
  });
  
  const { results: memories } = await memoryResponse.json();
  
  // 2. Build context with memories
  const systemPrompt = `You are Tabby, an AI assistant.
  
Relevant memories about the user:
${memories.map(m => `- ${m.memory}`).join('\n')}

Use these memories to personalize your response.`;
  
  // 3. Stream response with context
  const result = streamText({
    model: openai('gpt-4-turbo'),
    messages: [
      { role: 'system', content: systemPrompt },
      ...messages
    ]
  });
  
  return result.toDataStreamResponse();
}

Brain Panel UI

The Brain Panel (Ctrl+Shift+B) provides a visual interface for memory management:
Brain Panel Interface
Features:
  • View all memories by type (LONG_TERM, SHORT_TERM, etc.)
  • Search and filter memories
  • Edit or delete individual memories
  • See memory scores and relevance

Performance Optimization

Memory Search Caching

import { LRUCache } from 'lru-cache';

const memoryCache = new LRUCache<string, any>({
  max: 500,  // Store up to 500 searches
  ttl: 1000 * 60 * 5,  // 5 minute TTL
});

const searchMemories = async (query: string, userId: string) => {
  const cacheKey = `${userId}:${query}`;
  
  if (memoryCache.has(cacheKey)) {
    return memoryCache.get(cacheKey);
  }
  
  const results = await fetch(`${MEMORY_API_URL}/memory/search`, {
    method: 'POST',
    body: JSON.stringify({ query, user_id: userId })
  }).then(r => r.json());
  
  memoryCache.set(cacheKey, results);
  return results;
};

Batching Memory Additions

// Queue memories and batch insert every 5 seconds
const memoryQueue: Message[][] = [];

setInterval(async () => {
  if (memoryQueue.length === 0) return;
  
  const batch = memoryQueue.splice(0, memoryQueue.length);
  
  await Promise.all(
    batch.map(messages => 
      fetch(`${MEMORY_API_URL}/memory/add`, {
        method: 'POST',
        body: JSON.stringify({ messages, user_id: currentUserId })
      })
    )
  );
}, 5000);

Privacy & Data Control

All memories are stored locally in your Supabase instance. No data is sent to external servers (except AI provider APIs for processing).
User Data Isolation:
  • Every memory tagged with user_id
  • Queries filtered by user_id (no cross-user leakage)
  • Users can delete all their memories via UI
Data Deletion:
// Delete all user memories
DELETE /memory/user/{user_id}

// Or via Supabase directly
supabase
  .from('memories')
  .delete()
  .eq('user_id', userId);

Advanced Features

Memory Deduplication

Mem0 automatically deduplicates similar memories:
# If you add:
"User prefers dark mode"

# Then later add:
"I like dark themes"

# Mem0 will UPDATE the existing memory instead of creating a duplicate:
"User prefers dark mode and dark themes"
This is done via content hashing and semantic similarity checks.
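The two checks can be sketched in plain Python: an exact-match hash gate followed by a similarity gate. Here `difflib` stands in for the embedding-based similarity check that Mem0 actually performs; the function names and the 0.6 threshold are illustrative:

```python
import difflib
import hashlib

def content_hash(text: str) -> str:
    # Normalized hash catches exact re-additions cheaply
    return hashlib.sha256(text.lower().strip().encode()).hexdigest()

def find_duplicate(new_memory: str, existing: list[str], threshold: float = 0.6):
    """Return the existing memory the new one duplicates, or None.

    difflib's character-level ratio is a stand-in for the embedding
    similarity Mem0 uses; the threshold is an assumed value.
    """
    new_hash = content_hash(new_memory)
    for old in existing:
        if content_hash(old) == new_hash:
            return old  # exact duplicate
    scores = [
        (difflib.SequenceMatcher(None, new_memory.lower(), old.lower()).ratio(), old)
        for old in existing
    ]
    best = max(scores, default=(0.0, None))
    return best[1] if best[0] >= threshold else None
```

When a duplicate is found, the existing memory is updated (and versioned, as described below) instead of a new row being inserted.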

Memory Versioning

Every memory update is tracked:
const history = await fetch(`${MEMORY_API_URL}/memory/history/mem_abc123`)
  .then(r => r.json());

// Shows:
// v1: "User prefers TypeScript"
// v2: "User prefers TypeScript and Python"
// v3: "User is an expert in TypeScript and Python"

Cross-Session Context

Memories persist across:
  • App restarts
  • Different windows (Action Menu, Brain Panel, etc.)
  • Different features (Chat, Interview Copilot, Suggestions)
This creates a unified user experience where the AI truly knows you.

Troubleshooting

Memory service won’t start. Check:
  1. Python 3.12+ installed
  2. uv sync ran successfully
  3. .env file has SUPABASE_CONNECTION_STRING
  4. Supabase is running (npx supabase status)
Fix:
cd backend
uv sync
uv run main.py
Memory search returns no results. Check:
  1. User ID is consistent across requests
  2. Vector embeddings are being generated (check Supabase)
  3. HNSW index is created on embedding column
Debug:
-- Check if memories exist
SELECT * FROM memories WHERE user_id = 'your_user_id';

-- Check if embeddings are present
SELECT id, memory, embedding IS NOT NULL as has_embedding
FROM memories;
Knowledge graph is empty or Neo4j won’t connect. Check:
  1. Neo4j credentials in .env are correct
  2. Neo4j instance is running and accessible
  3. Graph store is enabled in Mem0 config
Test connection:
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    uri=neo4j_url,
    auth=(neo4j_username, neo4j_password)
)

with driver.session() as session:
    result = session.run("MATCH (n) RETURN count(n)")
    print(result.single()[0])  # Should print node count

Next Steps

System Architecture

See how memory fits into the overall system

Technology Stack

Learn about Mem0 and other technologies used
