# How Memory Works
Memori gives your AI application long-term memory. Instead of forgetting everything after each conversation, your AI can remember facts, preferences, and context across sessions and across different applications.
## Attribution
Every memory in Memori is tagged with three dimensions: who (entity), what (process), and which conversation (session).
- **Entity** (`entity_id`) — The person, place, or thing generating memories. Typically a user ID (e.g., `"user_alice"`, `"company_acme"`). Maximum length: 100 characters.
- **Process** (`process_id`) — The agent, program, or workflow creating memories (e.g., `"support_bot"`, `"code_review_agent"`). Maximum length: 100 characters.
- **Session** (`session_id`) — Groups related LLM interactions into a conversation thread. Auto-generated as a UUID by default.
The combination of `entity_id` + `process_id` + `session_id` creates a unique memory scope: different users have isolated memories, the same user can have different context in different applications, and each conversation is tracked separately.
```python
from memori import Memori
from openai import OpenAI

client = OpenAI()
mem = Memori().llm.register(client)

# Set attribution before any LLM calls
mem.attribution(
    entity_id="user_alice",
    process_id="support_bot"
)

# session_id is auto-generated
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "I prefer dark mode."}
    ]
)
```
## Memory Types
When you have a conversation through a Memori-wrapped LLM client, Advanced Augmentation extracts structured memories in the background:
| Type | What it captures | Example |
|---|---|---|
| Facts | Objective information with embeddings | "User uses PostgreSQL for production databases" |
| Preferences | Choices, opinions, and tastes | "Prefers concise answers" |
| Skills & Knowledge | Abilities and expertise levels | "Experienced with React (5 years)" |
| Attributes | Process-level information about the agent | "Handles billing and subscription queries" |
## How Recall Works
Recall brings stored memories back into your AI conversations. There are two modes.
### Automatic Recall (Default)
On every LLM call, Memori automatically:
1. Intercepts the outbound request
2. Uses semantic search to find relevant facts for the current entity
3. Injects the most relevant memories into the system prompt
4. Forwards the enriched request to the LLM
No extra code required — it happens transparently.
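Conceptually, the injection step amounts to prepending recalled facts as a system message before the request goes out. The sketch below illustrates that idea only; `inject_memories` is a hypothetical name, not Memori's internal API:

```python
def inject_memories(messages, facts):
    """Prepend recalled facts as a system message (illustrative only)."""
    if not facts:
        return messages
    memory_block = "Relevant memories about this user:\n" + "\n".join(
        f"- {fact}" for fact in facts
    )
    return [{"role": "system", "content": memory_block}] + messages

# Two recalled facts enrich an outbound request
facts = ["Prefers dark mode", "Uses PostgreSQL in production"]
messages = [{"role": "user", "content": "Set up my dashboard."}]
enriched = inject_memories(messages, facts)
print(enriched[0]["content"])
```

The user's original messages are left untouched; only an extra system message is added, which is why no application code needs to change.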
### Manual Recall
Use `mem.recall()` to retrieve memories explicitly — useful for building custom prompts, displaying memories in a UI, or debugging.
```python
from memori import Memori

mem = Memori()
mem.attribution(entity_id="user_alice", process_id="support_bot")

facts = mem.recall("coding preferences", limit=5)
for fact in facts:
    print(f"Fact: {fact.content}")
    print(f"Score: {fact.similarity:.4f}")
```
Each returned fact includes `id`, `content`, `similarity` (a 0–1 relevance score), `rank_score`, and `date_created`.
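Because each result carries `content` and `similarity`, you can post-process recall output yourself, for example keeping only strong matches when building a custom prompt. The helper below is hypothetical (not part of Memori), and the `Fact` dataclass is a stand-in for the objects `mem.recall()` returns:

```python
from dataclasses import dataclass

@dataclass
class Fact:  # stand-in for the objects mem.recall() returns
    content: str
    similarity: float

def facts_to_prompt(facts, min_similarity=0.3):
    """Keep only strong matches, highest similarity first, formatted for a prompt."""
    kept = [f for f in facts if f.similarity >= min_similarity]
    kept.sort(key=lambda f: f.similarity, reverse=True)
    return "\n".join(f"- {f.content} ({f.similarity:.2f})" for f in kept)

facts = [Fact("Prefers concise answers", 0.82), Fact("Mentioned a cat once", 0.12)]
print(facts_to_prompt(facts))
# Only the high-similarity fact survives the 0.3 cutoff
```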
### Recall Configuration
Memori uses semantic search (vector similarity) to find relevant facts. You can tune recall behavior with:
| Option | Default | Description |
|---|---|---|
| `mem.config.recall_relevance_threshold` | 0.1 | Minimum similarity score for a fact to be included |
| `mem.config.recall_embeddings_limit` | 1000 | Maximum number of embeddings to compare against |
| `mem.config.recall_facts_limit` | 5 | Default number of facts to return |
```python
# Example: tune recall for broader or narrower results
mem.config.recall_relevance_threshold = 0.05  # Lower = more results
mem.config.recall_embeddings_limit = 500      # Reduce for lower memory usage
mem.config.recall_facts_limit = 10            # Return more facts by default
```
## Memory Lifecycle
1. **Conversation** — Your user talks to your AI through the wrapped LLM client
2. **Capture** — Memori intercepts and stores the raw conversation
3. **Augmentation** — Advanced Augmentation processes the conversation asynchronously, extracting structured memories
4. **Extraction** — Facts, preferences, skills, and attributes are identified
5. **Storage** — Extracted memories are stored with vector embeddings
6. **Recall** — On the next LLM call, relevant memories are retrieved and injected into context
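As a mental model, the lifecycle can be reduced to a toy in-memory pipeline. Nothing below is Memori's real implementation — extraction in particular uses LLM-driven augmentation rather than the keyword heuristic shown, and recall uses semantic search rather than substring matching:

```python
store = {"raw": [], "facts": []}

def capture(conversation):
    """Capture: store the raw conversation."""
    store["raw"].append(conversation)

def augment(conversation):
    """Augmentation/Extraction/Storage: pull out structured memories (toy heuristic)."""
    for line in conversation:
        if line.lower().startswith("i prefer"):
            store["facts"].append(line)

def recall(query):
    """Recall: retrieve stored facts (toy substring match, not semantic search)."""
    return [f for f in store["facts"] if query.lower() in f.lower()]

conversation = ["I prefer dark mode.", "What's the weather?"]
capture(conversation)
augment(conversation)
print(recall("dark mode"))  # ['I prefer dark mode.']
```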
## Session Management
Sessions group related conversations together. Each session has a timeout (default: 30 minutes) that determines when a new conversation starts.
```python
from memori import Memori

mem = Memori()
mem.attribution(entity_id="user_alice", process_id="support_bot")

# Get the current session ID
current_session = mem.config.session_id
print(f"Session ID: {current_session}")

# Start a new conversation group
mem.new_session()

# Or restore a previous session
mem.set_session(current_session)

# Configure session timeout
mem.config.session_timeout_minutes = 60  # 1 hour
```
Sessions are automatically managed by Memori. You only need to explicitly manage sessions if you want to group or separate conversations in a specific way.
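The timeout behavior can be illustrated with a small sketch. This assumes the rule is "a new session starts once the gap since the last interaction exceeds the configured timeout"; the function name and exact rule are illustrative, not Memori's internals:

```python
from datetime import datetime, timedelta

def session_expired(last_activity, now, timeout_minutes=30):
    """Illustrative timeout rule: expire once inactivity exceeds the timeout."""
    return now - last_activity > timedelta(minutes=timeout_minutes)

last = datetime(2024, 1, 1, 12, 0)
print(session_expired(last, datetime(2024, 1, 1, 12, 20)))  # False: 20 min gap
print(session_expired(last, datetime(2024, 1, 1, 13, 0)))   # True: 60 min gap
```

With the default 30-minute timeout, a user returning within half an hour continues the same conversation thread; a longer gap starts a fresh one.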
## Entity and Process Resolution
Memori automatically creates entities and processes in storage when you first use them. The resolution happens during the first LLM call:
```python
# First call creates the entity and process in storage
mem.attribution(entity_id="user_alice", process_id="support_bot")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)

# Subsequent calls reuse the same entity and process IDs
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What did I say?"}]
)
```
Always set attribution before making LLM calls. Without attribution, Memori cannot create or recall memories. The conversation will still work, but no memories will be stored.
## Embedding Model
By default, Memori uses the `all-MiniLM-L6-v2` model for generating embeddings. This model produces 384-dimensional vectors optimized for semantic similarity search.
You can customize the embedding model via environment variable:
```shell
export MEMORI_EMBEDDINGS_MODEL="all-mpnet-base-v2"
```
Supported models include any `sentence-transformers` model compatible with your deployment.
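The similarity scores in recall results come from comparing embedding vectors. Cosine similarity is the standard measure for sentence-transformer embeddings; the minimal sketch below shows the computation on toy vectors (Memori's exact scoring and normalization are not documented here, so treat this as an illustration of the concept):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim vectors stand in for the 384-dim embeddings
print(round(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]), 4))  # 1.0 (identical)
print(round(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), 4))  # 0.0 (orthogonal)
```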