Advanced Augmentation
Advanced Augmentation is the AI engine inside Memori that turns raw conversations into structured, searchable memories. It runs asynchronously in the background to minimize impact on your response path.
What It Does
When your application has a conversation through a Memori-wrapped LLM client, the augmentation engine:
- Reads the full conversation (user messages and AI responses)
- Identifies facts, preferences, skills, and attributes
- Extracts semantic triples (subject-predicate-object relationships)
- Generates vector embeddings for semantic search
- Stores everything in your memory backend
How It Works
The augmentation flow is fully asynchronous and designed to avoid blocking your main request path.
Augmentation Pipeline
- Conversation capture — Your app makes an LLM call through the wrapped client
- Immediate response — Memori returns the LLM response without delay
- Background queueing — The conversation is queued for asynchronous processing
- Memory extraction — The augmentation engine analyzes the conversation
- Batched writes — Extracted memories are written to storage in batches
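The five steps above can be sketched as a queue plus a background worker thread: the caller gets the LLM response immediately, and extraction happens later. This is an illustration of the flow, not Memori's implementation; `handle_llm_call` and `augmentation_worker` are hypothetical names.

```python
import queue
import threading

augmentation_queue = queue.Queue()

def handle_llm_call(conversation):
    """Return the LLM response immediately; queue augmentation for later."""
    response = "llm response"             # stand-in for the real provider call
    augmentation_queue.put(conversation)  # background queueing, non-blocking
    return response

def augmentation_worker():
    """Drain the queue and extract memories without blocking callers."""
    while True:
        conversation = augmentation_queue.get()
        # ... memory extraction and batched writes would happen here ...
        augmentation_queue.task_done()

worker = threading.Thread(target=augmentation_worker, daemon=True)
worker.start()
```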
Implementation Details
Advanced Augmentation uses a sophisticated async architecture for high performance:
Worker Pool
- Max workers: 50 concurrent augmentation tasks (configurable via manager.max_workers)
- Semaphore-based concurrency control to prevent resource exhaustion
- Event loop management with a dedicated thread for async operations
Database Writer
- Batch size: 100 write operations (configurable via manager.db_writer_batch_size)
- Batch timeout: 0.1 seconds (configurable via manager.db_writer_batch_timeout)
- Queue size: 1000 pending writes (configurable via manager.db_writer_queue_size)
- Automatic flushing when the batch is full or the timeout expires
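The flush-when-full-or-expired behavior can be sketched as below, using the documented defaults as parameter names. `BatchWriter` is illustrative: the real writer is asynchronous and flushes on a timer as well, while this synchronous sketch only checks the timeout when a write arrives.

```python
import time

class BatchWriter:
    """Sketch of batch-and-flush writing: flush when the batch is full
    or the batch timeout has expired (illustrative, not Memori's code)."""

    def __init__(self, flush_fn, batch_size=100, batch_timeout=0.1):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.batch_timeout = batch_timeout
        self.batch = []
        self.last_flush = time.monotonic()

    def write(self, op):
        self.batch.append(op)
        full = len(self.batch) >= self.batch_size
        expired = time.monotonic() - self.last_flush >= self.batch_timeout
        if full or expired:
            self.flush()

    def flush(self):
        if self.batch:
            self.flush_fn(self.batch)   # one storage round-trip per batch
            self.batch = []
        self.last_flush = time.monotonic()

written = []
writer = BatchWriter(written.extend, batch_size=3, batch_timeout=60.0)
for op in ["a", "b", "c", "d"]:
    writer.write(op)
# "a", "b", "c" are flushed as one full batch; "d" waits for the next flush
```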
memori/memory/augmentation/_manager.py:27-31
Message Selection Strategy
To optimize API usage and focus on relevant context, the augmentation engine intelligently selects messages:
- With conversation summary: Sends only the most recent user-assistant exchange
- Without summary: Sends all messages in the conversation
memori/memory/augmentation/augmentations/memori/_augmentation.py:53-84
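The strategy can be sketched as a small selector function; `select_messages` is a hypothetical name, not Memori's API.

```python
def select_messages(messages, summary):
    """With a summary, return only the most recent user-assistant exchange;
    without one, return every message (a sketch of the documented strategy)."""
    if summary is None:
        return messages
    # Walk backwards to the last user message and keep it plus what follows.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return messages[i:]
    return messages
```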
Extraction Types
| Type | What it captures | Scope |
|---|---|---|
| Facts | Objective information with vector embeddings | Per entity — shared across processes |
| Preferences | User choices, opinions, and tastes | Per entity |
| Skills & Knowledge | Abilities and expertise levels | Per entity |
| Attributes | Process-level information about what your agent handles | Per process |
Entity-Level Memories
Facts are stored in the memori_entity_fact table (or cloud equivalent) with:
- Full text content
- Vector embedding (384 dimensions for all-MiniLM-L6-v2)
- Creation timestamp
- Entity association
- Subject (e.g., “user”)
- Predicate (e.g., “uses”)
- Object (e.g., “PostgreSQL”)
- Mention count (increments on duplicate)
- Last mentioned timestamp
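These fields can be sketched as a Python record, including the duplicate-handling behavior (mention count incrementing when the same triple recurs). The field names are illustrative, not the actual memori_entity_fact schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

def _now():
    return datetime.now(timezone.utc)

@dataclass
class EntityFact:
    # Illustrative row shape, not the real schema.
    content: str
    subject: str
    predicate: str
    object: str
    embedding: list = field(default_factory=list)   # 384-dim vector in practice
    created_at: datetime = field(default_factory=_now)
    last_mentioned: datetime = field(default_factory=_now)
    mention_count: int = 1

facts = {}

def upsert_fact(fact):
    """Insert a new fact, or bump mention_count on a duplicate triple."""
    key = (fact.subject, fact.predicate, fact.object)
    if key in facts:
        existing = facts[key]
        existing.mention_count += 1
        existing.last_mentioned = _now()
        return existing
    facts[key] = fact
    return fact
```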
Process-Level Memories
Attributes are stored in the memori_process_attribute table with:
- Attribute name
- Attribute value
- Process association
Semantic Triples
Advanced Augmentation uses named-entity recognition (NER) to extract semantic triples (subject, predicate, object). These form the building blocks of the Knowledge Graph. Example — from “My favorite database is PostgreSQL and I use it with FastAPI”:
| Subject | Predicate | Object |
|---|---|---|
| user | favorite_database | PostgreSQL |
| user | uses | FastAPI |
| user | uses_with | PostgreSQL + FastAPI |
Triple Storage
Triples are normalized and stored across multiple tables:
- Subjects — Unique subjects (e.g., “user”, “Alice”)
- Predicates — Unique predicates (e.g., “uses”, “prefers”)
- Objects — Unique objects (e.g., “PostgreSQL”, “dark mode”)
- Knowledge Graph — Links subjects, predicates, and objects with metadata
_augmentation.py:246-251
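The normalization can be sketched with interning tables: each unique subject, predicate, and object is stored once, and the knowledge graph links their ids rather than repeating strings. Names and id assignment here are illustrative, not the actual storage layout.

```python
def intern(table, value):
    """Return a stable integer id for value, inserting it if new."""
    return table.setdefault(value, len(table))

subjects, predicates, objects = {}, {}, {}
knowledge_graph = []  # rows of (subject_id, predicate_id, object_id)

def store_triple(s, p, o):
    """Normalize a triple into the three lookup tables plus a graph row."""
    knowledge_graph.append(
        (intern(subjects, s), intern(predicates, p), intern(objects, o))
    )

store_triple("user", "uses", "PostgreSQL")
store_triple("user", "uses", "FastAPI")
# "user" and "uses" are each stored once; the graph links ids, not strings
```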
Embeddings
Vector embeddings power semantic recall. Memori generates embeddings for:
- Extracted facts
- Facts derived from semantic triples
Default model: all-MiniLM-L6-v2 (384 dimensions)
Configurable via the MEMORI_EMBEDDINGS_MODEL environment variable
Source: memori/embeddings/_api.py and embedding generation in _augmentation.py:196-200, 227-232
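The reason an inner-product index can rank by cosine similarity (as the recall details note) is that L2 normalization scales every vector to unit length, after which the inner product equals the cosine of the angle between them. A minimal sketch:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# After normalization, inner product == cosine similarity.
a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([2.0, 4.0, 6.0])   # same direction as a
c = l2_normalize([-2.0, 1.0, 0.0])  # orthogonal to a
```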
Context Recall
When a query is sent to an LLM through a wrapped client, Memori automatically:
- Intercepts the outbound LLM call
- Embeds the user’s query
- Uses semantic search to find entity facts matching the query
- Ranks facts by cosine similarity via FAISS
- Injects the most relevant facts into the system prompt
- Forwards the enriched request to the LLM provider
- FAISS IndexFlatIP (inner product after L2 normalization = cosine similarity)
- Exact search (not approximate)
- Filters by recall_relevance_threshold (default: 0.1)
- Recall implementation: memori/memory/recall.py:180-226
- FAISS integration: memori/search/_faiss.py:81-122
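Assuming the embeddings are already L2-normalized, the ranking and threshold filtering can be sketched in pure Python as a stand-in for the FAISS index; `recall` and its parameters are hypothetical names, not Memori's API.

```python
def recall(query_vec, fact_vecs, fact_texts, threshold=0.1, top_k=3):
    """Rank facts by cosine similarity (inner product of pre-normalized
    vectors, exact search) and drop scores below the relevance threshold."""
    scored = []
    for text, vec in zip(fact_texts, fact_vecs):
        score = sum(q * v for q, v in zip(query_vec, vec))
        if score >= threshold:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```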
Waiting for Augmentation
In short-lived scripts, call mem.augmentation.wait() to ensure processing completes before the program exits:
- Waits for all pending futures to complete
- Waits for the database writer queue to drain
- Waits an additional 2x batch timeout for final processing
- Returns True if successful, False if the timeout is exceeded
memori/memory/augmentation/_manager.py:168-210
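The documented semantics can be sketched as follows; `wait_for_augmentation` is a hypothetical stand-in for Memori's internal logic, not its actual code.

```python
import queue
import time
from concurrent.futures import Future, wait as wait_futures

def wait_for_augmentation(futures, write_queue, batch_timeout=0.1, timeout=30.0):
    """Wait for pending futures, drain the writer queue, then allow
    2x the batch timeout for a final flush. False if the deadline passes."""
    deadline = time.monotonic() + timeout
    done, pending = wait_futures(futures, timeout=timeout)
    if pending:
        return False                     # augmentation tasks still running
    while not write_queue.empty():       # wait for the write queue to drain
        if time.monotonic() > deadline:
            return False
        time.sleep(0.01)
    time.sleep(2 * batch_timeout)        # grace period for the final flush
    return True
```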
Long-running applications (web servers, chatbots) don’t need to call .wait() — augmentation happens continuously in the background.
Error Handling
Advanced Augmentation gracefully handles errors:
Quota Exceeded
When quota is exceeded, augmentation is automatically disabled.
Augmentation Failures
If an individual augmentation task fails:
- Error is logged
- Other augmentation tasks continue
- Your LLM calls are not affected
_manager.py:97-110
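The isolation behavior amounts to a per-task try/except: one failure is logged and the remaining tasks still run. `run_augmentation_tasks` is an illustrative name, not Memori's API.

```python
import logging

logger = logging.getLogger("augmentation")

def run_augmentation_tasks(tasks):
    """Run each task; log failures and continue, so one bad task never
    affects its siblings or the caller's LLM path."""
    results = []
    for task in tasks:
        try:
            results.append(task())
        except Exception:
            logger.exception("augmentation task failed; continuing")
    return results

def good():
    return "ok"

def bad():
    raise ValueError("boom")
```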
Configuration
Augmentation behavior can be customized through the settings listed above (for example, manager.max_workers and the db_writer options).
Metadata Collection
Advanced Augmentation collects metadata about your deployment for analytics and debugging:
- SDK version — Python SDK version
- LLM provider — OpenAI, Anthropic, Google, etc.
- LLM model — gpt-4o-mini, claude-3-sonnet, etc.
- Framework — LangChain, PydanticAI, etc.
- Platform — FastAPI, Django, etc.
- Storage dialect — PostgreSQL, SQLite, MongoDB, etc.
_augmentation.py:86-123
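As a sketch, the collected fields might be assembled into a payload like this; the keys and values are illustrative, since the actual field names and wire format are not shown here.

```python
# Illustrative metadata payload; keys mirror the categories listed above.
deployment_metadata = {
    "sdk_version": "0.0.0",          # placeholder Python SDK version
    "llm_provider": "openai",
    "llm_model": "gpt-4o-mini",
    "framework": "langchain",
    "platform": "fastapi",
    "storage_dialect": "postgresql",
}
```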