Overview

Memori uses embeddings to store and retrieve memories semantically. You can customize the embedding model to match your application’s needs.

Default Embeddings

By default, Memori uses efficient open-source models for embeddings.
Memori Cloud uses optimized embedding models automatically. For self-hosted deployments, you can customize the embedding configuration.
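
You can inspect or override the model through mem.config.embeddings. A short sketch (the exact default model name depends on your Memori version, and reading the attribute before any override is an assumption):
from memori import Memori

mem = Memori()

# Inspect the currently configured embedding model
print(mem.config.embeddings.model)

# Override it before building storage (see Configuring Embedding Models below)
mem.config.embeddings.model = "sentence-transformers/all-MiniLM-L6-v2"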

Generating Embeddings

You can manually generate embeddings for custom use cases.
from memori import Memori

mem = Memori()

# Generate embeddings for a single text
text = "Machine learning is transforming industries"
embedding = mem.embed_texts(text)
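# embed_texts returns a list of vectors, even for a single string input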

print(f"Generated embedding with {len(embedding[0])} dimensions")
print(f"First few values: {embedding[0][:5]}")

# Generate embeddings for multiple texts
texts = [
    "Artificial intelligence is advancing rapidly",
    "Neural networks power modern AI systems",
    "Deep learning requires large datasets",
]
embeddings = mem.embed_texts(texts)

print(f"Generated {len(embeddings)} embeddings")
for i, emb in enumerate(embeddings):
    print(f"Text {i+1}: {len(emb)} dimensions")

Async Embeddings

Generate embeddings asynchronously for better performance.
import asyncio
from memori import Memori

async def generate_async_embeddings():
    mem = Memori()

    texts = [
        "First document about AI",
        "Second document about ML",
        "Third document about DL",
    ]

    # Generate embeddings asynchronously (runs in thread pool)
    embeddings_future = mem.embed_texts(texts, async_=True)

    # Do other work while embeddings are being generated
    print("Generating embeddings in background...")

    # Await the result
    embeddings = await embeddings_future

    print(f"Generated {len(embeddings)} embeddings")
    return embeddings

asyncio.run(generate_async_embeddings())
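
Because embed_texts(..., async_=True) returns an awaitable, you can also start several batches at once and gather the results. A sketch, assuming each call returns an independent awaitable that resolves to a list of vectors:
import asyncio
from memori import Memori

async def embed_batches(batches):
    mem = Memori()
    # Start every batch before awaiting any of them
    futures = [mem.embed_texts(batch, async_=True) for batch in batches]
    return await asyncio.gather(*futures)

results = asyncio.run(embed_batches([
    ["First document about AI", "Second document about ML"],
    ["Third document about DL"],
]))
print(f"Embedded {sum(len(r) for r in results)} texts in {len(results)} batches")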

Configuring Embedding Models

For self-hosted deployments, you can configure the embedding model.
from memori import Memori
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Setup database
engine = create_engine("sqlite:///memori.db")
Session = sessionmaker(bind=engine)

# Create Memori instance with custom config
mem = Memori(conn=Session)

# Configure embedding model
mem.config.embeddings.model = "sentence-transformers/all-MiniLM-L6-v2"

# Build storage
mem.config.storage.build()

# Use normally - embeddings will use the configured model
text = "This will be embedded with the custom model"
embedding = mem.embed_texts(text)

print(f"Embedding dimension: {len(embedding[0])}")

Semantic Search

Use embeddings for semantic similarity search.
from memori import Memori
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

mem = Memori()

# Document corpus
documents = [
    "Python is a programming language",
    "JavaScript runs in browsers",
    "Machine learning uses neural networks",
    "Databases store structured data",
    "APIs connect different services",
]

# Generate embeddings for all documents
doc_embeddings = mem.embed_texts(documents)

# User query
query = "How do I connect to a web service?"
query_embedding = mem.embed_texts(query)[0]

# Find most similar documents
similarities = []
for i, doc_emb in enumerate(doc_embeddings):
    similarity = cosine_similarity(query_embedding, doc_emb)
    similarities.append((i, similarity, documents[i]))

# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)

print("Most relevant documents:")
for idx, score, doc in similarities[:3]:
    print(f"  [{score:.3f}] {doc}")

Custom Embedding Pipeline

Integrate Memori with your existing embedding infrastructure.
from memori import Memori
from openai import OpenAI
import numpy as np

class CustomEmbeddingMemori:
    def __init__(self):
        self.mem = Memori()
        self.openai_client = OpenAI()

    def generate_embeddings_openai(self, texts: list[str]) -> list[list[float]]:
        """Use OpenAI embeddings instead of default model"""
        response = self.openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=texts,
        )
        return [item.embedding for item in response.data]

    def search_documents(self, query: str, documents: list[str]) -> list[tuple]:
        # Use custom embeddings
        doc_embeddings = self.generate_embeddings_openai(documents)
        query_embedding = self.generate_embeddings_openai([query])[0]

        # Calculate similarities
        similarities = []
        for i, doc_emb in enumerate(doc_embeddings):
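            # OpenAI embeddings are unit-normalized, so a raw dot product
            # equals cosine similarity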
            similarity = np.dot(query_embedding, doc_emb)
            similarities.append((i, similarity, documents[i]))

        return sorted(similarities, key=lambda x: x[1], reverse=True)

# Usage
custom_mem = CustomEmbeddingMemori()
documents = [
    "AI is transforming healthcare",
    "Blockchain enables decentralization",
    "Quantum computing is emerging",
]
results = custom_mem.search_documents("medical technology", documents)

for idx, score, doc in results:
    print(f"[{score:.3f}] {doc}")

Embedding Model Comparison

Model                   Dimensions  Speed   Quality
all-MiniLM-L6-v2        384         Fast    Good
all-mpnet-base-v2       768         Medium  Better
text-embedding-3-small  1536        Slow    Best

Best Practices

Use Default Models

Memori’s default models are optimized for memory recall; only customize them if you have a specific need.

Batch Embeddings

Generate embeddings in batches for better performance.
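
For example, a single batched call lets the model process all inputs together instead of being invoked once per text (a sketch; actual speedups depend on your model and hardware):
from memori import Memori

mem = Memori()
texts = [f"Document number {i}" for i in range(100)]

# Slower: one embedding call per text
embeddings = [mem.embed_texts(t)[0] for t in texts]

# Faster: a single batched call
embeddings = mem.embed_texts(texts)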

Cache Embeddings

Store embeddings to avoid regenerating them repeatedly.
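
A minimal in-memory cache sketch, assuming embeddings are deterministic for a fixed model (a persistent store follows the same pattern):
from memori import Memori

mem = Memori()
_embedding_cache: dict[str, list[float]] = {}

def embed_cached(text: str) -> list[float]:
    # Compute an embedding only the first time we see a given text
    if text not in _embedding_cache:
        _embedding_cache[text] = mem.embed_texts(text)[0]
    return _embedding_cache[text]

embed_cached("Databases store structured data")  # computed
embed_cached("Databases store structured data")  # served from the cache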

Monitor Dimensions

Higher-dimensional embeddings generally give better quality but slower search and more storage.
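
You can check the dimensionality of the active model by embedding a probe string:
from memori import Memori

mem = Memori()
dim = len(mem.embed_texts("dimension probe")[0])
print(f"Active model produces {dim}-dimensional vectors")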

Advanced Configuration

from memori import Memori
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql://user:pass@localhost/memori")
Session = sessionmaker(bind=engine)

mem = Memori(conn=Session)

# Configure embeddings
mem.config.embeddings.model = "sentence-transformers/all-mpnet-base-v2"
mem.config.recall_facts_limit = 10  # Return top 10 facts
mem.config.recall_embeddings_limit = 50  # Search top 50 embeddings
mem.config.recall_relevance_threshold = 0.7  # Min relevance score

mem.config.storage.build()

Troubleshooting

Embedding generation is slow

  • Use a smaller model like all-MiniLM-L6-v2
  • Generate embeddings asynchronously with async_=True
  • Batch multiple texts together

Recall results are poor

  • Use a larger model like all-mpnet-base-v2
  • Adjust recall_relevance_threshold (lower = more results)
  • Increase recall_embeddings_limit to search more candidates

Memory or storage usage is high

  • Use models with lower dimensions
  • Reduce recall_embeddings_limit
  • Clear old memories periodically

Next Steps

Basic Memory

Learn basic memory operations

Multi-User

Manage memories for multiple users
