Semantic recall enables agents to retrieve contextually relevant messages from conversation history using vector embeddings and similarity search. This provides long-term memory beyond recent message limits.
## How It Works

The SemanticRecall processor operates as both an input and output processor:

- **On input**: performs semantic search on historical messages and adds relevant context
- **On output**: creates embeddings for new messages to enable future semantic search
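The dual-phase design can be sketched as a small interface. This is an illustrative simplification, not the actual SemanticRecall source; the interface name, hook signatures, and placeholder strings are assumptions:

```typescript
// Conceptual shape of a processor that participates in both phases.
interface MessageProcessor {
  // Runs before the model call: retrieve similar history, inject context.
  processInput(args: { messages: string[] }): string[];
  // Runs after the model call: embed new messages for future recall.
  processOutput(args: { messages: string[] }): string[];
}

const sketchRecall: MessageProcessor = {
  processInput({ messages }) {
    // Stand-in for vector search results prepended to the conversation.
    const recalled = ['[recalled context]'];
    return [...recalled, ...messages];
  },
  processOutput({ messages }) {
    // Stand-in for "create and store embeddings for each new message".
    return messages;
  },
};
```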
## Basic Configuration

Enable semantic recall with vector storage and an embedder:

```typescript
import { Memory } from '@mastra/core';
import { PgVector } from '@mastra/vector-pg';
import { LibSQLStore } from '@mastra/store-libsql';

const memory = new Memory({
  storage: new LibSQLStore({
    id: 'agent-memory',
    url: 'file:./memory.db'
  }),
  vector: new PgVector({
    connectionString: process.env.DATABASE_URL
  }),
  embedder: 'openai/text-embedding-3-small',
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 5,
      messageRange: 2,
      scope: 'resource'
    }
  }
});
```
## Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `semanticRecall` | `boolean \| object` | | Enable semantic recall with defaults (`true`) or configure with detailed options |
| `topK` | `number` | | Number of most similar messages to retrieve from the vector database |
| `messageRange` | `number \| { before: number; after: number }` | `1` | Amount of surrounding context to include with each retrieved message |
| `scope` | `'thread' \| 'resource'` | `'resource'` | Scope of semantic search: `thread` searches only within the current conversation thread; `resource` searches across all threads owned by the user/resource |
| `threshold` | `number` | | Minimum similarity score (0-1); messages below this threshold are filtered out |
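Putting the options together, the `semanticRecall` setting accepts either form. The type alias names below are hypothetical (assembled from the options above for illustration; the real exported type names in `@mastra/core` may differ):

```typescript
// Hypothetical shape of the semanticRecall option.
type SemanticRecallConfig = {
  topK?: number; // most similar messages to retrieve
  messageRange?: number | { before: number; after: number }; // surrounding context
  scope?: 'thread' | 'resource'; // search boundary
  threshold?: number; // minimum similarity score, 0-1
};

type SemanticRecallOption = boolean | SemanticRecallConfig;

// Both forms are valid:
const simple: SemanticRecallOption = true;
const detailed: SemanticRecallOption = {
  topK: 5,
  messageRange: { before: 2, after: 3 },
  scope: 'resource',
  threshold: 0.7,
};
```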
## Configuration Examples

### Simple Setup

```typescript
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: true // Enable with defaults
  }
});
```
### Advanced Configuration

```typescript
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-large',
  embedderOptions: {
    providerOptions: {
      openai: {
        dimensions: 1536 // Custom embedding dimensions
      }
    }
  },
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 8,
      messageRange: { before: 2, after: 3 },
      scope: 'resource',
      threshold: 0.7,
      indexConfig: {
        type: 'hnsw',
        metric: 'dotproduct',
        hnsw: {
          m: 16,
          efConstruction: 64
        }
      }
    }
  }
});
```
### Thread-Scoped Recall

```typescript
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: {
      topK: 5,
      scope: 'thread' // Only search current thread
    }
  }
});
```
## Vector Store Setup

Semantic recall requires a vector database. Mastra supports multiple providers:

- PostgreSQL (PgVector)
- Pinecone
- Qdrant
- Chroma

```typescript
import { PgVector } from '@mastra/vector-pg';

const vector = new PgVector({
  connectionString: process.env.DATABASE_URL
});
```
## Embedder Configuration

Choose an embedding model compatible with your use case:

```typescript
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  embedderOptions: {
    providerOptions: {
      openai: {
        dimensions: 1536
      }
    }
  },
  options: {
    semanticRecall: true
  }
});
```
## Index Optimization

For PostgreSQL with pgvector, you can optimize semantic recall performance with index configuration:

```typescript
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: {
      topK: 5,
      indexConfig: {
        type: 'hnsw', // Hierarchical Navigable Small World
        metric: 'dotproduct', // Best for OpenAI embeddings
        hnsw: {
          m: 16, // Links per node
          efConstruction: 64 // Construction quality
        }
      }
    }
  }
});
```
**Index types:**

- `hnsw`: best performance for most cases (recommended)
- `ivfflat`: good balance of speed and recall
- `flat`: exact nearest neighbor (slow but 100% recall)
## Cross-Thread Recall

When using `scope: 'resource'`, semantic recall can retrieve messages from other threads:

```typescript
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: {
      topK: 5,
      messageRange: 2,
      scope: 'resource' // Search across all user threads
    }
  }
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  memory
});

// Query references information from previous conversations
const result = await agent.generate(
  'What did I say about my dietary preferences?',
  {
    threadId: 'current-thread',
    resourceId: 'user-123'
  }
);
```
Cross-thread messages are formatted with timestamps:

```text
The following messages were remembered from a different conversation:

<remembered_from_other_conversation>
the following messages are from 2024, Feb, 15
Message from previous conversation at 3:45 PM: User: I'm allergic to peanuts
Message from previous conversation at 3:46 PM: Assistant: I'll make sure to avoid peanuts in all recommendations
<end_remembered_from_other_conversation>
```
## Embedding Cache

SemanticRecall uses a global embedding cache to avoid redundant API calls:

```typescript
import { globalEmbeddingCache } from '@mastra/core/processors';

// Clear cache if needed
globalEmbeddingCache.clear();

// Check cache size
console.log(`Cache size: ${globalEmbeddingCache.size}`);
```

The cache uses xxhash for fast key generation and includes the index name to ensure isolation between different embedding models/dimensions.
## Implementation Details

The SemanticRecall processor handles semantic search and embedding creation.

### Input Processing

The input phase looks roughly like this (`indexName` comes from the processor's configuration; the thread and resource identifiers arrive via the request context):

```typescript
async processInput(args) {
  const { messages, messageList, requestContext } = args;
  const { threadId, resourceId } = requestContext;

  // Extract user query from last user message
  const userQuery = this.extractUserQuery(messages);
  if (!userQuery) return messageList;

  // Generate embeddings for the query
  const { embeddings, dimension } = await this.embedMessageContent(
    userQuery,
    indexName
  );

  // Ensure vector index exists
  await this.ensureVectorIndex(indexName, dimension);

  // Perform vector search
  const results = await this.vector.query({
    indexName,
    queryVector: embeddings[0],
    topK: this.topK,
    filter: this.scope === 'resource'
      ? { resource_id: resourceId }
      : { thread_id: threadId }
  });

  // Retrieve the matched messages plus surrounding context
  const similarMessages = await this.storage.listMessages({
    threadId,
    resourceId,
    include: results.map(r => ({
      id: r.metadata?.message_id,
      threadId: r.metadata?.thread_id,
      withNextMessages: this.messageRange.after,
      withPreviousMessages: this.messageRange.before
    }))
  });

  // Add to message list
  messageList.add(similarMessages, 'memory');
  return messageList;
}
```
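The output phase (embedding new messages for future recall) can be sketched as a standalone function. This is a hedged sketch under assumptions, not the library source: the embedder is a stub, the "index" is an in-memory array standing in for the vector store's upsert, and all names here are illustrative:

```typescript
// A stored vector plus the metadata used to map results back to messages.
type StoredVector = { vector: number[]; metadata: Record<string, string> };

// Stub embedder: a real implementation would call the embedding model API.
async function embed(content: string): Promise<number[]> {
  return [content.length, 0, 0]; // placeholder vector
}

// Embed each new message and store it with metadata, so a later
// semantic query can find it and recover the original message IDs.
async function processOutputSketch(
  messages: { id: string; threadId: string; content: string }[],
  index: StoredVector[],
): Promise<void> {
  for (const message of messages) {
    const vector = await embed(message.content);
    index.push({
      vector,
      metadata: { message_id: message.id, thread_id: message.threadId },
    });
  }
}
```

The metadata fields mirror those the input phase filters on (`message_id`, `thread_id`), which is what lets a future query map vector hits back to stored messages.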
## Best Practices

- **Choose the right scope**: use `resource` scope for cross-conversation context, `thread` scope for session-specific recall.
- **Tune topK**: start with 3-5 similar messages. More results increase context but also token usage.
- **Set a threshold**: filter low-quality matches with a similarity threshold (e.g., 0.7).
- **Optimize indexes**: use HNSW indexes for PostgreSQL to improve query performance.
## Troubleshooting

If recall returns no relevant messages:

- Check that embeddings were created (verify the vector store has data)
- Lower the `threshold` value if set
- Ensure the scope matches your use case (`thread` vs `resource`)
- Verify embedder dimensions match the vector store index

If recalled context consumes too many tokens:

- Reduce `topK` (fewer messages retrieved)
- Reduce `messageRange` (less surrounding context)
- Increase `threshold` (only highly relevant matches)
- Balance with `lastMessages` to avoid redundancy
## Next Steps

- **Working Memory**: store structured user information across conversations
- **RAG Overview**: learn about document-based RAG in Mastra
- **Conversation History**: manage recent message persistence