
Overview

Azen’s semantic search uses vector embeddings to find memories based on meaning, not just keywords. Search with natural language queries and get the most relevant results.
1. Write your query: Use natural language to describe what you’re looking for.
2. Send a POST request: Send your query to /api/v1/memory/search with an optional topK parameter.
3. Get ranked results: Receive memories ranked by semantic similarity, along with their scores.
curl -X POST https://api.azen.sh/api/v1/memory/search \
  -H "azen-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "outdoor activities",
    "topK": 5
  }'
Response (200 OK):
{
  "status": "success",
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "I love hiking in the mountains",
      "metadata": null,
      "createdAt": "2024-01-15T10:30:00.000Z",
      "embedded": true
    },
    {
      "id": "550e8400-e29b-41d4-a716-446655440001",
      "content": "Rock climbing is my favorite weekend activity",
      "metadata": null,
      "createdAt": "2024-01-14T15:20:00.000Z",
      "embedded": true
    }
  ],
  "rawMatches": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000::0",
      "score": 0.89,
      "values": []
    },
    {
      "id": "550e8400-e29b-41d4-a716-446655440001::0",
      "score": 0.82,
      "values": []
    }
  ]
}
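The same request can be made from code. Below is a minimal TypeScript sketch, assuming Node 18+ (global fetch); the searchMemories helper and the AZEN_API_KEY constant are illustrative, not part of an official SDK.

```typescript
// Minimal client for POST /api/v1/memory/search.
// AZEN_API_KEY is a placeholder; supply your real key.
const AZEN_API_KEY = "YOUR_API_KEY";

interface SearchBody {
  query: string;
  topK: number;
}

// Build and validate the request body per the documented constraints.
function buildSearchBody(query: string, topK = 5): SearchBody {
  if (query.length < 1) throw new Error("query must be a non-empty string");
  if (topK < 1 || topK > 50) throw new Error("topK must be between 1 and 50");
  return { query, topK };
}

async function searchMemories(query: string, topK = 5) {
  const res = await fetch("https://api.azen.sh/api/v1/memory/search", {
    method: "POST",
    headers: {
      "azen-api-key": AZEN_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildSearchBody(query, topK)),
  });
  if (!res.ok) throw new Error(`Search failed: ${res.status}`);
  return res.json();
}
```

Validating the body client-side mirrors the 400 responses described later, so bad input fails fast before a network round trip.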

Request Parameters

query (required)

The search query as natural language text.
  • Type: string
  • Min length: 1 character
  • Format: Plain text, natural language
Examples:
  • "What are my hobbies?"
  • "preferences about food"
  • "meetings scheduled next week"

topK (optional)

Maximum number of results to return.
  • Type: number
  • Range: 1-50
  • Default: 5
Increasing topK returns more results but may include less relevant matches.

How Semantic Search Works

1. Query embedding: Your query text is embedded using the same model used for memories (OpenAI text-embedding-3-small).
2. Vector search: The query vector is compared against stored memory vectors in Pinecone using cosine similarity.
3. Retrieve memory IDs: The top K most similar vector IDs are extracted from the matches.
4. Fetch and decrypt: Encrypted memories are fetched from Postgres and decrypted in memory.
5. Return results: Decrypted memories are returned in order of similarity, along with their scores.

Implementation Reference

From apps/api/src/routes/search.ts:
// Embed the query
const [qEmb] = await embedBatch([query]);

// Search vectors in Pinecone
const namespace = `org-${organizationId}`;
const matches = await queryVectors(qEmb, topK, namespace);

// Extract memory IDs
const memIds = Array.from(
  new Set(
    matches
      .map(m => m.id?.split("::")[0])
      .filter((id): id is string => !!id)
  )
);

// Fetch and decrypt memories
const mems = await db
  .select({
    id: memory.id,
    encryptedContent: memory.encryptedContent,
    iv: memory.iv,
    tag: memory.tag,
    // ...
  })
  .from(memory)
  .where(and(
    inArray(memory.id, memIds),
    eq(memory.organizationId, organizationId)
  ));

// Decrypt and order by similarity
const orderedMems = memIds
  .map((id) => {
    const m = mems.find((x) => x.id === id);
    if (!m) return null;
    return {
      id: m.id,
      content: decryptText(m.encryptedContent, m.iv, m.tag),
      // ...
    };
  })
  .filter(Boolean);

Understanding Search Results

Memory Objects

Each memory in the memories array contains:
  • id: Unique memory identifier (UUID)
  • content: Decrypted memory text
  • metadata: Optional metadata (currently null)
  • createdAt: ISO 8601 timestamp
  • embedded: Whether embedding is complete (always true in search results)
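These fields can be captured as TypeScript interfaces. The shapes below are reconstructed from the example response earlier on this page, not an official type definition; the metadata type is left loose since it is currently always null.

```typescript
// Response shape for POST /api/v1/memory/search, reconstructed from the
// documented example. Adjust if the API adds fields.

interface Memory {
  id: string;                              // UUID
  content: string;                         // decrypted memory text
  metadata: Record<string, unknown> | null; // currently always null
  createdAt: string;                       // ISO 8601 timestamp
  embedded: boolean;                       // always true in search results
}

interface RawMatch {
  id: string;       // memory ID with chunk index, e.g. "<uuid>::0"
  score: number;    // cosine similarity, 0-1
  values: number[]; // empty; vector values are not returned
}

interface SearchResponse {
  status: "success";
  memories: Memory[];
  rawMatches: RawMatch[];
}

// The example response from this page typechecks against these interfaces:
const example: SearchResponse = {
  status: "success",
  memories: [{
    id: "550e8400-e29b-41d4-a716-446655440000",
    content: "I love hiking in the mountains",
    metadata: null,
    createdAt: "2024-01-15T10:30:00.000Z",
    embedded: true,
  }],
  rawMatches: [{
    id: "550e8400-e29b-41d4-a716-446655440000::0",
    score: 0.89,
    values: [],
  }],
};
```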

Raw Matches

The rawMatches array provides vector search details:
  • id: Memory ID with chunk index (e.g., memoryId::0)
  • score: Cosine similarity score (0-1, higher is better)
  • values: Empty array (vector values not returned)
Similarity scores above 0.7 typically indicate strong relevance. Scores below 0.5 may be coincidental.
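Because scores live in rawMatches while content lives in memories, a common step is joining the two by stripping the ::chunk suffix from each match ID. The helper below is a sketch (the attachScores name is illustrative), with the 0.5 floor taken from the guidance above:

```typescript
interface ScoredMemory { id: string; content: string; score: number }

// Join memories with their best rawMatches score and drop weak matches.
function attachScores(
  memories: { id: string; content: string }[],
  rawMatches: { id: string; score: number }[],
  minScore = 0.5,
): ScoredMemory[] {
  // Keep the highest score per memory ID across its chunks.
  const best = new Map<string, number>();
  for (const m of rawMatches) {
    const memId = m.id.split("::")[0];
    best.set(memId, Math.max(best.get(memId) ?? 0, m.score));
  }
  return memories
    .map((m) => ({ ...m, score: best.get(m.id) ?? 0 }))
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```

Raising minScore to 0.7 keeps only the matches this page describes as strongly relevant.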

Search Strategies

Specific Queries

Ask specific questions for precise results:
{
  "query": "What programming languages does the user prefer?",
  "topK": 3
}

Broad Discovery

Use general terms to explore related memories:
{
  "query": "hobbies and interests",
  "topK": 10
}
Include context in your query:
{
  "query": "user feedback about the mobile app interface",
  "topK": 5
}

Filtering Search Results

Currently, Azen searches across all memories in your organization. To filter results:
  1. Client-side filtering: Filter the returned memories by date, content patterns, etc.
  2. Multiple queries: Run separate queries for different topics
  3. Metadata tags (coming soon): Tag memories for category-based filtering
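Option 1 can be sketched as a small helper that narrows the returned memories by creation date and a content pattern; the filterMemories name and option names are illustrative:

```typescript
interface Mem { id: string; content: string; createdAt: string }

// Client-side filtering: keep memories created after a cutoff and/or
// matching a content pattern.
function filterMemories(
  memories: Mem[],
  opts: { since?: Date; pattern?: RegExp },
): Mem[] {
  return memories.filter((m) => {
    if (opts.since && new Date(m.createdAt) < opts.since) return false;
    if (opts.pattern && !opts.pattern.test(m.content)) return false;
    return true;
  });
}
```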

Handling No Results

If no relevant memories are found:
{
  "status": "success",
  "memories": [],
  "rawMatches": []
}
Possible reasons:
  • No memories have been embedded yet (check the embedded field on your memories)
  • The query doesn’t match any stored content semantically
  • All memories fall below the similarity threshold
Wait a few seconds after creating memories to ensure embeddings are processed before searching.
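One way to honor that advice is to poll until a condition holds (for example, a freshly created memory reporting embedded: true) before searching. The pollUntil helper below is a generic sketch; the interval and timeout defaults are assumptions, not documented values:

```typescript
// Repeatedly evaluate an async condition until it passes or time runs out.
async function pollUntil(
  check: () => Promise<boolean>,
  intervalMs = 1000,
  timeoutMs = 10000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false; // condition never became true within the timeout
}
```

In practice the check callback would fetch the memory and test its embedded field.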

Error Handling

Invalid Request (400)

Missing or invalid query:
{
  "status": "invalid_request",
  "message": "Invalid request body",
  "code": 400
}
Solution: Ensure the query field is a non-empty string and that topK, if provided, is between 1 and 50.

Embedding Failure (500)

{
  "status": "internal_server_error",
  "message": "Failed to embed query",
  "code": 500
}
Solution: Retry the request. If persistent, the embedding service may be unavailable.
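A retry wrapper is one way to handle transient 500s like this. The sketch below uses exponential backoff; the retry count and delays are illustrative defaults, not documented behavior:

```typescript
// Retry an async operation with exponential backoff: 200ms, 400ms, 800ms, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries - 1) {
        // Wait before the next attempt; double the delay each time.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError; // all attempts failed
}
```

A search call would then be wrapped as, for example, withRetry(() => searchMemories("hobbies", 5)).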

Performance Considerations

Search Latency

Typical search latency breakdown:
  • Query embedding: ~50-200ms
  • Vector search (Pinecone): ~50-150ms
  • Database fetch + decryption: ~20-100ms
  • Total: ~120-450ms
Latency increases with topK due to more database fetches and decryption operations.

Search Quality

For best results:
  • Store memories with clear, descriptive content
  • Keep individual memories focused on single topics
  • Use consistent terminology across related memories
  • Avoid very short memories (< 10 characters)

Usage Tracking

Search requests are automatically tracked:
  • Each successful POST /api/v1/memory/search increments searchCount
  • Failed requests increment errorCount
  • Tracking is per organization, per API key, per day
See Usage Tracking for monitoring your usage.

Advanced Use Cases

Conversational Context

Build conversation context by searching recent messages:
const context = await searchMemories(
  `recent conversation with ${userName}`,
  5
);

// Use context in AI prompt
const prompt = `
Based on our previous conversations:
${context.memories.map(m => m.content).join('\n')}

User: ${newMessage}
`;

Personalization

Find user preferences for personalized experiences:
const preferences = await searchMemories(
  'user preferences and settings',
  10
);

// Apply preferences
const theme = preferences.memories.find(m => 
  m.content.includes('theme')
);

Knowledge Retrieval

Search a knowledge base for relevant information:
const knowledge = await searchMemories(
  'How do I configure the API rate limits?',
  3
);

// Return as FAQ answer
const answer = knowledge.memories[0]?.content;

Next Steps

Semantic Search Concepts

Learn how vector embeddings work

Create Memories

Store memories to search

List All Memories

Browse memories with pagination

Search API Reference

Complete search endpoint documentation
