
Overview

Azen’s semantic search uses vector embeddings to find memories based on meaning, not just keywords. Search with natural language queries and get the most relevant results.
1. Write your query: Use natural language to describe what you’re looking for.
2. Send a POST request: Send your query to /api/v1/memory/search with an optional topK parameter.
3. Get ranked results: Receive memories ranked by semantic similarity, along with their scores.
curl -X POST https://api.azen.sh/api/v1/memory/search \
  -H "azen-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "outdoor activities",
    "topK": 5
  }'
Response (200 OK):
{
  "status": "success",
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "I love hiking in the mountains",
      "metadata": null,
      "createdAt": "2024-01-15T10:30:00.000Z",
      "embedded": true
    },
    {
      "id": "550e8400-e29b-41d4-a716-446655440001",
      "content": "Rock climbing is my favorite weekend activity",
      "metadata": null,
      "createdAt": "2024-01-14T15:20:00.000Z",
      "embedded": true
    }
  ],
  "rawMatches": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000::0",
      "score": 0.89,
      "values": []
    },
    {
      "id": "550e8400-e29b-41d4-a716-446655440001::0",
      "score": 0.82,
      "values": []
    }
  ]
}
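The same request can be made from code. Below is a minimal TypeScript sketch, assuming Node 18+ (global fetch); the searchMemories helper and the AZEN_API_KEY constant are illustrative, not part of an official SDK.

```typescript
// Minimal client for POST /api/v1/memory/search.
// AZEN_API_KEY is a placeholder; supply your real key.
const AZEN_API_KEY = "YOUR_API_KEY";

interface SearchBody {
  query: string;
  topK: number;
}

// Build and validate the request body per the documented constraints.
function buildSearchBody(query: string, topK = 5): SearchBody {
  if (query.length < 1) throw new Error("query must be a non-empty string");
  if (topK < 1 || topK > 50) throw new Error("topK must be between 1 and 50");
  return { query, topK };
}

async function searchMemories(query: string, topK = 5) {
  const res = await fetch("https://api.azen.sh/api/v1/memory/search", {
    method: "POST",
    headers: {
      "azen-api-key": AZEN_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildSearchBody(query, topK)),
  });
  if (!res.ok) throw new Error(`Search failed: ${res.status}`);
  return res.json();
}
```

Validating the body client-side mirrors the 400 responses described later, so bad input fails fast before a network round trip.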

Request Parameters

query (required)

The search query as natural language text.
  • Type: string
  • Min length: 1 character
  • Format: Plain text, natural language
Examples:
  • "What are my hobbies?"
  • "preferences about food"
  • "meetings scheduled next week"

topK (optional)

Maximum number of results to return.
  • Type: number
  • Range: 1-50
  • Default: 5
Increasing topK returns more results but may include less relevant matches.

How Semantic Search Works

1. Query embedding: Your query text is embedded using the same model used for memories (OpenAI text-embedding-3-small).
2. Vector search: The query vector is compared against stored memory vectors in Pinecone using cosine similarity.
3. Retrieve memory IDs: The top K most similar vector IDs are extracted from the matches.
4. Fetch and decrypt: Encrypted memories are fetched from Postgres and decrypted in memory.
5. Return results: Decrypted memories are returned in order of similarity, along with their scores.

Implementation Reference

From apps/api/src/routes/search.ts:
// Embed the query
const [qEmb] = await embedBatch([query]);

// Search vectors in Pinecone
const namespace = `org-${organizationId}`;
const matches = await queryVectors(qEmb, topK, namespace);

// Extract memory IDs
const memIds = Array.from(
  new Set(
    matches
      .map(m => m.id?.split("::")[0])
      .filter((id): id is string => !!id)
  )
);

// Fetch and decrypt memories
const mems = await db
  .select({
    id: memory.id,
    encryptedContent: memory.encryptedContent,
    iv: memory.iv,
    tag: memory.tag,
    // ...
  })
  .from(memory)
  .where(and(
    inArray(memory.id, memIds),
    eq(memory.organizationId, organizationId)
  ));

// Decrypt and order by similarity
const orderedMems = memIds
  .map((id) => {
    const m = mems.find((x) => x.id === id);
    if (!m) return null;
    return {
      id: m.id,
      content: decryptText(m.encryptedContent, m.iv, m.tag),
      // ...
    };
  })
  .filter(Boolean);

Understanding Search Results

Memory Objects

Each memory in the memories array contains:
  • id: Unique memory identifier (UUID)
  • content: Decrypted memory text
  • metadata: Optional metadata (currently null)
  • createdAt: ISO 8601 timestamp
  • embedded: Whether embedding is complete (always true in search results)
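These fields can be captured as TypeScript interfaces. The shapes below are reconstructed from the example response earlier on this page, not an official type definition; the metadata type is left loose since it is currently always null.

```typescript
// Response shape for POST /api/v1/memory/search, reconstructed from the
// documented example. Adjust if the API adds fields.

interface Memory {
  id: string;                              // UUID
  content: string;                         // decrypted memory text
  metadata: Record<string, unknown> | null; // currently always null
  createdAt: string;                       // ISO 8601 timestamp
  embedded: boolean;                       // always true in search results
}

interface RawMatch {
  id: string;       // memory ID with chunk index, e.g. "<uuid>::0"
  score: number;    // cosine similarity, 0-1
  values: number[]; // empty; vector values are not returned
}

interface SearchResponse {
  status: "success";
  memories: Memory[];
  rawMatches: RawMatch[];
}

// The example response from this page typechecks against these interfaces:
const example: SearchResponse = {
  status: "success",
  memories: [{
    id: "550e8400-e29b-41d4-a716-446655440000",
    content: "I love hiking in the mountains",
    metadata: null,
    createdAt: "2024-01-15T10:30:00.000Z",
    embedded: true,
  }],
  rawMatches: [{
    id: "550e8400-e29b-41d4-a716-446655440000::0",
    score: 0.89,
    values: [],
  }],
};
```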

Raw Matches

The rawMatches array provides vector search details:
  • id: Memory ID with chunk index (e.g., memoryId::0)
  • score: Cosine similarity score (0-1, higher is better)
  • values: Empty array (vector values not returned)
Similarity scores above 0.7 typically indicate strong relevance. Scores below 0.5 may be coincidental.
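Because scores live in rawMatches while content lives in memories, a common step is joining the two by stripping the ::chunk suffix from each match ID. The helper below is a sketch (the attachScores name is illustrative), with the 0.5 floor taken from the guidance above:

```typescript
interface ScoredMemory { id: string; content: string; score: number }

// Join memories with their best rawMatches score and drop weak matches.
function attachScores(
  memories: { id: string; content: string }[],
  rawMatches: { id: string; score: number }[],
  minScore = 0.5,
): ScoredMemory[] {
  // Keep the highest score per memory ID across its chunks.
  const best = new Map<string, number>();
  for (const m of rawMatches) {
    const memId = m.id.split("::")[0];
    best.set(memId, Math.max(best.get(memId) ?? 0, m.score));
  }
  return memories
    .map((m) => ({ ...m, score: best.get(m.id) ?? 0 }))
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```

Raising minScore to 0.7 keeps only the matches this page describes as strongly relevant.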

Search Strategies

Specific Queries

Ask specific questions for precise results:
{
  "query": "What programming languages does the user prefer?",
  "topK": 3
}

Broad Discovery

Use general terms to explore related memories:
{
  "query": "hobbies and interests",
  "topK": 10
}
Include context in your query:
{
  "query": "user feedback about the mobile app interface",
  "topK": 5
}

Filtering Search Results

Currently, Azen searches across all memories in your organization. To filter results:
  1. Client-side filtering: Filter the returned memories by date, content patterns, etc.
  2. Multiple queries: Run separate queries for different topics
  3. Metadata tags (coming soon): Tag memories for category-based filtering
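Option 1 can be sketched as a small helper that narrows the returned memories by creation date and a content pattern; the filterMemories name and option names are illustrative:

```typescript
interface Mem { id: string; content: string; createdAt: string }

// Client-side filtering: keep memories created after a cutoff and/or
// matching a content pattern.
function filterMemories(
  memories: Mem[],
  opts: { since?: Date; pattern?: RegExp },
): Mem[] {
  return memories.filter((m) => {
    if (opts.since && new Date(m.createdAt) < opts.since) return false;
    if (opts.pattern && !opts.pattern.test(m.content)) return false;
    return true;
  });
}
```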

Handling No Results

If no relevant memories are found:
{
  "status": "success",
  "memories": [],
  "rawMatches": []
}
Possible reasons:
  • No memories have been embedded yet (check the embedded field on your memories)
  • The query doesn’t match any stored content semantically
  • All memories fall below the similarity threshold
Wait a few seconds after creating memories to ensure embeddings are processed before searching.
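One way to honor that advice is to poll until a condition holds (for example, a freshly created memory reporting embedded: true) before searching. The pollUntil helper below is a generic sketch; the interval and timeout defaults are assumptions, not documented values:

```typescript
// Repeatedly evaluate an async condition until it passes or time runs out.
async function pollUntil(
  check: () => Promise<boolean>,
  intervalMs = 1000,
  timeoutMs = 10000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false; // condition never became true within the timeout
}
```

In practice the check callback would fetch the memory and test its embedded field.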

Error Handling

Invalid Request (400)

Missing or invalid query:
{
  "status": "invalid_request",
  "message": "Invalid request body",
  "code": 400
}
Solution: Ensure the query field is a non-empty string and that topK, if provided, is between 1 and 50.

Embedding Failure (500)

{
  "status": "internal_server_error",
  "message": "Failed to embed query",
  "code": 500
}
Solution: Retry the request. If persistent, the embedding service may be unavailable.
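A retry wrapper is one way to handle transient 500s like this. The sketch below uses exponential backoff; the retry count and delays are illustrative defaults, not documented behavior:

```typescript
// Retry an async operation with exponential backoff: 200ms, 400ms, 800ms, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries - 1) {
        // Wait before the next attempt; double the delay each time.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError; // all attempts failed
}
```

A search call would then be wrapped as, for example, withRetry(() => searchMemories("hobbies", 5)).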

Performance Considerations

Search Latency

Typical search latency breakdown:
  • Query embedding: ~50-200ms
  • Vector search (Pinecone): ~50-150ms
  • Database fetch + decryption: ~20-100ms
  • Total: ~120-450ms
Latency increases with topK due to more database fetches and decryption operations.

Search Quality

For best results:
  • Store memories with clear, descriptive content
  • Keep individual memories focused on single topics
  • Use consistent terminology across related memories
  • Avoid very short memories (< 10 characters)

Usage Tracking

Search requests are automatically tracked:
  • Each successful POST /api/v1/memory/search increments searchCount
  • Failed requests increment errorCount
  • Tracking is per organization, per API key, per day
See Usage Tracking for monitoring your usage.

Advanced Use Cases

Conversational Context

Build conversation context by searching recent messages:
const context = await searchMemories(
  `recent conversation with ${userName}`,
  5
);

// Use context in AI prompt
const prompt = `
Based on our previous conversations:
${context.memories.map(m => m.content).join('\n')}

User: ${newMessage}
`;

Personalization

Find user preferences for personalized experiences:
const preferences = await searchMemories(
  'user preferences and settings',
  10
);

// Apply preferences
const theme = preferences.memories.find(m => 
  m.content.includes('theme')
);

Knowledge Retrieval

Search a knowledge base for relevant information:
const knowledge = await searchMemories(
  'How do I configure the API rate limits?',
  3
);

// Return as FAQ answer
const answer = knowledge.memories[0]?.content;

Next Steps

Semantic Search Concepts

Learn how vector embeddings work

Create Memories

Store memories to search

List All Memories

Browse memories with pagination

Search API Reference

Complete search endpoint documentation
