Multi-Tenancy Patterns

Overview

Multi-tenancy is a software architecture where a single instance serves multiple customers (tenants) while keeping their data isolated. TopK provides flexible patterns for implementing multi-tenant search applications with strong data isolation and optimal performance.

Multi-Tenancy Strategies

TopK supports three primary multi-tenancy approaches, each with different trade-offs:

Collection per Tenant

Strongest isolation with separate collections for each tenant

Field-Based Partitioning

Shared collection with tenant ID filtering for better resource utilization

Hybrid Approach

Combine both strategies based on tenant size and requirements

Collection per Tenant

The collection-per-tenant pattern creates a separate collection for each tenant, providing the strongest isolation guarantees.

Implementation

JavaScript

import { Client } from "topk-js";
import { text, int, f32Vector, vectorIndex, semanticIndex } from "topk-js/schema";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

// Create a collection for a specific tenant
async function createTenantCollection(tenantId) {
  const collectionName = `tenant_${tenantId}_documents`;
  
  await client.collections().create(collectionName, {
    title: text().required().index(semanticIndex()),
    content: text().required(),
    category: text(),
    created_at: int(),
    embedding: f32Vector({ dimension: 768 }).index(
      vectorIndex({ metric: "cosine" })
    )
  });
  
  return collectionName;
}

// Get tenant-specific collection
function getTenantCollection(tenantId) {
  return client.collection(`tenant_${tenantId}_documents`);
}

// Insert tenant data
async function addDocuments(tenantId, documents) {
  const collection = getTenantCollection(tenantId);
  const lsn = await collection.upsert(documents);
  return lsn;
}

Python

from topk_sdk import Client
from topk_sdk.schema import text, int, f32_vector, vector_index, semantic_index

client = Client(
  api_key="YOUR_API_KEY",
  region="aws-us-east-1-elastica"
)

# Create a collection for a specific tenant
def create_tenant_collection(tenant_id):
  collection_name = f"tenant_{tenant_id}_documents"
  
  client.collections().create(collection_name, {
    "title": text().required().index(semantic_index()),
    "content": text().required(),
    "category": text(),
    "created_at": int(),
    "embedding": f32_vector(dimension=768).index(
      vector_index(metric="cosine")
    )
  })
  
  return collection_name

# Get tenant-specific collection
def get_tenant_collection(tenant_id):
  return client.collection(f"tenant_{tenant_id}_documents")

# Insert tenant data
def add_documents(tenant_id, documents):
  collection = get_tenant_collection(tenant_id)
  lsn = collection.upsert(documents)
  return lsn
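
Because the tenant ID is interpolated directly into the collection name, it may be worth validating it before use. The following is a minimal sketch of one way to do that. `tenantCollectionName` is a hypothetical helper, and the allowed character set and 64-character cap are assumptions chosen for illustration, not documented TopK naming rules.

```javascript
// Hypothetical helper: build a collection name from a validated tenant ID.
// Assumption: lowercase letters, digits, "_" and "-" only, at most 64 characters.
// Check TopK's actual naming rules before relying on this pattern.
const TENANT_ID_PATTERN = /^[a-z0-9][a-z0-9_-]{0,63}$/;

function tenantCollectionName(tenantId) {
  if (typeof tenantId !== "string" || !TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`Invalid tenant ID: ${JSON.stringify(tenantId)}`);
  }
  return `tenant_${tenantId}_documents`;
}
```

Calling this helper from both `createTenantCollection` and `getTenantCollection` would keep the naming rule in one place.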

Querying Tenant Data

import { field, filter, fn } from "topk-js/query";

async function searchTenantDocuments(tenantId, queryVector, filters = {}) {
  const collection = getTenantCollection(tenantId);

  // Start from the base filter, then narrow it if a category is provided
  let filterExpr = field("title").isNotNull();
  if (filters.category) {
    filterExpr = filterExpr.and(field("category").eq(filters.category));
  }

  // Build the query once so the ranking and limit are defined in one place
  const query = filter(filterExpr)
    .topk(fn.vectorDistance("embedding", queryVector), 20);

  return await collection.query(query);
}

Advantages

Strong Isolation

Complete data separation between tenants
Tenant data physically isolated
Easy to backup or restore individual tenants
Simplified compliance and audit trails

Independent Scaling

Scale storage independently per tenant
Optimize indexes for tenant-specific workloads
Different schema versions per tenant if needed

Operational Flexibility

Delete tenant data by dropping collection
Migrate tenants independently
Apply tenant-specific optimizations

Considerations

Collection limits: Consider the number of tenants and TopK’s collection limits. This approach works best for up to thousands of tenants.

Management overhead: More collections mean more management. Use infrastructure-as-code to automate collection creation and maintenance.
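
As one way to automate that maintenance, a provisioning job could compare the collections that exist with the tenants that are active and derive what to create or drop. The sketch below is a pure function with no API calls. `planTenantCollections` is a hypothetical name, and it assumes the `tenant_<id>_documents` naming used above and that your application can list existing collection names.

```javascript
// Compute which tenant collections to create or drop so that the set of
// collections matches the set of active tenants. Collections that do not
// follow the tenant naming scheme (e.g. shared ones) are left alone.
function planTenantCollections(existingNames, activeTenantIds) {
  const nameFor = (tenantId) => `tenant_${tenantId}_documents`;
  const desired = new Set(activeTenantIds.map(nameFor));
  const existing = new Set(
    existingNames.filter((name) => /^tenant_.+_documents$/.test(name))
  );

  return {
    toCreate: [...desired].filter((name) => !existing.has(name)).sort(),
    toDrop: [...existing].filter((name) => !desired.has(name)).sort(),
  };
}

const plan = planTenantCollections(
  ["tenant_acme_documents", "tenant_old_documents", "shared_documents"],
  ["acme", "globex"]
);
console.log(plan);
// toCreate: ["tenant_globex_documents"], toDrop: ["tenant_old_documents"]
```

Keeping the plan separate from its execution makes it possible to review or log the changes before any collection is dropped.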

Field-Based Partitioning

Field-based partitioning stores all tenant data in a shared collection with a tenant_id field, using query filters to isolate tenant data.

Implementation

import { Client } from "topk-js";
import { text, int, f32Vector, vectorIndex, semanticIndex } from "topk-js/schema";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

// Create a shared multi-tenant collection
await client.collections().create("shared_documents", {
  tenant_id: text().required(),  // Tenant identifier
  title: text().required().index(semanticIndex()),
  content: text().required(),
  category: text(),
  created_at: int(),
  embedding: f32Vector({ dimension: 768 }).index(
    vectorIndex({ metric: "cosine" })
  )
});

// Insert documents with tenant_id
async function addTenantDocuments(tenantId, documents) {
  // Add tenant_id to each document
  const tenantDocs = documents.map(doc => ({
    ...doc,
    tenant_id: tenantId
  }));
  
  const lsn = await client.collection("shared_documents").upsert(tenantDocs);
  return lsn;
}
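
One thing to consider in a shared collection: if document `_id` values are unique per collection (an assumption here, suggested by the upsert semantics but not stated in this guide), two tenants that both use `doc_1` could overwrite each other. A minimal sketch of namespacing IDs by tenant follows. The helper names and the `:` separator are hypothetical, and whether `:` is permitted in an `_id` should be verified.

```javascript
// Hypothetical helpers: namespace document IDs by tenant so that two tenants
// can both use "doc_1" without colliding in a shared collection.
const ID_SEPARATOR = ":"; // assumption: tenant IDs never contain this character

function scopeDocumentId(tenantId, documentId) {
  if (tenantId.includes(ID_SEPARATOR)) {
    throw new Error(`Tenant ID must not contain "${ID_SEPARATOR}"`);
  }
  return `${tenantId}${ID_SEPARATOR}${documentId}`;
}

function unscopeDocumentId(tenantId, scopedId) {
  const prefix = `${tenantId}${ID_SEPARATOR}`;
  if (!scopedId.startsWith(prefix)) {
    throw new Error("Document does not belong to this tenant");
  }
  return scopedId.slice(prefix.length);
}

// Stamp both the scoped _id and the tenant_id field before upserting.
function toTenantDocuments(tenantId, documents) {
  return documents.map((doc) => ({
    ...doc,
    _id: scopeDocumentId(tenantId, doc._id),
    tenant_id: tenantId,
  }));
}
```

With this in place, `addTenantDocuments` could pass `toTenantDocuments(tenantId, documents)` to `upsert`, and lookups by ID would scope the requested IDs the same way.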

Querying with Tenant Isolation

Always filter by tenant_id to ensure data isolation:

import { field, filter, fn } from "topk-js/query";

async function searchDocuments(tenantId, queryVector, limit = 20) {
  const results = await client.collection("shared_documents").query(
    filter(field("tenant_id").eq(tenantId))
      .topk(fn.vectorDistance("embedding", queryVector), limit)
  );
  
  return results;
}

Tenant Data Operations

import { field, filter } from "topk-js/query";

// Count documents for a tenant
async function countTenantDocuments(tenantId) {
  // collection.count() counts every document in the shared collection,
  // so run a filtered count query instead
  const results = await client.collection("shared_documents").query(
    filter(field("tenant_id").eq(tenantId)).count()
  );
  // Assumption: the count query returns a single row with the total in `_count`
  return results[0]?._count ?? 0;
}

// Delete all documents for a tenant
async function deleteTenantData(tenantId) {
  const lsn = await client.collection("shared_documents").delete(
    field("tenant_id").eq(tenantId)
  );
  return lsn;
}

// Get specific documents for a tenant
async function getTenantDocuments(tenantId, documentIds) {
  const allDocs = await client.collection("shared_documents").get(documentIds);
  
  // Filter to ensure tenant_id matches (security layer)
  const tenantDocs = Object.fromEntries(
    Object.entries(allDocs).filter(([, doc]) => doc.tenant_id === tenantId)
  );
  
  return tenantDocs;
}

Advantages

Resource Efficiency

Single collection for all tenants
Shared indexes and resources
Lower operational overhead
Cost-effective for large numbers of small tenants

Simplified Management

Single schema to maintain
Easier to implement cross-tenant analytics
Simplified backup and restore processes

Scalability

Tenant count is not tied to the number of collections
No collection limits to worry about
Easier to onboard new tenants

Security Considerations

Critical: Always filter by tenant_id in every query to prevent data leaks between tenants.

// ✅ Secure: Always filter by tenant_id
async function secureQuery(tenantId, createdAfter) {
  return await client.collection("shared_documents").query(
    filter(
      field("tenant_id").eq(tenantId)
        .and(field("created_at").gte(createdAfter))
    ).limit(100)
  );
}

// ❌ Insecure: Missing tenant_id filter
async function insecureQuery(createdAfter) {
  return await client.collection("shared_documents").query(
    filter(field("created_at").gte(createdAfter)).limit(100)
  );
}
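
A tenant filter only helps if the tenant ID itself is present and well-formed. As a hedged sketch, a small guard can reject missing IDs before they reach a query builder. `requireTenantId` and `withTenantGuard` are hypothetical helpers, not part of topk-js.

```javascript
// Hypothetical guard: reject missing or blank tenant IDs up front, so that
// `undefined` never ends up inside a tenant_id filter.
function requireTenantId(tenantId) {
  if (typeof tenantId !== "string" || tenantId.trim() === "") {
    throw new TypeError("tenantId must be a non-empty string");
  }
  return tenantId;
}

// Wrap any function whose first argument is a tenant ID so the check
// always runs before the wrapped function does.
function withTenantGuard(fn) {
  return (tenantId, ...args) => fn(requireTenantId(tenantId), ...args);
}
```

For example, `withTenantGuard(secureQuery)` would produce a version of `secureQuery` that throws before any request is sent when the tenant ID is missing.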

Best Practices

Implement middleware to automatically inject tenant_id filters:

class TenantAwareClient {
  constructor(client, tenantId) {
    this.client = client;
    this.tenantId = tenantId;
  }
  
  collection(name) {
    return new TenantAwareCollection(
      this.client.collection(name),
      this.tenantId
    );
  }
}

class TenantAwareCollection {
  constructor(collection, tenantId) {
    this.collection = collection;
    this.tenantId = tenantId;
  }
  
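  // `filterExpr` is an optional filter expression (for example
  // field("category").eq("news")), not a full query. `build` adds later
  // stages such as topk() to the tenant-scoped query.
  async query(filterExpr, build = (q) => q) {
    // Automatically inject tenant_id filter
    const tenantExpr = field("tenant_id").eq(this.tenantId);
    const scoped = filterExpr ? tenantExpr.and(filterExpr) : tenantExpr;

    return await this.collection.query(build(filter(scoped)));
  }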
  
  async upsert(documents) {
    // Automatically add tenant_id to documents
    const tenantDocs = documents.map(doc => ({
      ...doc,
      tenant_id: this.tenantId
    }));
    
    return await this.collection.upsert(tenantDocs);
  }
}

Use consistent naming for the tenant ID field across your application.
Index the tenant_id field for optimal query performance (text fields are indexed by default).
Monitor query patterns to ensure tenant_id filters are always applied.
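
One way to monitor that the filter is actually applied is to verify results after the fact. The sketch below is a defense-in-depth check rather than a substitute for the filter. `assertTenantOnly` is a hypothetical helper, and it assumes each returned document includes its `tenant_id` field.

```javascript
// Hypothetical post-query check: confirm that every returned document
// carries the expected tenant_id, and fail loudly if any do not.
function assertTenantOnly(results, tenantId) {
  const leaked = results.filter((doc) => doc.tenant_id !== tenantId);
  if (leaked.length > 0) {
    throw new Error(
      `Tenant isolation violated: ${leaked.length} of ${results.length} ` +
        `documents do not belong to tenant "${tenantId}"`
    );
  }
  return results;
}
```

Because it returns the results unchanged on success, it can wrap an existing call, for example inside a tenant-aware `query` method, and feed an alert when it throws.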

Hybrid Approach

Combine both strategies for optimal flexibility:

class MultiTenantManager {
  constructor(client) {
    this.client = client;
    this.largeTenantsCollection = new Map(); // Tenants with dedicated collections
    this.sharedCollection = "shared_documents"; // Small tenants share this
  }
  
  async getTenantCollection(tenantId, tenantSize) {
    // Large tenants get dedicated collections
    if (tenantSize === "large") {
      if (!this.largeTenantsCollection.has(tenantId)) {
        const collectionName = `tenant_${tenantId}_documents`;
        await this.createDedicatedCollection(collectionName);
        this.largeTenantsCollection.set(tenantId, collectionName);
      }
      return this.client.collection(this.largeTenantsCollection.get(tenantId));
    }
    
    // Small tenants use shared collection
    return new TenantAwareCollection(
      this.client.collection(this.sharedCollection),
      tenantId
    );
  }
  
  async createDedicatedCollection(collectionName) {
    await this.client.collections().create(collectionName, {
      title: text().required().index(semanticIndex()),
      content: text().required(),
      embedding: f32Vector({ dimension: 768 }).index(
        vectorIndex({ metric: "cosine" })
      )
    });
  }
}

When to Use Hybrid

Variable Tenant Sizes

Use dedicated collections for large enterprise customers and shared collections for small businesses.

Performance Tiers

Offer premium customers dedicated infrastructure while maintaining cost-effectiveness for standard tiers.

Gradual Migration

Start tenants in shared collections and promote them to dedicated collections as they grow.
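
The promotion decision can be sketched as a small function over a tenant's document count. The thresholds and the `nextTier` name below are hypothetical. The gap between the two thresholds is a design choice (hysteresis) so that a tenant hovering near the boundary is not migrated back and forth.

```javascript
// Hypothetical thresholds; tune them against your own workload.
const PROMOTE_AT = 1000000; // documents: move to a dedicated collection
const DEMOTE_AT = 500000; // documents: move back to the shared collection

// Decide a tenant's tier ("shared" or "dedicated") from its document count.
// Between the two thresholds the tenant keeps whatever tier it already has.
function nextTier(currentTier, documentCount) {
  if (currentTier === "shared" && documentCount >= PROMOTE_AT) return "dedicated";
  if (currentTier === "dedicated" && documentCount < DEMOTE_AT) return "shared";
  return currentTier;
}
```

A scheduled job could call `nextTier` with each tenant's current count and trigger a migration only when the returned tier differs from the current one.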

Comparison Table

Aspect	Collection per Tenant	Field-Based Partitioning	Hybrid
Isolation	Strongest	Moderate (query-level)	Variable
Scalability	Limited by collection count	Unlimited tenants	Best of both
Performance	Optimal per tenant	Shared resources	Optimized per tier
Complexity	Medium	Low	High
Cost	Higher	Lower	Optimized
Best For	Enterprise customers	SaaS with many small tenants	Variable tenant sizes

Complete Example: SaaS Document Search

import { Client } from "topk-js";
import { text, int, f32Vector, vectorIndex, semanticIndex, keywordIndex } from "topk-js/schema";
import { field, filter, fn, select } from "topk-js/query";

class DocumentSearchService {
  constructor(apiKey, region) {
    this.client = new Client({ apiKey, region });
    this.collectionName = "saas_documents";
  }
  
  async initialize() {
    // Create shared multi-tenant collection
    await this.client.collections().create(this.collectionName, {
      tenant_id: text().required(),
      user_id: text().required(),
      title: text().required().index(semanticIndex()),
      content: text().required().index(keywordIndex()),
      tags: text(),
      created_at: int().required(),
      updated_at: int(),
      embedding: f32Vector({ dimension: 768 }).index(
        vectorIndex({ metric: "cosine" })
      )
    });
  }
  
  async addDocument(tenantId, userId, document) {
    const doc = {
      ...document,
      tenant_id: tenantId,
      user_id: userId,
      created_at: Date.now()
    };
    
    const lsn = await this.client.collection(this.collectionName).upsert([doc]);
    return lsn;
  }
  
  async search(tenantId, query, options = {}) {
    const { limit = 20, userId = null, tags = null } = options;
    
    // Build filter with tenant isolation
    let filterExpr = field("tenant_id").eq(tenantId);
    
    // Optionally filter by user
    if (userId) {
      filterExpr = filterExpr.and(field("user_id").eq(userId));
    }
    
    // Optionally filter by tags
    if (tags) {
      filterExpr = filterExpr.and(field("tags").contains(tags));
    }
    
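    const results = await this.client.collection(this.collectionName).query(
      // Rank on title: it is the field with a semantic index in the schema above
      // (content only has a keyword index)
      filter(filterExpr)
        .topk(fn.semanticSimilarity("title", query), limit)
    );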
    
    return results;
  }
  
  async deleteUserDocuments(tenantId, userId) {
    const lsn = await this.client.collection(this.collectionName).delete(
      field("tenant_id").eq(tenantId).and(field("user_id").eq(userId))
    );
    return lsn;
  }
  
  async deleteTenant(tenantId) {
    const lsn = await this.client.collection(this.collectionName).delete(
      field("tenant_id").eq(tenantId)
    );
    return lsn;
  }
}

// Usage
const service = new DocumentSearchService(
  process.env.TOPK_API_KEY,
  "aws-us-east-1-elastica"
);

await service.initialize();

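// Tenant A adds documents
// (generateEmbedding is an application-provided helper, not part of topk-js;
// it must return a 768-dimension vector to match the schema)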
await service.addDocument("tenant_a", "user_123", {
  _id: "doc_1",
  title: "Product Requirements",
  content: "Detailed specifications for the new feature...",
  tags: "engineering",
  embedding: await generateEmbedding("Product Requirements")
});

// Tenant A searches their documents
const results = await service.search("tenant_a", "feature specifications", {
  limit: 10,
  tags: "engineering"
});

// Results only contain tenant_a's documents

Monitoring and Observability

Implement logging and monitoring to track tenant-specific usage patterns:

Query latency per tenant
Document count per tenant
Storage usage per tenant
Failed query attempts (potential security issues)

class MonitoredTenantCollection {
  constructor(collection, tenantId, logger) {
    this.collection = collection;
    this.tenantId = tenantId;
    this.logger = logger;
  }
  
  async query(query, options) {
    const startTime = Date.now();
    
    try {
      const results = await this.collection.query(query, options);
      
      this.logger.info({
        event: "tenant_query",
        tenant_id: this.tenantId,
        duration_ms: Date.now() - startTime,
        result_count: results.length
      });
      
      return results;
    } catch (error) {
      this.logger.error({
        event: "tenant_query_error",
        tenant_id: this.tenantId,
        error: error.message
      });
      throw error;
    }
  }
}
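
To turn individual log events into the per-tenant latency figures listed above, durations can be aggregated by tenant. The following is a minimal in-memory sketch. `TenantLatencyStats` is a hypothetical class, and a production setup would more likely export these values to a metrics system.

```javascript
// Hypothetical in-memory aggregator for per-tenant query latency.
class TenantLatencyStats {
  constructor() {
    this.samples = new Map(); // tenantId -> array of durations in ms
  }

  record(tenantId, durationMs) {
    if (!this.samples.has(tenantId)) this.samples.set(tenantId, []);
    this.samples.get(tenantId).push(durationMs);
  }

  // Nearest-rank percentile for p in (0, 100]; returns null with no samples.
  percentile(tenantId, p) {
    const values = [...(this.samples.get(tenantId) ?? [])].sort((a, b) => a - b);
    if (values.length === 0) return null;
    const rank = Math.ceil((p / 100) * values.length);
    return values[Math.max(rank, 1) - 1];
  }
}
```

`MonitoredTenantCollection.query` could call `record(this.tenantId, Date.now() - startTime)` alongside its log statement, and a dashboard could then read `percentile(tenantId, 95)` per tenant.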

Next Steps

Consistency Levels

Query Documentation
