Overview

Multi-tenancy is a software architecture where a single instance serves multiple customers (tenants) while keeping their data isolated. TopK provides flexible patterns for implementing multi-tenant search applications with strong data isolation and optimal performance.

Multi-Tenancy Strategies

TopK supports three primary multi-tenancy approaches, each with different trade-offs:

Collection per Tenant

Strongest isolation with separate collections for each tenant

Field-Based Partitioning

Shared collection with tenant ID filtering for better resource utilization

Hybrid Approach

Combine both strategies based on tenant size and requirements

Collection per Tenant

The collection-per-tenant pattern creates a separate collection for each tenant, providing the strongest isolation guarantees.

Implementation

import { Client } from "topk-js";
import { text, int, f32Vector, vectorIndex, semanticIndex } from "topk-js/schema";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

// Create a collection for a specific tenant
async function createTenantCollection(tenantId) {
  const collectionName = `tenant_${tenantId}_documents`;
  
  await client.collections().create(collectionName, {
    title: text().required().index(semanticIndex()),
    content: text().required(),
    category: text(),
    created_at: int(),
    embedding: f32Vector({ dimension: 768 }).index(
      vectorIndex({ metric: "cosine" })
    )
  });
  
  return collectionName;
}

// Get tenant-specific collection
function getTenantCollection(tenantId) {
  return client.collection(`tenant_${tenantId}_documents`);
}

// Insert tenant data
async function addDocuments(tenantId, documents) {
  const collection = getTenantCollection(tenantId);
  const lsn = await collection.upsert(documents);
  return lsn;
}
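
Both createTenantCollection and getTenantCollection interpolate the tenant ID straight into the collection name, so a malformed ID would produce a malformed name. The sketch below is one way to validate the ID first. The helper name, the allowed character set, and the length cap are assumptions chosen for illustration, not documented TopK naming rules.

```javascript
// Hypothetical helper: the pattern below is a conservative assumption,
// not a documented TopK constraint on collection names.
const TENANT_ID_PATTERN = /^[a-z0-9_]{1,64}$/;

// Builds the per-tenant collection name, rejecting IDs that contain
// anything other than lowercase letters, digits, or underscores.
function tenantCollectionName(tenantId) {
  if (typeof tenantId !== "string" || !TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`Invalid tenant ID: ${JSON.stringify(tenantId)}`);
  }
  return `tenant_${tenantId}_documents`;
}
```

Calling this helper from both functions keeps the naming scheme in one place.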

Querying Tenant Data

import { field, filter, fn } from "topk-js/query";

async function searchTenantDocuments(tenantId, queryVector, filters = {}) {
  const collection = getTenantCollection(tenantId);
  
  // Base predicate; extended below when optional filters are provided
  let predicate = field("title").isNotNull();
  if (filters.category) {
    predicate = field("category").eq(filters.category).and(predicate);
  }
  
  // Build the query once from the final predicate
  const query = filter(predicate)
    .topk(fn.vectorDistance("embedding", queryVector), 20);
  
  return await collection.query(query);
}

Advantages

  • Complete data separation between tenants
  • Tenant data physically isolated
  • Easy to backup or restore individual tenants
  • Simplified compliance and audit trails
  • Scale storage independently per tenant
  • Optimize indexes for tenant-specific workloads
  • Different schema versions per tenant if needed
  • Delete tenant data by dropping collection
  • Migrate tenants independently
  • Apply tenant-specific optimizations

Considerations

Collection limits: Consider the number of tenants and TopK’s collection limits. This approach works best for up to thousands of tenants.
Management overhead: More collections mean more management. Use infrastructure-as-code to automate collection creation and maintenance.

Field-Based Partitioning

Field-based partitioning stores all tenant data in a shared collection with a tenant_id field, using query filters to isolate tenant data.

Implementation

import { Client } from "topk-js";
import { text, int, f32Vector, vectorIndex, semanticIndex } from "topk-js/schema";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

// Create a shared multi-tenant collection
await client.collections().create("shared_documents", {
  tenant_id: text().required(),  // Tenant identifier
  title: text().required().index(semanticIndex()),
  content: text().required(),
  category: text(),
  created_at: int(),
  embedding: f32Vector({ dimension: 768 }).index(
    vectorIndex({ metric: "cosine" })
  )
});

// Insert documents with tenant_id
async function addTenantDocuments(tenantId, documents) {
  // Add tenant_id to each document
  const tenantDocs = documents.map(doc => ({
    ...doc,
    tenant_id: tenantId
  }));
  
  const lsn = await client.collection("shared_documents").upsert(tenantDocs);
  return lsn;
}
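
In a shared collection, two tenants that both use a document ID such as doc_1 would overwrite each other's documents on upsert, because the ID is the upsert key. One way to avoid this is to prefix each _id with the tenant ID before writing. The sketch below assumes a "tenantId:docId" format; the helper names and the separator are illustrative choices, not part of the TopK SDK.

```javascript
// Hypothetical helpers: the "tenantId:docId" format is an illustrative convention.
const ID_SEPARATOR = ":";

// Builds a collection-wide unique ID from a tenant ID and a tenant-local ID.
function scopedId(tenantId, docId) {
  if (tenantId.includes(ID_SEPARATOR)) {
    throw new Error("Tenant ID must not contain the separator");
  }
  return `${tenantId}${ID_SEPARATOR}${docId}`;
}

// Adds tenant_id and rewrites _id on every document before upserting.
function scopeDocuments(tenantId, documents) {
  return documents.map((doc) => ({
    ...doc,
    _id: scopedId(tenantId, doc._id),
    tenant_id: tenantId,
  }));
}

// Recovers the tenant-local ID, rejecting IDs that belong to another tenant.
function unscopedId(tenantId, storedId) {
  const prefix = `${tenantId}${ID_SEPARATOR}`;
  if (!storedId.startsWith(prefix)) {
    throw new Error("Document does not belong to this tenant");
  }
  return storedId.slice(prefix.length);
}
```

With this convention, addTenantDocuments would pass scopeDocuments(tenantId, documents) to upsert, and lookups by ID would go through scopedId.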

Querying with Tenant Isolation

Always filter by tenant_id to ensure data isolation:
import { field, filter, fn } from "topk-js/query";

async function searchDocuments(tenantId, queryVector, limit = 20) {
  const results = await client.collection("shared_documents").query(
    filter(field("tenant_id").eq(tenantId))
      .topk(fn.vectorDistance("embedding", queryVector), limit)
  );
  
  return results;
}

Tenant Data Operations

import { field, filter } from "topk-js/query";

// Count documents for a tenant
async function countTenantDocuments(tenantId) {
  // collection.count() covers every tenant's documents, so run a
  // filtered count query instead.
  const results = await client.collection("shared_documents").query(
    filter(field("tenant_id").eq(tenantId)).count()
  );
  // A count query returns a single row; this assumes the value is
  // exposed under the `_count` key.
  return results[0]._count;
}

// Delete all documents for a tenant
async function deleteTenantData(tenantId) {
  const lsn = await client.collection("shared_documents").delete(
    field("tenant_id").eq(tenantId)
  );
  return lsn;
}

// Get specific documents for a tenant
async function getTenantDocuments(tenantId, documentIds) {
  const allDocs = await client.collection("shared_documents").get(documentIds);
  
  // Filter to ensure tenant_id matches (security layer)
  const tenantDocs = Object.fromEntries(
    Object.entries(allDocs).filter(([, doc]) => doc.tenant_id === tenantId)
  );
  
  return tenantDocs;
}

Advantages

  • Single collection for all tenants
  • Shared indexes and resources
  • Lower operational overhead
  • Cost-effective for large numbers of small tenants
  • Single schema to maintain
  • Easier to implement cross-tenant analytics
  • Simplified backup and restore processes
  • Supports unlimited tenants
  • No collection limits to worry about
  • Easier to onboard new tenants

Security Considerations

Critical: Always filter by tenant_id in every query to prevent data leaks between tenants.
// ✅ Secure: Always filter by tenant_id
async function secureQuery(tenantId, filters) {
  return await client.collection("shared_documents").query(
    filter(
      field("tenant_id").eq(tenantId)
        .and(field("published_year").gte(2000))
    ).limit(100)
  );
}

// ❌ Insecure: Missing tenant_id filter
async function insecureQuery(filters) {
  return await client.collection("shared_documents").query(
    filter(field("published_year").gte(2000)).limit(100)
  );
}
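
A tenant filter only protects data if the tenant ID itself is present and well-formed. The guard below is a minimal sketch of a check to run before building any query; the function name is hypothetical, and what counts as a valid ID (here, any non-empty string) is an assumption to adapt to your own ID format.

```javascript
// Hypothetical guard: rejects missing or blank tenant IDs before they can
// reach a query, for example when a request arrives without tenant context.
function requireTenantId(tenantId) {
  if (typeof tenantId !== "string" || tenantId.trim() === "") {
    throw new TypeError("tenantId must be a non-empty string");
  }
  return tenantId;
}
```

In secureQuery, the filter would then start with field("tenant_id").eq(requireTenantId(tenantId)), so a request without a tenant fails loudly instead of running an unscoped or meaningless query.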

Best Practices

  1. Implement middleware to automatically inject tenant_id filters:
class TenantAwareClient {
  constructor(client, tenantId) {
    this.client = client;
    this.tenantId = tenantId;
  }
  
  collection(name) {
    return new TenantAwareCollection(
      this.client.collection(name),
      this.tenantId
    );
  }
}

class TenantAwareCollection {
  constructor(collection, tenantId) {
    this.collection = collection;
    this.tenantId = tenantId;
  }
  
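  // Takes an optional filter expression plus a callback that finishes the
  // query (topk, limit, count, ...). The tenant predicate is combined with
  // the caller's expression before any other stage is added, so callers
  // cannot build a query that skips it.
  async query(filterExpr, buildQuery) {
    const tenantFilter = field("tenant_id").eq(this.tenantId);
    const scoped = filterExpr ? tenantFilter.and(filterExpr) : tenantFilter;
    
    return await this.collection.query(buildQuery(filter(scoped)));
  }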
  
  async upsert(documents) {
    // Automatically add tenant_id to documents
    const tenantDocs = documents.map(doc => ({
      ...doc,
      tenant_id: this.tenantId
    }));
    
    return await this.collection.upsert(tenantDocs);
  }
}
  2. Use consistent naming for the tenant ID field across your application.
  3. Index the tenant_id field for optimal query performance (text fields are indexed by default).
  4. Monitor query patterns to ensure tenant_id filters are always applied.

Hybrid Approach

Combine both strategies for optimal flexibility:
class MultiTenantManager {
  constructor(client) {
    this.client = client;
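    // Tenants with dedicated collections. This cache lives in memory only, so
    // persist the mapping in production to avoid re-creating collections after a restart.
    this.largeTenantsCollection = new Map();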
    this.sharedCollection = "shared_documents"; // Small tenants share this
  }
  
  async getTenantCollection(tenantId, tenantSize) {
    // Large tenants get dedicated collections
    if (tenantSize === "large") {
      if (!this.largeTenantsCollection.has(tenantId)) {
        const collectionName = `tenant_${tenantId}_documents`;
        await this.createDedicatedCollection(collectionName);
        this.largeTenantsCollection.set(tenantId, collectionName);
      }
      return this.client.collection(this.largeTenantsCollection.get(tenantId));
    }
    
    // Small tenants use shared collection
    return new TenantAwareCollection(
      this.client.collection(this.sharedCollection),
      tenantId
    );
  }
  
  async createDedicatedCollection(collectionName) {
    await this.client.collections().create(collectionName, {
      title: text().required().index(semanticIndex()),
      content: text().required(),
      embedding: f32Vector({ dimension: 768 }).index(
        vectorIndex({ metric: "cosine" })
      )
    });
  }
}

When to Use Hybrid

Use dedicated collections for large enterprise customers and shared collections for small businesses.
Offer premium customers dedicated infrastructure while maintaining cost-effectiveness for standard tiers.
Start tenants in shared collections and promote them to dedicated collections as they grow.
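
The promotion rule can be made explicit as a small policy function. The sketch below returns the "small" or "large" label that MultiTenantManager.getTenantCollection expects as tenantSize. The function name and the document-count thresholds are hypothetical and would need tuning against real workloads.

```javascript
// Hypothetical thresholds: promote at 1,000,000 documents, demote below 500,000.
const DEFAULT_POLICY = { promoteAt: 1000000, demoteAt: 500000 };

// Returns the size label a tenant should have, given its current label and
// document count. The gap between the two thresholds keeps a tenant near the
// boundary from moving back and forth between collection types.
function chooseTenantSize(currentSize, documentCount, policy = DEFAULT_POLICY) {
  if (currentSize === "small" && documentCount >= policy.promoteAt) {
    return "large";
  }
  if (currentSize === "large" && documentCount < policy.demoteAt) {
    return "small";
  }
  return currentSize;
}
```

A change in label still requires copying the tenant's documents between collections; this function only decides when that migration should happen.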

Comparison Table

| Aspect | Collection per Tenant | Field-Based Partitioning | Hybrid |
| --- | --- | --- | --- |
| Isolation | Strongest | Moderate (query-level) | Variable |
| Scalability | Limited by collection count | Unlimited tenants | Best of both |
| Performance | Optimal per tenant | Shared resources | Optimized per tier |
| Complexity | Medium | Low | High |
| Cost | Higher | Lower | Optimized |
| Best For | Enterprise customers | SaaS with many small tenants | Variable tenant sizes |

Complete Example

The following service applies field-based partitioning to a SaaS document search application:
import { Client } from "topk-js";
import { text, int, f32Vector, vectorIndex, semanticIndex, keywordIndex } from "topk-js/schema";
import { field, filter, fn, select } from "topk-js/query";

class DocumentSearchService {
  constructor(apiKey, region) {
    this.client = new Client({ apiKey, region });
    this.collectionName = "saas_documents";
  }
  
  async initialize() {
    // Create shared multi-tenant collection
    await this.client.collections().create(this.collectionName, {
      tenant_id: text().required(),
      user_id: text().required(),
      title: text().required().index(semanticIndex()),
      content: text().required().index(keywordIndex()),
      tags: text(),
      created_at: int().required(),
      updated_at: int(),
      embedding: f32Vector({ dimension: 768 }).index(
        vectorIndex({ metric: "cosine" })
      )
    });
  }
  
  async addDocument(tenantId, userId, document) {
    const doc = {
      ...document,
      tenant_id: tenantId,
      user_id: userId,
      created_at: Date.now()
    };
    
    const lsn = await this.client.collection(this.collectionName).upsert([doc]);
    return lsn;
  }
  
  async search(tenantId, query, options = {}) {
    const { limit = 20, userId = null, tags = null } = options;
    
    // Build filter with tenant isolation
    let filterExpr = field("tenant_id").eq(tenantId);
    
    // Optionally filter by user
    if (userId) {
      filterExpr = filterExpr.and(field("user_id").eq(userId));
    }
    
    // Optionally filter by tags
    if (tags) {
      filterExpr = filterExpr.and(field("tags").contains(tags));
    }
    
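    // Rank by semantic similarity on "title", the field that carries the
    // semantic index in this schema ("content" only has a keyword index).
    const results = await this.client.collection(this.collectionName).query(
      filter(filterExpr)
        .topk(fn.semanticSimilarity("title", query), limit)
    );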
    
    return results;
  }
  
  async deleteUserDocuments(tenantId, userId) {
    const lsn = await this.client.collection(this.collectionName).delete(
      field("tenant_id").eq(tenantId).and(field("user_id").eq(userId))
    );
    return lsn;
  }
  
  async deleteTenant(tenantId) {
    const lsn = await this.client.collection(this.collectionName).delete(
      field("tenant_id").eq(tenantId)
    );
    return lsn;
  }
}

// Usage
const service = new DocumentSearchService(
  process.env.TOPK_API_KEY,
  "aws-us-east-1-elastica"
);

await service.initialize();

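// Tenant A adds documents.
// generateEmbedding() is not defined in this guide: it stands in for a call to
// your embedding model and must return a 768-dimensional vector.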
await service.addDocument("tenant_a", "user_123", {
  _id: "doc_1",
  title: "Product Requirements",
  content: "Detailed specifications for the new feature...",
  tags: "engineering",
  embedding: await generateEmbedding("Product Requirements")
});

// Tenant A searches their documents
const results = await service.search("tenant_a", "feature specifications", {
  limit: 10,
  tags: "engineering"
});

// Results only contain tenant_a's documents

Monitoring and Observability

Implement logging and monitoring to track tenant-specific usage patterns:
  • Query latency per tenant
  • Document count per tenant
  • Storage usage per tenant
  • Failed query attempts (potential security issues)
class MonitoredTenantCollection {
  constructor(collection, tenantId, logger) {
    this.collection = collection;
    this.tenantId = tenantId;
    this.logger = logger;
  }
  
  async query(query, options) {
    const startTime = Date.now();
    
    try {
      const results = await this.collection.query(query, options);
      
      this.logger.info({
        event: "tenant_query",
        tenant_id: this.tenantId,
        duration_ms: Date.now() - startTime,
        result_count: results.length
      });
      
      return results;
    } catch (error) {
      this.logger.error({
        event: "tenant_query_error",
        tenant_id: this.tenantId,
        error: error.message
      });
      throw error;
    }
  }
}
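
The log events emitted by MonitoredTenantCollection can be rolled up into per-tenant metrics. The sketch below assumes the events have been collected into an array with the same field names used above (event, tenant_id, duration_ms). The function name and the nearest-rank p95 calculation are illustrative choices.

```javascript
// Aggregates query log events into per-tenant counts and p95 latency.
function summarizeTenantQueries(events) {
  const byTenant = new Map();

  for (const e of events) {
    if (e.event !== "tenant_query" && e.event !== "tenant_query_error") continue;
    const stats = byTenant.get(e.tenant_id) ?? { durations: [], errors: 0 };
    if (e.event === "tenant_query") {
      stats.durations.push(e.duration_ms);
    } else {
      stats.errors += 1;
    }
    byTenant.set(e.tenant_id, stats);
  }

  const summary = {};
  for (const [tenantId, stats] of byTenant) {
    const sorted = [...stats.durations].sort((a, b) => a - b);
    // Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    const p95Index = Math.max(0, Math.ceil(sorted.length * 0.95) - 1);
    summary[tenantId] = {
      queries: sorted.length,
      errors: stats.errors,
      p95_ms: sorted.length > 0 ? sorted[p95Index] : null,
    };
  }
  return summary;
}
```

A tenant whose error count rises without a matching rise in traffic is worth a closer look, since repeated failed queries can indicate probing.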

Next Steps

Consistency Levels

Learn how to ensure data consistency in multi-tenant environments

Query Documentation

Explore advanced query patterns and filtering
