Multi-tenancy isolates customer data so that each tenant sees only its own documents. VectorDB provides database-specific isolation strategies, including namespaces, partitions, payload filters, and tenant-scoped collections.

Overview

In multi-tenant applications, data from different customers must be strictly isolated to prevent cross-tenant data leakage: each tenant's queries should retrieve only documents from that tenant's own data partition. VectorDB enforces this isolation through database-native mechanisms.

Isolation strategies

Each vector database uses a different approach to tenant isolation:

| Database | Isolation strategy | Scale | Description |
|---|---|---|---|
| Milvus | Partition key with filter expressions | Millions of tenants | The partition key field automatically routes each tenant's data to a partition by hashing the tenant ID |
| Weaviate | Native multi-tenancy with per-tenant shards | Enterprise-grade | Each tenant gets a dedicated shard with strong isolation guarantees |
| Pinecone | Namespace-based isolation | 100,000+ tenants | Tenant data is stored in separate namespaces within a shared index |
| Qdrant | Payload-based with optimized indexes | Tiered promotion | Metadata-based filtering with automatic index promotion for large tenants |
| Chroma | Tenant and database scoping | Flexible | Collection-per-tenant or database-per-tenant, depending on scale |

Configuration

Configure multi-tenancy for your chosen database. The following example configures Milvus:
milvus:
  uri: "http://localhost:19530"
  collection_name: "tenant_documents"
  partition_key: "tenant_id"  # Partition key field
  isolation_strategy: "partition"

embedder:
  model: "sentence-transformers/all-MiniLM-L6-v2"

logging:
  level: "INFO"

Usage example

from vectordb.langchain.multi_tenancy.milvus import (
    MilvusMultiTenancyIndexingPipeline,
)
from langchain_core.documents import Document

# Assumes `config` holds the configuration shown above
pipeline = MilvusMultiTenancyIndexingPipeline(config, tenant_id="acme_corp")

# Documents are automatically isolated to the pipeline's tenant
documents = [
    Document(
        page_content="Acme Corp product documentation",
        metadata={"source": "docs", "category": "products"}
    ),
    Document(
        page_content="Internal company policies",
        metadata={"source": "internal", "category": "hr"}
    )
]

# Index to tenant partition
result = pipeline.index(documents)
print(f"Indexed {result['count']} documents for tenant: acme_corp")

Tenant management

List tenants

from vectordb.langchain.multi_tenancy import MilvusMultiTenancyPipeline

pipeline = MilvusMultiTenancyPipeline(config)

# List all active tenants
tenants = pipeline.list_tenants()
print(f"Active tenants: {tenants}")
# Example output: Active tenants: ['acme_corp', 'globex_inc', 'initech_llc']

Delete tenant

# Delete all data for a tenant
success = pipeline.delete_tenant("acme_corp")

if success:
    print("Tenant data deleted successfully")
else:
    print("Tenant not found or deletion failed")
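Because `delete_tenant()` reports only success or failure, an offboarding job may want to confirm the outcome separately. The sketch below uses only the `list_tenants()` and `delete_tenant()` calls shown above; `offboard_tenant` and the `TenantAdmin` protocol are hypothetical names, and it assumes `list_tenants()` returns tenant ID strings as in the example output.

```python
from typing import Protocol


class TenantAdmin(Protocol):
    """The two management calls shown above: list_tenants() and delete_tenant()."""

    def list_tenants(self) -> list[str]: ...
    def delete_tenant(self, tenant_id: str) -> bool: ...


def offboard_tenant(pipeline: TenantAdmin, tenant_id: str) -> None:
    """Delete a tenant and confirm it no longer appears in list_tenants()."""
    if tenant_id not in pipeline.list_tenants():
        raise LookupError(f"Unknown tenant: {tenant_id!r}")
    if not pipeline.delete_tenant(tenant_id):
        raise RuntimeError(f"Deletion failed for tenant {tenant_id!r}")
    # Re-read the tenant list rather than trusting the return value alone
    if tenant_id in pipeline.list_tenants():
        raise RuntimeError(f"Tenant {tenant_id!r} still listed after deletion")
```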

Framework support

Multi-tenancy pipelines are available for both frameworks:
from vectordb.langchain.multi_tenancy.pinecone import (
    PineconeMultiTenancyIndexingPipeline,
    PineconeMultiTenancySearchPipeline,
)

# Create tenant-scoped pipelines
index_pipeline = PineconeMultiTenancyIndexingPipeline(
    config=config,
    tenant_id="customer_123"
)

search_pipeline = PineconeMultiTenancySearchPipeline(
    config=config,
    tenant_id="customer_123"
)

# All operations are automatically scoped to the tenant's namespace;
# `documents` is a list of Document objects and `query` is a search string
index_pipeline.index(documents)
results = search_pipeline.search(query, top_k=10)

Isolation guarantees

Milvus
Mechanism: Partition key field with automatic routing
Guarantees:
  • Entities routed to partitions by a hash of the tenant ID (several tenants can share a partition)
  • Filter expressions on the partition key restrict each query to the tenant's data
  • No cross-tenant results, since every query is scoped by the tenant filter
  • Optimized for millions of tenants
Best for: Applications with 10,000+ tenants requiring strict isolation

Weaviate
Mechanism: Per-tenant shards with collection-level isolation
Guarantees:
  • Each tenant gets dedicated shard(s)
  • Physical isolation at the storage layer
  • Independent tenant lifecycle management
  • Enterprise-grade security boundaries
Best for: SaaS applications with strict compliance requirements

Pinecone
Mechanism: Namespace-based logical partitioning
Guarantees:
  • Logical isolation within a shared index
  • Each query targets a single namespace
  • Supports 100,000+ namespaces per index
  • Automatic scaling and management
Best for: Multi-tenant SaaS with a preference for managed infrastructure

Qdrant
Mechanism: Metadata-based filtering with index optimization
Guarantees:
  • Tenant ID stored in payload metadata
  • Automatic index creation for the tenant field
  • Tiered approach: small tenants share a collection, large tenants are promoted
  • Efficient filtering with minimal overhead
Best for: Flexible multi-tenant architectures with mixed tenant sizes

Chroma
Mechanism: Collection-per-tenant or database-per-tenant
Guarantees:
  • Complete physical isolation
  • Independent collection management
  • Flexible deployment strategies
  • Simple access control
Best for: Development, prototyping, and small-scale deployments
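The tiered approach described for Qdrant (small tenants share a collection, large tenants are promoted) comes down to a routing decision. The sketch below is a hypothetical, database-agnostic illustration: `Route`, `needs_promotion`, `route_tenant`, the `tenant_<id>` collection naming, and the 20,000-vector threshold are assumptions, not VectorDB or Qdrant settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    collection: str  # where the tenant's vectors live
    tenant_filter: dict  # payload filter applied to every query


def needs_promotion(vector_count: int, threshold: int = 20_000) -> bool:
    """Illustrative rule: flag a tenant for promotion once it reaches `threshold` vectors."""
    return vector_count >= threshold


def route_tenant(
    tenant_id: str, promoted: set[str], shared_collection: str = "tenant_documents"
) -> Route:
    """Route to a dedicated collection only after the tenant's migration has completed."""
    if tenant_id in promoted:
        collection = f"tenant_{tenant_id}"
    else:
        collection = shared_collection
    # The tenant filter is kept even for dedicated collections, as defense in depth
    return Route(collection=collection, tenant_filter={"tenant_id": tenant_id})
```

Routing keys off the set of tenants whose migration has finished rather than raw size, so a tenant that crosses the threshold keeps reading from the shared collection until its data has actually been copied.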

Security best practices

1. Validate tenant IDs

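Always validate and sanitize tenant IDs before passing them to pipelines. Prevent injection attacks by checking IDs against an allow-list of known tenants or, if your tenant IDs are UUIDs, by strict UUID validation.
import uuid


def validate_tenant_id(tenant_id: str) -> bool:
    """Return True only for a canonical, hyphenated UUID string."""
    try:
        # uuid.UUID() also accepts braces, "urn:uuid:" prefixes, and bare hex digits,
        # so compare against the canonical form to reject those variants
        return str(uuid.UUID(tenant_id)) == tenant_id.lower()
    except ValueError:
        return False


# Reject invalid IDs instead of silently skipping pipeline creation
if not validate_tenant_id(tenant_id):
    raise ValueError(f"Invalid tenant ID: {tenant_id!r}")
pipeline = MilvusMultiTenancySearchPipeline(config, tenant_id)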
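The examples on this page use slug-style tenant IDs such as `acme_corp` rather than UUIDs. For those, an allow-list check can be paired with a strict format check, which matters if a tenant ID ever ends up inside a filter expression string, where a stray quote could change the expression's meaning. The sketch assumes a hypothetical ID format (lowercase letters, digits, and underscores, 3 to 63 characters) and a `known_tenants` set loaded from your own tenant registry; `validate_tenant_slug` is an illustrative name.

```python
import re

# Hypothetical slug format: a lowercase letter, then 2-62 lowercase letters, digits, or underscores
_TENANT_SLUG = re.compile(r"[a-z][a-z0-9_]{2,62}")


def validate_tenant_slug(tenant_id: str, known_tenants: set[str]) -> bool:
    """Accept a tenant ID only if it is well-formed AND on the allow-list of provisioned tenants."""
    return _TENANT_SLUG.fullmatch(tenant_id) is not None and tenant_id in known_tenants
```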

2. Enforce at the application layer

Never rely solely on database isolation. Validate tenant access at the application layer before executing queries.
def search_for_user(user_id: str, tenant_id: str, query: str):
    # Verify the user belongs to the tenant before querying the database;
    # user_belongs_to_tenant stands for your application's own authorization check
    if not user_belongs_to_tenant(user_id, tenant_id):
        raise PermissionError("Access denied")

    # Then execute the tenant-scoped search
    pipeline = MilvusMultiTenancySearchPipeline(config, tenant_id)
    return pipeline.search(query)
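`search_for_user` leaves `user_belongs_to_tenant` to the application. A minimal in-memory version might look like this; `TenantMembership` and its `grant`/`revoke` methods are hypothetical, and production code would typically consult an identity provider or database instead of a dict.

```python
from dataclasses import dataclass, field


@dataclass
class TenantMembership:
    """Hypothetical in-memory membership registry mapping tenant_id -> user IDs."""

    _members: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, user_id: str, tenant_id: str) -> None:
        self._members.setdefault(tenant_id, set()).add(user_id)

    def revoke(self, user_id: str, tenant_id: str) -> None:
        self._members.get(tenant_id, set()).discard(user_id)

    def user_belongs_to_tenant(self, user_id: str, tenant_id: str) -> bool:
        # Deny by default: unknown tenants and unknown users both return False
        return user_id in self._members.get(tenant_id, set())
```

Returning `False` for anything not explicitly granted keeps the check fail-closed, which is the safer default for an authorization gate in front of tenant-scoped queries.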

3. Audit tenant operations

Log all tenant-scoped operations for security auditing and compliance.
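import logging

logger = logging.getLogger(__name__)

result = pipeline.search(query, top_k=10)
# Lazy %-style arguments defer formatting until the record is emitted;
# %r escapes newlines, so a crafted query cannot forge extra log lines
logger.info(
    "Tenant search: tenant=%s query=%r results=%d",
    tenant_id,
    query,
    len(result["documents"]),
)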
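If raw query text is too sensitive to keep in logs, one option is to emit structured records that carry a digest of the query instead. In this sketch `audit_record`, `log_audit`, and the `tenant_audit` logger name are hypothetical, and the field names are illustrative.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("tenant_audit")


def audit_record(tenant_id: str, operation: str, query: str, result_count: int) -> dict:
    """Build a structured audit entry; the query is stored as a SHA-256 digest, not raw text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "operation": operation,
        # The digest lets auditors correlate repeated queries without retaining their content
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "result_count": result_count,
    }


def log_audit(record: dict) -> None:
    # One JSON object per line is easy to ship to a log pipeline or SIEM
    audit_logger.info(json.dumps(record, sort_keys=True))
```

A plain hash of a short, guessable query can be reversed by trying candidates, so a keyed HMAC may be preferable when that matters.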

4. Test isolation boundaries

Write integration tests that verify one tenant cannot retrieve another tenant's data.
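# Test: tenant A cannot see tenant B's data
from langchain_core.documents import Document
from vectordb.langchain.multi_tenancy.milvus import MilvusMultiTenancyIndexingPipeline

indexer_b = MilvusMultiTenancyIndexingPipeline(config, "tenant_b")
search_a = MilvusMultiTenancySearchPipeline(config, "tenant_a")
search_b = MilvusMultiTenancySearchPipeline(config, "tenant_b")

# Index to tenant B through the indexing pipeline, as in the examples above
indexer_b.index([Document(page_content="Secret data")])

# Positive control: tenant B finds its own document, so an empty result for A is meaningful
assert len(search_b.search("Secret data")["documents"]) > 0

# Search from tenant A (tenant_a has no documents of its own in this test)
results = search_a.search("Secret data")
assert len(results["documents"]) == 0  # No cross-tenant leakage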
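The single-pair test above generalizes to every pair of tenants. The sketch below is framework-agnostic and hypothetical: `find_leaks` and the `search(tenant_id, query)` callable are assumptions, and you would adapt `search` to call your tenant-scoped search pipeline and extract document text. Indexing a unique marker per tenant (for example a UUID string) avoids false positives from ordinary overlapping content.

```python
from typing import Callable, Mapping


def find_leaks(
    canaries: Mapping[str, str],
    search: Callable[[str, str], list[str]],
) -> list[tuple[str, str]]:
    """Return (searching_tenant, owner_tenant) pairs where one tenant retrieved another's canary.

    canaries maps each tenant ID to a unique marker string already indexed for that tenant;
    search(tenant_id, query) returns the texts that tenant's scoped search yields.
    """
    leaks = []
    for searcher in canaries:
        for owner, marker in canaries.items():
            if owner == searcher:
                continue
            # Query with the other tenant's exact marker: the strongest possible match
            if any(marker in text for text in search(searcher, marker)):
                leaks.append((searcher, owner))
    return leaks
```

In an integration test, `assert find_leaks(canaries, search) == []` then checks every ordered tenant pair in one call.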

Performance at scale

Benchmarks by tenant count

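| Tenant count | Milvus | Weaviate | Pinecone | Qdrant | Chroma |
|---|---|---|---|---|---|
| 100 | ⚡ Fast | ⚡ Fast | ⚡ Fast | ⚡ Fast | ⚡ Fast |
| 1,000 | ⚡ Fast | ⚡ Fast | ⚡ Fast | ⚡ Fast | 🔶 Moderate |
| 10,000 | ⚡ Fast | ⚡ Fast | ⚡ Fast | ⚡ Fast | ⚠️ Slow |
| 100,000+ | ⚡ Fast | ⚡ Fast | ⚡ Fast | 🔶 Moderate | ❌ N/A |
| 1,000,000+ | ⚡ Fast | 🔶 Moderate | ⚡ Fast | 🔶 Moderate | ❌ N/A |
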
Choose Milvus or Pinecone for applications expecting 10,000+ tenants. Use Weaviate for enterprise compliance needs. Qdrant and Chroma work well for smaller deployments.

Cost optimization

Resource sharing strategies

# Multiple tenants share single index
# Best for: High tenant count, similar workloads

pipeline = PineconeMultiTenancySearchPipeline(
    config=config,
    tenant_id=tenant_id  # Namespace isolation
)

# Approximate cost per tenant: single index cost / N tenants

Migration between strategies

Migrate from shared to dedicated isolation as tenants grow:
import copy


def migrate_tenant_to_dedicated(tenant_id: str):
    """Migrate a tenant from the shared index to a dedicated index."""

    # 1. Configure a dedicated index. Use deepcopy: dict.copy() is shallow,
    #    so editing the nested "pinecone" dict would also change the shared `config`.
    dedicated_config = copy.deepcopy(config)
    # Pinecone index names allow only lowercase letters, digits, and hyphens
    index_suffix = tenant_id.lower().replace("_", "-")
    dedicated_config["pinecone"]["index_name"] = f"tenant-{index_suffix}"

    # 2. Export from the shared index
    shared_pipeline = PineconeMultiTenancySearchPipeline(config, tenant_id)
    documents = shared_pipeline.export_tenant_data()

    # 3. Import into the dedicated index
    dedicated_pipeline = PineconeMultiTenancyIndexingPipeline(
        dedicated_config, tenant_id
    )
    dedicated_pipeline.index(documents)

    # 4. Delete from the shared index only after the import has completed
    shared_pipeline.delete_tenant(tenant_id)

    print(f"Migrated {len(documents)} documents to dedicated index")
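Deriving the dedicated index name directly from the tenant ID can produce names Pinecone rejects (for example, the underscore in `acme_corp`) or collisions between IDs that differ only in punctuation. The sketch below is a hypothetical helper, `dedicated_index_name`, assuming Pinecone's index-name rules (lowercase letters, digits, and hyphens, at most 45 characters); verify those limits against the current Pinecone documentation.

```python
import hashlib
import re

MAX_INDEX_NAME = 45  # assumed Pinecone index-name length limit


def dedicated_index_name(tenant_id: str, prefix: str = "tenant") -> str:
    """Derive a Pinecone-safe, collision-resistant index name from an arbitrary tenant ID."""
    # Replace every run of disallowed characters with a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    # A short digest of the raw ID keeps "acme_corp" and "acme-corp" from colliding
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:8]
    # Leave room for "<prefix>-" and "-<digest>"
    budget = MAX_INDEX_NAME - len(prefix) - len(digest) - 2
    slug = slug[:budget].strip("-")
    return f"{prefix}-{slug}-{digest}" if slug else f"{prefix}-{digest}"
```

Keeping a readable slug alongside the digest makes dedicated indexes easy to identify in the console while the digest guarantees uniqueness.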
