Overview
Vector search finds semantically similar content but cannot filter by structured properties. Metadata filtering adds boolean predicates (equals, range, contains) to retrieve precisely targeted documents.
Metadata filtering is essential for production RAG applications when you need queries like “find documents about ML published in 2024” or “retrieve technical articles by author X”.
How it works
The metadata filtering pipeline follows these steps:
- Query embedding - Convert query text to vector representation
- Filtered vector search - Execute similarity search with metadata constraints
- Post-processing - Apply additional client-side filters if configured
- RAG generation - Generate answer using filtered documents (optional)
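The four steps above can be sketched in plain Python. This is a minimal illustration, not VectorDB's implementation: `embed`, `run_pipeline`, and the document layout are hypothetical names chosen for the example, the embedding is a toy character hash rather than a real model, and the search is a brute-force scan rather than an index lookup.

```python
import math
from typing import Callable, Optional

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy embedding (assumption): hash characters into a normalized vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are already L2-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def run_pipeline(
    query: str,
    docs: list[dict],
    metadata_filter: Callable[[dict], bool],
    post_filter: Optional[Callable[[dict], bool]] = None,
    generate: Optional[Callable[[str, list[dict]], str]] = None,
    top_k: int = 3,
):
    # 1. Query embedding
    query_vec = embed(query)
    # 2. Filtered vector search: only documents passing the metadata
    #    constraint are scored and ranked.
    candidates = [d for d in docs if metadata_filter(d["metadata"])]
    ranked = sorted(
        candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True
    )[:top_k]
    # 3. Post-processing: optional client-side filter
    if post_filter is not None:
        ranked = [d for d in ranked if post_filter(d)]
    # 4. RAG generation (optional)
    answer = generate(query, ranked) if generate is not None else None
    return ranked, answer
```

Passing `generate=None` skips the generation step and returns only the filtered documents, which mirrors the "optional" note on the last step.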
Supported operators
VectorDB supports the following filter operators across all databases:

| Operator | Description | Example |
|---|---|---|
| equals | Exact match | category = "electronics" |
| not_equals | Not equal | status != "archived" |
| gt | Greater than | price > 100 |
| gte | Greater than or equal | date >= "2024-01-01" |
| lt | Less than | score < 0.5 |
| lte | Less than or equal | rating <= 4.5 |
| in | Value in list | category in ["tech", "science"] |
| not_in | Value not in list | author not in ["user1", "user2"] |
| contains | Substring match (case-insensitive) | title contains "machine" |
| startswith | Prefix match (case-insensitive) | name startswith "Dr" |
| endswith | Suffix match (case-insensitive) | filename endswith ".pdf" |
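To make the operator semantics concrete, here is a small client-side evaluator for the operators in the table. It is a sketch under assumptions: the `OPERATORS` mapping and the `matches` helper are hypothetical names, not part of VectorDB, and treating a missing field as a non-match is a choice made for this example.

```python
from typing import Any, Callable

# Operator name -> function(value_in_metadata, target_from_filter) -> bool
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda v, t: v == t,
    "not_equals": lambda v, t: v != t,
    "gt": lambda v, t: v > t,
    "gte": lambda v, t: v >= t,
    "lt": lambda v, t: v < t,
    "lte": lambda v, t: v <= t,
    "in": lambda v, t: v in t,
    "not_in": lambda v, t: v not in t,
    # String operators compare lowercased text, so they are case-insensitive.
    "contains": lambda v, t: str(t).lower() in str(v).lower(),
    "startswith": lambda v, t: str(v).lower().startswith(str(t).lower()),
    "endswith": lambda v, t: str(v).lower().endswith(str(t).lower()),
}

def matches(metadata: dict, field: str, op: str, target: Any) -> bool:
    """Return True if metadata[field] satisfies the operator.

    Assumption: a document without the field never matches, even for
    not_equals and not_in. Real databases differ on this point.
    """
    if field not in metadata:
        return False
    return OPERATORS[op](metadata[field], target)
```

Range operators on ISO 8601 date strings such as "2024-01-01" work here because those strings sort lexicographically in date order; other date formats would need parsing first.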
String operators (contains, startswith, endswith) are case-insensitive for consistent behavior across databases.
Database-specific syntax
Each database uses its own native filter format.
Configuration
Define metadata filters in your pipeline configuration.
Usage example
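As an illustration of how a filter configuration might be applied, the sketch below expresses the "ML documents published in 2024" query as a Python dictionary and evaluates it against in-memory documents. The schema is an assumption made for this example: the keys `filters`, `field`, `operator`, and `value`, and the helper `build_predicate`, are hypothetical and may not match VectorDB's actual configuration format. Conditions are combined with AND, and only the comparison operators are covered.

```python
import operator
from typing import Any, Callable

# Hypothetical configuration layout (not VectorDB's documented schema).
pipeline_config = {
    "top_k": 5,
    "filters": [
        {"field": "topic", "operator": "equals", "value": "ML"},
        {"field": "year", "operator": "gte", "value": 2024},
    ],
}

# Comparison operators from the table, mapped to the standard library.
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def build_predicate(filters: list[dict]) -> Callable[[dict], bool]:
    """Combine all configured conditions with logical AND."""
    def predicate(metadata: dict) -> bool:
        return all(
            c["field"] in metadata
            and _OPS[c["operator"]](metadata[c["field"]], c["value"])
            for c in filters
        )
    return predicate

documents = [
    {"id": "a", "metadata": {"topic": "ML", "year": 2024}},
    {"id": "b", "metadata": {"topic": "ML", "year": 2021}},
    {"id": "c", "metadata": {"topic": "databases", "year": 2024}},
]

keep = build_predicate(pipeline_config["filters"])
selected = [d["id"] for d in documents if keep(d["metadata"])]
print(selected)  # ['a']
```

Only document `a` satisfies both conditions: `b` fails the year range and `c` fails the topic match.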
Performance optimization
Selectivity analysis
Filter order matters for query performance. VectorDB includes selectivity analysis to optimize filter execution.
Pre-filter vs post-filter
Databases apply filters at different stages:
- Pre-filter - Filter before vector search (faster, smaller search space)
- Post-filter - Filter after vector search (preserves ranking quality)
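The difference between the two stages can be shown with a brute-force sketch. The function names, the `oversample` parameter, and the toy two-dimensional vectors are assumptions for illustration; real databases implement both strategies inside their index structures rather than by scanning every document.

```python
from typing import Callable

def score(query_vec: list[float], vec: list[float]) -> float:
    # Dot-product similarity, sufficient for this toy example.
    return sum(a * b for a, b in zip(query_vec, vec))

def pre_filter_search(query_vec, docs, predicate: Callable[[dict], bool], top_k: int):
    """Filter first, then rank only the matching documents."""
    candidates = [d for d in docs if predicate(d["metadata"])]
    candidates.sort(key=lambda d: score(query_vec, d["vector"]), reverse=True)
    return candidates[:top_k]

def post_filter_search(query_vec, docs, predicate, top_k: int, oversample: int = 1):
    """Rank everything, keep the best top_k * oversample, then filter."""
    ranked = sorted(docs, key=lambda d: score(query_vec, d["vector"]), reverse=True)
    shortlist = ranked[: top_k * oversample]
    return [d for d in shortlist if predicate(d["metadata"])][:top_k]

docs = [
    {"id": "d1", "vector": [1.0, 0.0], "metadata": {"lang": "en"}},
    {"id": "d2", "vector": [0.9, 0.1], "metadata": {"lang": "de"}},
    {"id": "d3", "vector": [0.8, 0.2], "metadata": {"lang": "de"}},
    {"id": "d4", "vector": [0.1, 0.9], "metadata": {"lang": "en"}},
]
query_vec = [1.0, 0.0]

def is_english(metadata: dict) -> bool:
    return metadata["lang"] == "en"

pre = [d["id"] for d in pre_filter_search(query_vec, docs, is_english, top_k=2)]
post = [d["id"] for d in post_filter_search(query_vec, docs, is_english, top_k=2)]
print(pre)   # ['d1', 'd4']
print(post)  # ['d1']
```

In this sketch the post-filter variant returns fewer than `top_k` results because the filter runs after the shortlist is cut. Raising `oversample` widens the shortlist to compensate, at the cost of scoring and filtering more candidates.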
Timing metrics
Track filter performance with built-in timing metrics.
Related features
- JSON indexing - Filter by nested JSON paths
- Namespaces - Logical data partitioning
- Multi-tenancy - Tenant-isolated retrieval
- Semantic search - Vector similarity search