Overview
TrailBase integrates sqlite-vec, a vector search extension for SQLite that enables semantic search, similarity matching, and recommendation systems. Store and query high-dimensional embeddings directly in your SQLite database.
What is Vector Search?
Vector search enables finding similar items based on semantic meaning rather than exact text matches. Common use cases include:
- Semantic search: Find documents similar in meaning
- Recommendation engines: Find similar products, articles, or users
- Image similarity: Match similar images
- Anomaly detection: Identify outliers in data
- Content deduplication: Find near-duplicate content
Setup
sqlite-vec is included with TrailBase by default. No additional installation is required.
Creating Tables with Vector Columns
Define tables with vector embedding columns:
Coffee Search Example
From the coffee vector search example:
Generating Embeddings
Using External APIs
Generate embeddings using OpenAI, Cohere, or other embedding APIs:
Pre-computed Embeddings
For the coffee search example, embeddings are computed from numeric features:
Similarity Search
Vector Distance Functions
sqlite-vec provides multiple distance metrics:
- L2 (Euclidean): vec_distance_L2(a, b) - Traditional distance
- Cosine: vec_distance_cosine(a, b) - Best for normalized embeddings
- L1 (Manhattan): vec_distance_L1(a, b) - City block distance
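To make the three metrics concrete, here is a minimal pure-Python sketch of the standard mathematical definitions behind them. The function names `distance_l2`, `distance_l1`, and `distance_cosine` are illustrative only and are not part of sqlite-vec; cosine distance is assumed here to mean one minus cosine similarity.

```python
import math


def distance_l2(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_l1(a, b):
    """Manhattan distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_cosine(a, b):
    """Cosine distance, taken here as 1 minus the cosine of the angle."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [0.0, 0.0, 1.0]
b = [3.0, 4.0, 1.0]
print(distance_l2(a, b), distance_l1(a, b), distance_cosine(a, b))
```

Note that cosine distance ignores vector length: two vectors pointing in the same direction have a distance of zero regardless of magnitude, whereas L2 and L1 both grow with the gap between magnitudes.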
Basic Similarity Query
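A basic similarity query orders rows by their distance to a query vector and keeps the closest few. The sketch below shows that query shape with Python's built-in `sqlite3` module. So that it runs without the extension loaded, it registers a Python stand-in under the name `vec_distance_L2`; with sqlite-vec available that registration would presumably be unnecessary. The `items` table, its columns, and the float32 BLOB encoding are assumptions made for illustration.

```python
import math
import sqlite3
import struct


def pack_f32(values):
    """Encode a list of floats as a little-endian float32 BLOB (assumed format)."""
    return struct.pack(f"<{len(values)}f", *values)


def l2_stand_in(blob_a, blob_b):
    """Stand-in for vec_distance_L2 so the sketch runs without sqlite-vec."""
    a = struct.unpack(f"<{len(blob_a) // 4}f", blob_a)
    b = struct.unpack(f"<{len(blob_b) // 4}f", blob_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.create_function("vec_distance_L2", 2, l2_stand_in)

# Hypothetical table: one embedding BLOB per row.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, embedding BLOB)"
)
rows = [("alpha", [0.0, 0.0]), ("beta", [1.0, 1.0]), ("gamma", [5.0, 5.0])]
conn.executemany(
    "INSERT INTO items (name, embedding) VALUES (?, ?)",
    [(name, pack_f32(vec)) for name, vec in rows],
)


def nearest(query, k):
    """Return the k rows closest to the query vector, nearest first."""
    return conn.execute(
        "SELECT name, vec_distance_L2(embedding, ?) AS distance "
        "FROM items ORDER BY distance LIMIT ?",
        (pack_f32(query), k),
    ).fetchall()


results = nearest([0.9, 0.9], 2)
for name, distance in results:
    print(name, round(distance, 3))
```

The query scans every row and computes a distance for each, which is fine for small tables and becomes the main cost as the table grows.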
Coffee Search Query
From the coffee search example:
Semantic Search API
Build a complete semantic search endpoint:
Filtering with Vector Search
Combine vector similarity with traditional filters:
Hybrid Search
Combine full-text search with vector similarity:
Clustering and Grouping
Find clusters of similar items:
Recommendation System
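One simple way to build recommendations on top of embeddings is to average the vectors of items a user liked into a profile vector, then rank the remaining items by distance to that profile. The sketch below illustrates the idea in plain Python; it is one possible approach, not necessarily the one TrailBase's examples use. The `recommend` function and the toy catalog are made up for illustration and are not data from the coffee example.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recommend(liked_ids, catalog, k=2):
    """Rank items the user has not liked by distance to their average taste."""
    liked = [catalog[item_id] for item_id in liked_ids]
    dims = len(liked[0])
    # Profile vector: component-wise mean of the liked embeddings.
    profile = [sum(vec[d] for vec in liked) / len(liked) for d in range(dims)]
    candidates = [
        (cosine_distance(profile, vec), item_id)
        for item_id, vec in catalog.items()
        if item_id not in liked_ids
    ]
    return [item_id for _, item_id in sorted(candidates)[:k]]


# Toy 3-dimensional embeddings, invented for this sketch.
catalog = {
    "espresso": [0.9, 0.1, 0.0],
    "ristretto": [0.8, 0.2, 0.0],
    "lungo": [0.85, 0.1, 0.05],
    "cold_brew": [0.1, 0.9, 0.1],
    "matcha": [0.0, 0.1, 0.9],
}

print(recommend({"espresso"}, catalog, k=2))
```

In a database-backed version, the profile vector would be computed once per user and passed as the query vector to a similarity query, with a filter excluding items the user has already seen.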
Performance Optimization
Indexing
For large datasets, create approximate nearest neighbor (ANN) indexes:
Batch Processing
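When loading many embeddings, inserting them in batches inside a transaction usually avoids paying the commit cost once per row. The following is a minimal sketch using Python's built-in `sqlite3` module. The `documents` table, the `insert_embeddings` helper, the batch size, and the float32 BLOB encoding are all assumptions made for illustration.

```python
import sqlite3
import struct
from itertools import islice


def pack_f32(values):
    """Encode a list of floats as a little-endian float32 BLOB (assumed format)."""
    return struct.pack(f"<{len(values)}f", *values)


def batched(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def insert_embeddings(conn, rows, batch_size=500):
    """Insert (content, vector) pairs, committing once per batch."""
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(
                "INSERT INTO documents (content, embedding) VALUES (?, ?)",
                [(content, pack_f32(vec)) for content, vec in chunk],
            )
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT, embedding BLOB)"
)

# A generator keeps memory flat even when the source is large.
rows = ((f"doc {i}", [float(i), 0.0, 1.0]) for i in range(1200))
inserted = insert_embeddings(conn, rows, batch_size=500)
print(inserted)
```

Committing per batch rather than once at the end means a failure part-way through loses at most one batch of work.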
Embedding Storage
Choosing Dimensions
- 384 dimensions: sentence-transformers/all-MiniLM-L6-v2 (good balance)
- 768 dimensions: sentence-transformers/all-mpnet-base-v2 (higher quality)
- 1536 dimensions: OpenAI text-embedding-ada-002
- 3072 dimensions: OpenAI text-embedding-3-large
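Dimension count translates directly into storage. Assuming each component is stored as a 4-byte float32 (an assumption, and one that ignores SQLite's per-row overhead), a rough estimate for the sizes listed above can be computed as follows. The `storage_bytes` helper is illustrative only.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension


def storage_bytes(dimensions, rows=1):
    """Raw vector payload size in bytes, excluding any database overhead."""
    return dimensions * BYTES_PER_FLOAT32 * rows


for dims in (384, 768, 1536, 3072):
    per_vector = storage_bytes(dims)
    per_million_mib = storage_bytes(dims, 1_000_000) / 2**20
    print(
        f"{dims:>5} dims: {per_vector:>6} bytes per vector, "
        f"about {per_million_mib:,.0f} MiB per million rows"
    )
```

Under this assumption a 384-dimensional vector occupies 1,536 bytes and a 3072-dimensional one occupies 12,288 bytes, so moving to the largest model multiplies vector storage by eight.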
Storage Format
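As a sketch of what a compact storage format can look like, the snippet below round-trips a vector through a little-endian float32 BLOB and compares its size with a JSON text encoding of the same values. That sqlite-vec reads vectors in these two forms is an assumption here; check the sqlite-vec documentation for the exact representations it supports. The helpers `to_blob` and `from_blob` are hypothetical names.

```python
import json
import struct


def to_blob(values):
    """Pack floats into a little-endian float32 BLOB, 4 bytes per component."""
    return struct.pack(f"<{len(values)}f", *values)


def from_blob(blob):
    """Unpack a float32 BLOB back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


vector = [0.5, -1.25, 3.0]  # values chosen to be exactly representable in float32

blob = to_blob(vector)
text = json.dumps(vector)

print(len(blob), "bytes as BLOB")
print(len(text), "bytes as JSON text")
print(from_blob(blob))
```

The binary form has a fixed size per dimension, while the text form grows with the number of digits written. Float32 also keeps only about seven significant decimal digits, so values that are not exactly representable come back slightly rounded after a round trip.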
Complete Example: Document Search
See the coffee vector search example for a complete working implementation.
Best Practices
Next Steps
- Custom Endpoints: Build search APIs
- Geospatial: Location-based queries
- Jobs Scheduler: Batch embedding generation
- Object Storage: Store large files