System architecture
SKU Semantic Search is built on a modern stack that combines vector embeddings, semantic search, and the RAG (Retrieval-Augmented Generation) pattern to provide intelligent product recommendations.
Request flow
When a user submits a search query, the system processes it through the following stages:
Request validation
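A request schema for this stage might look like the following sketch. The class and field names here are assumptions for illustration, not the actual contents of app/schemas/product_schema.py:

```python
from pydantic import BaseModel, Field


class ProductSearchRequest(BaseModel):
    """Hypothetical search request schema: a text query plus an optional result limit."""

    query: str = Field(min_length=1)          # the search text; must be non-empty
    limit: int = Field(default=10, ge=1, le=100)  # how many results to return
```

With a schema like this, FastAPI rejects malformed payloads automatically with a 422 response before any service code runs.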
FastAPI receives the search request and validates it using Pydantic schemas defined in app/schemas/product_schema.py:22. The request must include a text query and, optionally, a limit on the number of results.
Embedding generation
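A minimal sketch of this step, with the actual Gemini SDK call abstracted behind an injected callable (the function name and shape are assumptions, not the real llm_service code):

```python
from typing import Callable, List

EMBEDDING_DIM = 3072  # dimensionality of the Gemini embedding model used here


def embed_query(text: str, provider: Callable[[str], List[float]]) -> List[float]:
    """Convert a query string into an embedding vector.

    `provider` stands in for the real Gemini SDK call made in
    app/services/llm_service.py:46; injecting it keeps this helper testable.
    """
    vector = provider(text)
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dim vector, got {len(vector)}")
    return vector
```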
The user’s query is converted into a 3072-dimensional vector using Google Gemini’s embedding model (app/services/llm_service.py:46). This vector is a mathematical representation of the semantic meaning of the query.
Vector similarity search
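For intuition, pgvector's cosine distance operator (`<=>`) computes 1 minus cosine similarity. A pure-Python equivalent of that metric:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity), the metric behind pgvector's <=> operator.

    0.0 means the vectors point the same way; 1.0 means they are orthogonal.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)
```

In production this arithmetic runs inside PostgreSQL, where the index can prune most of the catalog instead of scanning every row.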
PostgreSQL with pgvector performs a cosine distance search to find the most similar products (app/services/product_service.py:26). The cosine_distance function efficiently finds the products whose embeddings are closest to the query embedding in the 3072-dimensional space.
RAG-based generation
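The context-formatting step of this stage can be sketched as follows; the product keys and prompt wording are illustrative assumptions, not the exact format used in app/api/endpoints/products.py:

```python
from typing import Dict, List


def build_rag_prompt(query: str, products: List[Dict[str, str]]) -> str:
    """Format retrieved products as grounding context for the LLM.

    Constraining the model to the listed catalog entries is what prevents it
    from recommending products that do not exist.
    """
    context = "\n".join(f"- {p['name']}: {p['description']}" for p in products)
    return (
        "Recommend products using ONLY the catalog entries below.\n"
        f"Catalog:\n{context}\n\n"
        f"Customer query: {query}"
    )
```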
The retrieved products are formatted as context and passed to an LLM for natural language generation (app/api/endpoints/products.py:14). This ensures the AI only recommends products that actually exist in the database.
Core components
FastAPI application
The main application is defined in app/main.py:14.
Database model
Products are stored with their embeddings in app/models/product.py:5.
The Vector(3072) column type is provided by pgvector and enables efficient similarity searches.
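The model might be shaped roughly like the sketch below (SQLAlchemy 2.0 declarative style; the table and column names are assumptions, not the actual contents of app/models/product.py):

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy import Integer, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Product(Base):
    """Illustrative product model; column names are assumed for this sketch."""

    __tablename__ = "products"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String(255))
    description: Mapped[str] = mapped_column(Text)
    # pgvector column holding the precomputed 3072-dim Gemini embedding
    embedding = mapped_column(Vector(3072))
```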
LLM service
The LLMService class (app/services/llm_service.py:11) manages AI provider interactions.
Key design decisions
Why pgvector instead of specialized vector databases?
pgvector provides several advantages for this use case:
- Simplicity: Uses familiar PostgreSQL, no need to learn new database systems
- ACID compliance: Full transactional support for product data
- Cost-effective: No additional infrastructure or licensing costs
- Sufficient performance: Handles thousands of products efficiently with proper indexing
Why Gemini for embeddings?
Google’s Gemini embedding model offers:
- High dimensionality: 3072 dimensions capture nuanced semantic relationships
- Multilingual support: Works well with Spanish product descriptions
- Free tier: Generous quota for development and testing
- Quality: Strong performance on semantic similarity tasks
Why RAG instead of fine-tuning?
The RAG pattern is preferred because:
- No training required: Works immediately with new products
- Always up-to-date: Reflects the current product catalog
- Prevents hallucinations: AI can only recommend actual products
- Cost-effective: No expensive fine-tuning jobs
- Transparent: Easy to understand which products informed each recommendation
Why multi-LLM failover?
Provider resilience is critical because:
- API limits: Free tiers have rate limits and quotas
- Downtime: Even major providers have occasional outages
- Geographic restrictions: Some providers may be unavailable in certain regions
- Cost optimization: Can route to cheaper providers when primary is unavailable
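The failover idea behind these points can be sketched as a priority-ordered loop over provider calls; this is an assumption about the general shape, not the real LLMService implementation:

```python
from typing import Callable, List


def generate_with_failover(prompt: str,
                           providers: List[Callable[[str], str]]) -> str:
    """Try each LLM provider in priority order, falling through on failure.

    The real service would also detect rate-limit responses, log each failure,
    and manage provider-specific clients; this sketch shows only the control flow.
    """
    errors: List[str] = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, outage, regional block, ...
            errors.append(str(exc))
    raise RuntimeError(f"all providers failed: {errors}")
```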
Performance considerations
Embedding cache
Product embeddings are generated once during creation and stored in the database. This avoids expensive re-computation on every search.
Vector indexing
For optimal performance with large catalogs, create an index on the embedding column. IVFFlat indexing trades a small amount of accuracy for much faster search times; HNSW indexing offers better recall and query speed at the cost of slower index builds and higher memory use. Both are approximate indexes; exact nearest-neighbor results come from an unindexed (but slow) sequential scan.
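The index DDL might look like the following, wrapped for execution through SQLAlchemy; the table and column names are assumed from the product model, and `lists = 100` is just a common starting point (roughly the square root of the row count):

```python
from sqlalchemy import text

# Hypothetical DDL; names and the `lists` value are assumptions for this sketch.
CREATE_EMBEDDING_INDEX = text(
    """
    CREATE INDEX IF NOT EXISTS ix_products_embedding
    ON products
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
    """
)

# Applied once against the engine, e.g.:
# with engine.begin() as conn:
#     conn.execute(CREATE_EMBEDDING_INDEX)
```

`vector_cosine_ops` matches the cosine distance metric used by the search query; an index built with a different operator class would not be used for `<=>` searches.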
Connection pooling
SQLAlchemy manages database connections efficiently via app/db/session.py:9.
The get_db dependency ensures connections are properly released after each request.
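The dependency typically follows the standard yield-and-close pattern, sketched below with an in-memory SQLite stand-in for the real PostgreSQL URL configured in app/db/session.py:

```python
from collections.abc import Generator

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

# Stand-in engine; the real URL points at PostgreSQL and lives in app/db/session.py.
engine = create_engine("sqlite:///:memory:")
SessionLocal = sessionmaker(bind=engine, autoflush=False)


def get_db() -> Generator[Session, None, None]:
    """FastAPI dependency: open one session per request, always close it afterwards."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()  # runs even if the request handler raised
```

FastAPI drives the generator for each request, so the `finally` block returns the connection to the pool whether the handler succeeds or fails.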
Next steps
RAG pattern
Deep dive into Retrieval-Augmented Generation implementation
Multi-LLM failover
Learn how automatic provider failover works
Database setup
Configure PostgreSQL and pgvector for optimal performance
Docker deployment
Deploy the complete system with Docker Compose