Vector Search in Azure AI Search
Vector search is an information retrieval approach that uses numeric representations of content for semantic similarity matching. Unlike keyword search, vector search finds conceptually similar content even without exact text matches.
What is Vector Search?
Vector search enables matching based on:
- Semantic similarity: “dog” and “canine” are conceptually similar but linguistically distinct
- Multilingual content: “dog” in English and “hund” in German
- Multimodal content: Text descriptions and images of dogs
Key Concepts
Embeddings
Numeric representations of content generated by machine learning models
Vector Space
Multi-dimensional space where semantically similar items are close together
Similarity Metrics
Mathematical functions that measure how close two vectors are (cosine, Euclidean, dot product)
Nearest Neighbors
Algorithm to find the k most similar vectors to a query vector
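The three similarity metrics named above can be sketched in plain Python. This is an illustration only, not how the service computes them; the three-dimensional `dog`, `canine`, and `car` vectors are made-up toy values, whereas real embeddings have hundreds or thousands of dimensions.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; reflects both direction and magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)

# Hypothetical toy embeddings (not produced by any real model).
dog = [0.9, 0.1, 0.0]
canine = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

print(cosine_similarity(dog, canine) > cosine_similarity(dog, car))
```

With these toy values, "dog" scores closer to "canine" than to "car" under cosine similarity, which is the behavior a semantic embedding is meant to produce.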
How Vector Search Works
Indexing Flow
- Generate embeddings: Use embedding models (Azure OpenAI, etc.) to convert text/images to vectors
- Create vector index: Store vectors in search index with HNSW or exhaustive KNN algorithm
- Store metadata: Keep human-readable fields alongside vectors
Query Flow
- Vectorize query: Convert search query to vector using same embedding model
- Similarity search: Find nearest neighbors in vector space
- Return results: Retrieve top k most similar documents
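The indexing and query flows above can be sketched end to end with an in-memory list. This is a minimal sketch under stated assumptions: the vectors are hand-written stand-ins for what an embedding model would return, the `index` list and `search` function are hypothetical, and the scan is exhaustive rather than HNSW.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Indexing flow: each document keeps human-readable metadata next to its vector.
# The vectors are toy values; a real pipeline would call an embedding model here.
index = [
    {"id": "1", "title": "Caring for your dog", "vector": [0.9, 0.1, 0.0]},
    {"id": "2", "title": "Canine nutrition basics", "vector": [0.8, 0.2, 0.1]},
    {"id": "3", "title": "Sports car buying guide", "vector": [0.0, 0.2, 0.9]},
]

def search(index, query_vector, k):
    """Query flow: score every document, then keep the k most similar."""
    scored = ((cosine_similarity(query_vector, doc["vector"]), doc) for doc in index)
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [{"id": doc["id"], "title": doc["title"], "score": score} for score, doc in top]

# The query vector must come from the same embedding model as the documents.
query_vector = [1.0, 0.0, 0.0]
results = search(index, query_vector, k=2)
for hit in results:
    print(hit["id"], hit["title"], round(hit["score"], 3))
```

The two dog-related documents rank ahead of the car document, and the metadata fields are what the caller actually reads from the results.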
Embedding Models
Azure OpenAI
Popular Models
| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-ada-002 | 1536 | General purpose text |
| text-embedding-3-small | 512-1536 | Efficient text embedding |
| text-embedding-3-large | 256-3072 | High quality text embedding |
| CLIP | 512 | Multimodal (text + images) |
Vector Index Configuration
HNSW Algorithm
- m: Bi-directional link count (4-10)
- efConstruction: Neighbors during indexing (100-1000)
- efSearch: Neighbors during search (100-1000)
- metric: Similarity function (cosine, euclidean, dotProduct)
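As a rough sketch, the HNSW parameters above could be assembled into the `vectorSearch` section of an index definition like this. The helper `hnsw_algorithm`, the names `my-hnsw` and `my-profile`, and the default values in the signature are assumptions for illustration; the JSON shape follows the REST index definition as I understand it and should be checked against the API version you target.

```python
# Allowed ranges as listed in this section.
HNSW_RANGES = {"m": (4, 10), "efConstruction": (100, 1000), "efSearch": (100, 1000)}
METRICS = {"cosine", "euclidean", "dotProduct"}

def hnsw_algorithm(name, m=4, ef_construction=400, ef_search=500, metric="cosine"):
    """Build one HNSW algorithm entry, rejecting out-of-range parameters."""
    params = {"m": m, "efConstruction": ef_construction, "efSearch": ef_search}
    for key, value in params.items():
        low, high = HNSW_RANGES[key]
        if not low <= value <= high:
            raise ValueError(f"{key}={value} is outside the range {low}-{high}")
    if metric not in METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    return {"name": name, "kind": "hnsw", "hnswParameters": {**params, "metric": metric}}

# Hypothetical names; a vector field would reference the profile by name.
vector_search = {
    "algorithms": [hnsw_algorithm("my-hnsw", m=8, ef_search=800)],
    "profiles": [{"name": "my-profile", "algorithm": "my-hnsw"}],
}
print(vector_search)
```

Higher `m` and `efSearch` values generally trade memory and latency for better recall, so validating them up front avoids a rejected index definition later.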
Exhaustive KNN
Use exhaustive KNN when:
- Maximum accuracy is required
- The dataset is small (< 1M vectors)
- Accuracy is more important than speed
Vector Query Example
- vector: Query embedding (must match field dimensions)
- fields: Vector field(s) to search
- k: Number of nearest neighbors to return
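A request body using these three parameters might be built as follows. This is a sketch, not the original example from this page: the helper `build_vector_query` and the field name `contentVector` are hypothetical, and the `vectorQueries` shape reflects the REST search request as I understand it.

```python
import json

def build_vector_query(vector, fields, k, expected_dimensions=None):
    """Return a search request body containing a single vector query."""
    if expected_dimensions is not None and len(vector) != expected_dimensions:
        raise ValueError(
            f"query has {len(vector)} dimensions, field expects {expected_dimensions}"
        )
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "vectorQueries": [
            {"kind": "vector", "vector": list(vector), "fields": fields, "k": k}
        ]
    }

# A 3-dimensional toy vector; a real query embedding would have e.g. 1536 values.
body = build_vector_query([0.1, 0.2, 0.3], fields="contentVector", k=5, expected_dimensions=3)
print(json.dumps(body, indent=2))
```

Checking the dimension count on the client side surfaces a model mismatch before the request is sent.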
Compression and Optimization
Scalar Quantization
Reduce vector size by compressing float values:
- 75% size reduction
- Faster search
- Lower storage costs
- Minimal accuracy loss with rescoring
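To see where the 75% figure comes from, here is a minimal sketch of one generic scalar quantization scheme: each 4-byte float is mapped to a 1-byte code over a fixed range. This is an assumption-laden illustration, not the service's internal implementation; the `quantize` and `dequantize` helpers and the `[-1, 1]` range are hypothetical.

```python
def quantize(vector, lo, hi):
    """Map floats in [lo, hi] to integer codes 0..255 (one byte per dimension)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = (hi - lo) / 255
    # Values outside the range are clamped to the nearest end.
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)

def dequantize(codes, lo, hi):
    """Recover approximate floats from the byte codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]

vector = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes = quantize(vector, -1.0, 1.0)
restored = dequantize(codes, -1.0, 1.0)

float32_bytes = 4 * len(vector)  # size if stored as 32-bit floats
print(len(codes), "bytes instead of", float32_bytes)
print(max(abs(a - b) for a, b in zip(vector, restored)))
```

The round-trip error is bounded by half a quantization step, which is why rescoring the top candidates against the original vectors recovers most of the lost accuracy.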
Binary Quantization
Compress to 1-bit values:
- 96% size reduction
- Fastest search
- Lowest storage costs
- Higher accuracy loss (mitigated by rescoring)
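The idea behind 1-bit compression can be sketched by keeping only the sign of each dimension and comparing vectors by Hamming distance. This is a simplified illustration under my own assumptions (sign-based thresholding at zero, hypothetical `binarize` and `hamming_distance` helpers), not a description of the service's implementation.

```python
def binarize(vector):
    """Pack one bit per dimension into an int: bit i is set when vector[i] > 0."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming_distance(a, b):
    """Number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")

doc_a = binarize([0.5, -0.2, 0.1, -0.9])
doc_b = binarize([0.4, -0.1, -0.3, -0.8])  # differs in sign on one dimension
doc_c = binarize([-0.5, 0.2, -0.1, 0.9])   # opposite sign on every dimension

print(hamming_distance(doc_a, doc_b), hamming_distance(doc_a, doc_c))
```

Because magnitudes are discarded, near-ties are common; a typical mitigation is to oversample candidates with the binary codes and rescore them with the full-precision vectors.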
Similarity Metrics
Cosine Similarity
Measures the angle between vectors (default for Azure OpenAI).
Euclidean Distance
Measures straight-line distance.
Dot Product
Measures vector alignment and magnitude.
Integrated Vectorization
Automate embedding generation during indexing.
Query-Time Vectorization
- No manual embedding generation
- Consistent model usage
- Simplified implementation
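With query-time vectorization, the request carries the query text instead of a precomputed vector, and the service embeds it with the vectorizer attached to the field. The sketch below assumes the `"kind": "text"` query shape from the REST search request and a hypothetical field named `contentVector` that already has a vectorizer configured; verify both against your API version.

```python
def build_text_vector_query(text, fields, k):
    """Return a request body that asks the service to vectorize the query text."""
    if not text.strip():
        raise ValueError("query text must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": fields, "k": k}
        ]
    }

body = build_text_vector_query("affordable car", fields="contentVector", k=10)
print(body)
```

Since the same vectorizer serves both indexing and querying, the client never has to track which embedding model a field was built with.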
Use Cases
Semantic Search
Find conceptually similar content regardless of exact keywords:
- “affordable car” matches “inexpensive vehicle”
- “laptop repair” matches “computer maintenance”
Multilingual Search
Search across languages without translation:
- English query finds German documents
- Single vector space for all languages
Multimodal Search
Query images with text or text with images:
- “red sports car” finds car images
- Image query finds similar product photos
Recommendation Systems
Find similar items based on embeddings:
- “Customers who liked this also viewed…”
- Content-based filtering
Performance Considerations
Index Size
- Vectors require significant storage
- 1M documents × 1536 dimensions × 4 bytes (float32) ≈ 6 GB of raw vector data
- Use compression to reduce by 75-96%
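The storage arithmetic above can be checked with a few lines of Python. The helper `vector_storage_bytes` is hypothetical, and the figures cover raw vector data only, not index overhead such as the HNSW graph.

```python
def vector_storage_bytes(num_vectors, dimensions, bits_per_dimension=32):
    """Raw size of the vector data alone, in bytes."""
    return num_vectors * dimensions * bits_per_dimension // 8

raw = vector_storage_bytes(1_000_000, 1536)         # float32
scalar = vector_storage_bytes(1_000_000, 1536, 8)   # 8-bit scalar quantization
binary = vector_storage_bytes(1_000_000, 1536, 1)   # 1-bit binary quantization

for label, size in [("float32", raw), ("scalar", scalar), ("binary", binary)]:
    reduction = 1 - size / raw
    print(f"{label}: {size / 1e9:.3f} GB ({reduction:.1%} smaller than float32)")
```

For 1M vectors of 1536 dimensions this works out to 6.144 GB uncompressed, 1.536 GB with 8-bit codes, and 0.192 GB with 1-bit codes.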
Query Performance
- HNSW: Approximate, fast (ms)
- Exhaustive KNN: Exact, slower (seconds for large datasets)
- Compression: Faster but requires rescoring
Best Practices
Right-Size k
Request only needed results (typically 10-50)
Use Compression
Enable quantization for large indexes
Tune HNSW
Adjust parameters for accuracy vs speed trade-off
Monitor Metrics
Track query latency and accuracy
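One common way to track accuracy while tuning HNSW or compression is recall@k: the share of the exact top-k results (for example, from exhaustive KNN) that the approximate search also returned. The function name and the document IDs below are hypothetical.

```python
def recall_at_k(approximate_ids, exact_ids, k):
    """Fraction of the exact top-k results that appear in the approximate top-k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must not be empty")
    approx_top = set(approximate_ids[:k])
    return len(approx_top & exact_top) / len(exact_top)

exact = ["a", "b", "c", "d", "e"]   # ground truth from an exhaustive scan
approx = ["a", "c", "b", "x", "e"]  # results from an approximate index

print(recall_at_k(approx, exact, k=5))
```

Measuring this over a fixed sample of queries before and after a parameter change shows whether a latency improvement came at the cost of missed results.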
Next Steps
Create Vector Index
Build your first vector search index
Hybrid Search
Combine vector and keyword search
Generate Embeddings
Learn to create embeddings
Query Vectors
Execute vector queries