
Vector Search in Azure AI Search

Vector search is an information retrieval approach that uses numeric representations of content for semantic similarity matching. Unlike keyword search, it finds conceptually similar content even when there is no exact text match, enabling matching based on:
  • Semantic similarity: “dog” and “canine” are conceptually similar but linguistically distinct
  • Multilingual content: “dog” in English and “hund” in German
  • Multimodal content: Text descriptions and images of dogs

Key Concepts

Embeddings

Numeric representations of content generated by machine learning models

Vector Space

Multi-dimensional space where semantically similar items are close together

Similarity Metrics

Mathematical functions to measure distance between vectors (cosine, euclidean)

Nearest Neighbors

Algorithm to find the k most similar vectors to a query vector
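The k-nearest-neighbors concept can be sketched in a few lines of pure Python — a toy brute-force search over a handful of 2-dimensional vectors, not how Azure AI Search implements it:

```python
import math

def cosine_similarity(a, b):
    # similarity = (A · B) / (||A|| ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def k_nearest(query, vectors, k):
    # Score every stored vector against the query, keep the k best.
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "doc1": [1.0, 0.0],
    "doc2": [0.9, 0.1],
    "doc3": [0.0, 1.0],
}
print(k_nearest([1.0, 0.05], docs, k=2))  # doc1 and doc2 point the same way as the query
```

Real indexes avoid this linear scan with approximate structures such as HNSW, described below.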

How Vector Search Works

Indexing Flow

  1. Generate embeddings: Use embedding models (Azure OpenAI, etc.) to convert text/images to vectors
  2. Create vector index: Store vectors in search index with HNSW or exhaustive KNN algorithm
  3. Store metadata: Keep human-readable fields alongside vectors

Query Flow

  1. Vectorize query: Convert search query to vector using same embedding model
  2. Similarity search: Find nearest neighbors in vector space
  3. Return results: Retrieve top k most similar documents
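The two flows fit together as sketched below. Everything here is a toy stand-in: `embed` mimics an embedding model with hand-picked topic counts, where in practice you would call the same real model (e.g. an Azure OpenAI deployment) for both indexing and querying:

```python
def embed(text):
    # Hypothetical 2-dimensional "embedding": counts of two hand-picked topics.
    return [text.lower().count("dog"), text.lower().count("cloud")]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Indexing flow: generate embeddings and store them alongside metadata.
index = [{"title": t, "vector": embed(t)}
         for t in ["dog dog park", "cloud computing", "dog training"]]

# Query flow: vectorize the query, find nearest neighbors, return top k.
def search(query, k=2):
    qv = embed(query)
    ranked = sorted(index, key=lambda d: euclidean(qv, d["vector"]))
    return [d["title"] for d in ranked[:k]]

print(search("my dog"))  # the two dog documents rank above the cloud one
```

The essential point is symmetry: the query must be embedded with the same model (and the same dimensionality) as the indexed documents, or distances in the vector space are meaningless.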

Embedding Models

Azure OpenAI

{
  "input": "what azure services support generative AI",
  "model": "text-embedding-ada-002"
}
Response: 1,536-dimension vector
Model                    Dimensions   Use Case
text-embedding-ada-002   1536         General purpose text
text-embedding-3-small   512-1536     Efficient text embedding
text-embedding-3-large   256-3072     High quality text embedding
CLIP                     512          Multimodal (text + images)

Vector Index Configuration

HNSW Algorithm

{
  "vectorSearch": {
    "algorithms": [
      {
        "name": "my-hnsw-config",
        "kind": "hnsw",
        "hnswParameters": {
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500,
          "metric": "cosine"
        }
      }
    ],
    "profiles": [
      {
        "name": "my-vector-profile",
        "algorithm": "my-hnsw-config"
      }
    ]
  }
}
Parameters:
  • m: Bi-directional link count (4-10)
  • efConstruction: Neighbors during indexing (100-1000)
  • efSearch: Neighbors during search (100-1000)
  • metric: Similarity function (cosine, euclidean, dotProduct)

Exhaustive KNN

{
  "algorithms": [
    {
      "name": "my-eknn-config",
      "kind": "exhaustiveKnn",
      "exhaustiveKnnParameters": {
        "metric": "cosine"
      }
    }
  ]
}
Use when:
  • Maximum accuracy required
  • Small dataset (< 1M vectors)
  • Accuracy more important than speed

Vector Query Example

{
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [
        -0.009154141,
        0.018708462,
        // ... 1536 dimensions
        -0.00086512347
      ],
      "fields": "contentVector",
      "k": 50
    }
  ],
  "select": "title, content, category"
}
Parameters:
  • vector: Query embedding (must match field dimensions)
  • fields: Vector field(s) to search
  • k: Number of nearest neighbors to return

Compression and Optimization

Scalar Quantization

Reduce vector size by compressing float values:
{
  "compressions": [
    {
      "name": "scalar-quantization",
      "kind": "scalarQuantization",
      "scalarQuantizationParameters": {
        "quantizedDataType": "int8"
      },
      "rescoringOptions": {
        "enableRescoring": true,
        "defaultOversampling": 10
      }
    }
  ]
}
Benefits:
  • 75% size reduction
  • Faster search
  • Lower storage costs
  • Minimal accuracy loss with rescoring
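The effect of scalar quantization can be illustrated in a few lines. This is a simplified sketch (symmetric max-magnitude scaling); the service's actual quantization scheme may differ:

```python
def quantize_int8(vector):
    # Map each 4-byte float to a 1-byte int in -127..127,
    # scaled by the vector's largest magnitude.
    scale = max(abs(x) for x in vector) / 127.0
    return [round(x / scale) for x in vector], scale

def dequantize(qvector, scale):
    # Recover approximate floats from the int8 values.
    return [q * scale for q in qvector]

v = [0.12, -0.87, 0.33, 0.02]
q, scale = quantize_int8(v)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(v, restored))
print(q)      # each component now fits in 1 byte instead of 4 (75% smaller)
print(error)  # small reconstruction error
```

Rescoring compensates for this reconstruction error: the index retrieves an oversampled candidate set using the compressed vectors, then re-ranks those candidates with full-precision vectors.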

Binary Quantization

Compress to 1-bit values:
{
  "compressions": [
    {
      "name": "binary-quantization",
      "kind": "binaryQuantization",
      "rescoringOptions": {
        "enableRescoring": true,
        "defaultOversampling": 20
      }
    }
  ]
}
Benefits:
  • 96% size reduction
  • Fastest search
  • Lowest storage costs
  • Higher accuracy loss (mitigated by rescoring)
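Binary quantization keeps only the sign of each component, so distance can be approximated with a cheap bit comparison. A simplified sketch:

```python
def binarize(vector):
    # 1 bit per dimension: keep only the sign of each component.
    return [1 if x > 0 else 0 for x in vector]

def hamming(a, b):
    # Number of differing bits — a cheap proxy for distance.
    return sum(x != y for x, y in zip(a, b))

query = binarize([0.2, -0.5, 0.1, 0.9])
doc_a = binarize([0.3, -0.4, 0.2, 0.8])   # same sign pattern as the query
doc_b = binarize([-0.3, 0.4, -0.2, 0.8])  # mostly opposite signs

print(hamming(query, doc_a))  # 0 — very close
print(hamming(query, doc_b))  # 3 — far
```

Because so much precision is discarded, binary quantization benefits even more from rescoring with full-precision vectors, which is why the example above uses a higher oversampling factor than the scalar case.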

Similarity Metrics

Cosine Similarity

Measures angle between vectors (default for Azure OpenAI):
similarity = (A · B) / (||A|| ||B||)
Range: -1 to 1 (higher is more similar)

Euclidean Distance

Measures straight-line distance:
distance = sqrt(Σ(Ai - Bi)²)
Range: 0 to ∞ (lower is more similar)

Dot Product

Measures vector alignment and magnitude:
similarity = Σ(Ai × Bi)
Range: -∞ to ∞ (higher is more similar)
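The three metrics can be compared directly with pure-Python implementations of the formulas above, using two vectors that point the same way but differ in magnitude:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, different magnitude
print(cosine(a, b))       # 1.0 — direction is identical
print(euclidean(a, b))    # sqrt(5) ≈ 2.236 — magnitudes differ
print(dot_product(a, b))  # 10.0 — rewards both alignment and magnitude
```

For unit-length (normalized) vectors, cosine and dot product produce the same ranking, which is why dot product is often used as a cheaper equivalent when embeddings are pre-normalized.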

Integrated Vectorization

Automate embedding generation during indexing:
{
  "fields": [
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "my-vector-profile"
    }
  ],
  "vectorizers": [
    {
      "name": "my-openai-vectorizer",
      "kind": "azureOpenAI",
      "azureOpenAIParameters": {
        "resourceUri": "https://my-openai.openai.azure.com",
        "deploymentId": "text-embedding-ada-002",
        "apiKey": "..."
      }
    }
  ]
}

Query-Time Vectorization

{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "luxury hotel with ocean view",
      "fields": "descriptionVector",
      "k": 50
    }
  ]
}
Benefits:
  • No manual embedding generation
  • Consistent model usage
  • Simplified implementation

Use Cases

Recommendations

Find similar items based on embeddings:
  • “Customers who liked this also viewed…”
  • Content-based filtering

Performance Considerations

Index Size

  • Vectors require significant storage
  • 1M documents × 1536 dimensions × 4 bytes = 6 GB
  • Use compression to reduce by 75-96%
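The storage arithmetic above is easy to reproduce, along with the effect of each compression option:

```python
docs = 1_000_000
dims = 1536
bytes_per_float = 4  # Collection(Edm.Single) stores 32-bit floats

raw = docs * dims * bytes_per_float
print(raw / 1e9)           # ≈ 6.1 GB uncompressed
print(raw * 0.25 / 1e9)    # scalar quantization: 1 byte per dimension, 75% smaller
print(raw / 32 / 1e9)      # binary quantization: 1 bit per dimension, ~96% smaller
```

These figures cover the vector data only; the HNSW graph and any stored metadata fields add further overhead on top.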

Query Performance

  • HNSW: Approximate, fast (ms)
  • Exhaustive KNN: Exact, slower (seconds for large datasets)
  • Compression: Faster but requires rescoring

Best Practices

Right-Size k

Request only needed results (typically 10-50)

Use Compression

Enable quantization for large indexes

Tune HNSW

Adjust parameters for accuracy vs speed trade-off

Monitor Metrics

Track query latency and accuracy

Next Steps

Create Vector Index

Build your first vector search index

Hybrid Search

Combine vector and keyword search

Generate Embeddings

Learn to create embeddings

Query Vectors

Execute vector queries
