Performance Tuning Guide

VecLabs uses HNSW (Hierarchical Navigable Small World) graphs for fast approximate nearest neighbor search. This guide covers tuning parameters for optimal speed and accuracy.

HNSW Overview

HNSW is the gold standard for vector search:

Fast Queries

Sub-millisecond queries for millions of vectors. O(log n) search complexity.

High Recall

95-99% recall with proper tuning. Better than locality-sensitive hashing.

Memory Efficient

Compact graph structure with only modest overhead beyond the raw vectors.

Production-Ready

Used by Spotify, Weaviate, and Qdrant. Battle-tested at scale.

How HNSW Works

HNSW builds a multi-layer graph:
Layer 2:  [Entry] ---------> [Node]
            |
            v
Layer 1:  [A] -----> [B] -----> [C]
          |  \       / |         |
          |   \     /  |         |
Layer 0:  [1]--[2]--[3]--[4]--[5]--[6]--[7]
1. Start at entry point: search begins at the top layer (sparsest).
2. Greedy descent: move to the nearest neighbor until a local minimum is reached.
3. Drop to next layer: descend to a denser layer and repeat the search.
4. Base layer search: at layer 0 (densest), use beam search with the ef_search parameter.
5. Return top-K: return the K best candidates found.
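The steps above can be sketched as a greedy search over a single layer. This is an illustrative toy (adjacency lists over 1-D values), not the library's implementation; the real index repeats this descent per layer, then runs beam search at layer 0.

```typescript
// Toy greedy descent on one HNSW layer: hop to whichever neighbor is
// closer to the query until no neighbor improves on the current node.
type Graph = Map<number, number[]>; // node id -> neighbor ids

function greedySearch(
  graph: Graph,
  values: Map<number, number>, // node id -> stored 1-D "vector"
  query: number,
  entry: number
): number {
  let current = entry;
  let currentDist = Math.abs(values.get(current)! - query);
  while (true) {
    let improved = false;
    for (const n of graph.get(current) ?? []) {
      const d = Math.abs(values.get(n)! - query);
      if (d < currentDist) {
        current = n;
        currentDist = d;
        improved = true;
      }
    }
    if (!improved) return current; // local minimum: stop and descend a layer
  }
}
```

In the real structure, the node returned here becomes the entry point for the same search one layer down.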

Key Parameters

1. ef_search (Query-Time)

What it does: Controls beam width during search at the base layer.

Effect:
  • Higher = better recall, slower queries
  • Lower = faster queries, worse recall

Default: 50. Typical range: 10-500.
import { SolVec } from 'solvec';

const sv = new SolVec({ network: 'devnet' });
const col = sv.collection('vectors', { dimensions: 1536 });

// The SDK uses HNSW internally with ef_search=50 by default
// For production use, tune based on your requirements
| ef_search | Use Case | Recall | Speed |
|---|---|---|---|
| 10-20 | Ultra-fast search, recall not critical | ~85% | Fastest |
| 30-50 | Balanced (default) | ~95% | Fast |
| 100-200 | High accuracy required | ~98% | Medium |
| 300-500 | Maximum accuracy | ~99%+ | Slower |
For production, start with ef_search=50. Increase if you need better recall, decrease if speed is critical.
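The table can be folded into a small helper that maps a recall target to an ef_search starting point. This is a hypothetical helper, not an SDK function; the thresholds simply follow the guidance above, and you should still validate against your own data.

```typescript
// Map a recall target to a starting ef_search value, per the tuning
// table above. Treat the output as a starting point, not a guarantee.
function pickEfSearch(targetRecall: number): number {
  if (targetRecall >= 0.99) return 400; // maximum accuracy
  if (targetRecall >= 0.98) return 150; // high accuracy
  if (targetRecall >= 0.95) return 50;  // balanced default
  return 20;                            // speed-first
}
```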

2. M (Max Connections)

What it does: Maximum edges per node in the graph.

Effect:
  • Higher = better recall, more memory, slower indexing
  • Lower = less memory, faster indexing, worse recall

Default: 16. Typical range: 8-64.
// Rust core (TypeScript SDK uses sensible defaults)
let index = HNSWIndex::new(
    16,  // M: max connections per node
    200, // ef_construction
    DistanceMetric::Cosine
);

Tuning M

| M | Memory Usage | Recall | Index Time |
|---|---|---|---|
| 8 | Low | ~92% | Fast |
| 16 | Medium (default) | ~95% | Medium |
| 32 | High | ~97% | Slow |
| 64 | Very High | ~99% | Very Slow |
Changing M requires rebuilding the index. Choose wisely during initial setup.
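Since M drives memory, a back-of-the-envelope estimator helps when sizing hardware before committing to an M value. This sketch assumes f32 vector components and roughly 2*M 4-byte neighbor IDs per node at layer 0; real per-implementation overhead varies, so treat it as a ballpark only.

```typescript
// Rough HNSW memory estimate: raw f32 vectors plus layer-0 edge lists.
// Assumes ~2*M neighbor ids per node and 4 bytes per id (illustrative).
function estimateIndexBytes(vectors: number, dims: number, m: number): number {
  const vectorBytes = vectors * dims * 4; // f32 components
  const edgeBytes = vectors * 2 * m * 4;  // layer-0 neighbor ids
  return vectorBytes + edgeBytes;
}

// e.g. 100K vectors at 1536 dims with M=16
const mb = estimateIndexBytes(100_000, 1536, 16) / 1024 / 1024;
```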

3. ef_construction (Index-Time)

What it does: Beam width during index construction.

Effect:
  • Higher = better graph quality, slower indexing
  • Lower = faster indexing, worse recall

Default: 200. Typical range: 100-400.
let index = HNSWIndex::new(
    16,
    200, // ef_construction: higher = better quality
    DistanceMetric::Cosine
);

Tuning ef_construction

| ef_construction | Index Time | Graph Quality | Recommended For |
|---|---|---|---|
| 100-150 | Fast | Good | Development, frequent updates |
| 200-300 | Medium (default) | Excellent | Production |
| 400-500 | Slow | Maximum | Static datasets |
ef_construction only affects indexing speed, not query speed. Higher is almost always better for production.

Performance Benchmarks

Query Speed vs Collection Size

| Vectors | ef_search=50 | ef_search=100 | ef_search=200 |
|---|---|---|---|
| 1K | 0.5ms | 0.8ms | 1.2ms |
| 10K | 2ms | 3ms | 5ms |
| 100K | 8ms | 12ms | 20ms |
| 1M | 25ms | 40ms | 70ms |
| 10M | 80ms | 150ms | 280ms |
Benchmarks measured on M1 MacBook Pro with 1536-dimensional vectors and cosine similarity.
Recall %
100% |                    _______________
 99% |                ___/
 98% |            ___/
 97% |        ___/
 96% |     __/
 95% |   _/
 94% | _/
 93% |/
     +-------------------------------------> ef_search
     10   20   50   100  200  300  500
Sweet spot: ef_search = 50-100 (95-98% recall)
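Curves like the one above are typically produced by comparing HNSW results against an exact brute-force scan over a sample of queries. A minimal sketch of recall@K, assuming both searches return lists of result IDs:

```typescript
// Recall@K: the fraction of the exact top-K ids that the approximate
// search also returned. Average this over many queries to plot a curve.
function recallAtK(approx: string[], exact: string[]): number {
  const found = new Set(approx);
  const hits = exact.filter(id => found.has(id)).length;
  return hits / exact.length;
}
```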

Optimizing for Different Use Cases

Use Case 1: Real-Time Search (Speed Priority)

Goal: Sub-10ms queries, 90%+ recall acceptable
let mut index = HNSWIndex::new(
    12,  // Lower M for speed
    150, // Lower ef_construction (faster indexing)
    DistanceMetric::Cosine
);

index.set_ef_search(30); // Aggressive speed optimization
Expected:
  • Query time: 1-5ms for 100K vectors
  • Recall: ~92%
  • Memory: Low

Use Case 2: Balanced (Default)

Goal: Fast queries with high accuracy
let mut index = HNSWIndex::default_cosine();
// M=16, ef_construction=200, ef_search=50
Expected:
  • Query time: 5-15ms for 100K vectors
  • Recall: ~95%
  • Memory: Medium

Use Case 3: Maximum Accuracy

Goal: 99%+ recall, speed secondary
let mut index = HNSWIndex::new(
    32,  // Higher M for accuracy
    400, // High ef_construction
    DistanceMetric::Cosine
);

index.set_ef_search(200);
Expected:
  • Query time: 20-50ms for 100K vectors
  • Recall: ~99%
  • Memory: High

Use Case 4: Massive Scale (10M+ Vectors)

Goal: Scale to millions of vectors
let mut index = HNSWIndex::new(
    16,  // Standard M
    200, // Standard ef_construction
    DistanceMetric::Cosine
);

// Start low, increase if needed
index.set_ef_search(50);
Strategies:
  • Partition data into multiple collections
  • Use coarse-to-fine search (search summary vectors first)
  • Increase ef_search only for critical queries
  • Consider dimensionality reduction (1536 → 768)
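The partitioning strategy can be as simple as hashing IDs across several collections and querying each shard, merging the top-K results. A sketch of the routing step (the hash function and `vectors-N` naming scheme are illustrative, not part of the SDK):

```typescript
// Deterministically assign a vector id to one of `shards` partitions
// using a simple 32-bit polynomial hash (illustrative choice).
function shardFor(id: string, shards: number): number {
  let h = 0;
  for (const ch of id) h = ((h * 31 + ch.charCodeAt(0)) >>> 0);
  return h % shards;
}

// Route upserts and queries to the matching shard collection,
// e.g. "vectors-0" .. "vectors-3".
const shardName = `vectors-${shardFor("doc_42", 4)}`;
```

At query time you search all shards in parallel and merge by score; since each shard's graph is smaller, per-shard latency drops even though total work is similar.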

Batch Operations

Batch Upsert

Always batch upserts for better performance:
// ❌ BAD: 1000 individual upserts
for (const vec of vectors) {
  await col.upsert([vec]); // Slow!
}

// ✅ GOOD: Single batch upsert
await col.upsert(vectors); // Fast!
Performance difference:
  • Individual: ~10ms per vector (1000 vectors = 10 seconds)
  • Batch: ~5ms total (1000 vectors = 5ms)
Reason: Avoids repeated Merkle root recomputation and on-chain updates.

Optimal Batch Size

// Split large datasets into reasonable batches
const BATCH_SIZE = 1000;

for (let i = 0; i < allVectors.length; i += BATCH_SIZE) {
  const batch = allVectors.slice(i, i + BATCH_SIZE);
  await col.upsert(batch);
  console.log(`Upserted ${i + batch.length} / ${allVectors.length}`);
}
Recommended batch sizes:
  • Small vectors (< 512 dim): 5000-10000
  • Medium vectors (512-1536 dim): 1000-5000
  • Large vectors (> 1536 dim): 500-1000
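The ranges above can be encoded as a small helper (hypothetical, not part of the SDK) so ingestion code derives a batch size from the collection's dimension count instead of hard-coding one:

```typescript
// Pick a batch size from the vector dimension, per the recommendations
// above: larger vectors mean more bytes per upsert, so smaller batches.
function recommendedBatchSize(dims: number): number {
  if (dims < 512) return 5000;   // small vectors
  if (dims <= 1536) return 2000; // medium vectors
  return 500;                    // large vectors
}
```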

Memory Optimization

Dimensionality Reduction

Reduce vector dimensions for faster queries and less memory:
from sklearn.decomposition import PCA
import numpy as np

# Original OpenAI embeddings: 1536 dimensions
embeddings = np.array([...])  # shape: (n, 1536)

# Reduce to 768 dimensions
pca = PCA(n_components=768)
reduced = pca.fit_transform(embeddings)  # shape: (n, 768)

# Upsert reduced vectors
col = sv.collection("reduced", dimensions=768)
col.upsert([{"id": f"vec_{i}", "values": vec.tolist()} for i, vec in enumerate(reduced)])
Memory savings:
  • 1536D: ~6 KB per vector
  • 768D: ~3 KB per vector (50% reduction)
  • 384D: ~1.5 KB per vector (75% reduction)
Accuracy impact:
  • 1536D → 768D: ~1-2% recall loss
  • 1536D → 384D: ~3-5% recall loss
For most semantic search use cases, 768 dimensions is sufficient and provides 2x memory savings.

Metadata Optimization

// ❌ BAD: Large metadata
await col.upsert([{
  id: 'doc_1',
  values: [...],
  metadata: {
    fullText: longDocument,     // 10 KB
    rawData: JSON.stringify(obj), // 5 KB
    imageBase64: base64Image    // 100 KB
  }
}]);

// ✅ GOOD: Minimal metadata
await col.upsert([{
  id: 'doc_1',
  values: [...],
  metadata: {
    title: 'Document Title',
    url: 'https://...',
    docId: 'doc_1'
  }
}]);

// Store full content elsewhere (S3, Shadow Drive, etc.)
Best practices:
  • Store only what you need for ranking/filtering
  • Keep metadata < 1 KB per vector
  • Store large content in external storage
  • Use IDs to reference external data
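A small guard can enforce the ~1 KB guideline at upsert time. This is a hypothetical helper, not an SDK feature; it measures metadata size as UTF-8 encoded JSON, which may differ slightly from on-disk representation:

```typescript
// Measure metadata as UTF-8 JSON bytes and reject oversized payloads
// before they reach the collection.
function metadataBytes(metadata: Record<string, unknown>): number {
  return new TextEncoder().encode(JSON.stringify(metadata)).length;
}

function assertSmallMetadata(
  metadata: Record<string, unknown>,
  limit = 1024
): void {
  const size = metadataBytes(metadata);
  if (size > limit) {
    throw new Error(`metadata is ${size} B; store large fields externally`);
  }
}
```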

Distance Metric Performance

| Metric | Speed | Use Case |
|---|---|---|
| Dot Product | Fastest | Pre-normalized vectors |
| Cosine | Fast | General purpose (auto-normalizes) |
| Euclidean | Fast | Image/audio embeddings |
// Dot product is fastest but requires normalization
function normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return vec.map(v => v / norm);
}

const col = sv.collection('fast-search', {
  dimensions: 1536,
  metric: 'dot'  // Fastest metric
});

const normalized = normalize(embedding);
await col.upsert([{ id: 'vec_1', values: normalized }]);
Only use the dot metric if your vectors are normalized to unit length. Otherwise, use cosine.
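Before switching to the dot metric, it's cheap to verify that vectors really are unit length; vectors passed through the normalize function above should satisfy this check. A sketch:

```typescript
// Verify a vector has unit L2 norm within a floating-point tolerance.
// Run this on a sample of your embeddings before choosing metric: 'dot'.
function isUnitNorm(vec: number[], tol = 1e-6): boolean {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return Math.abs(norm - 1) <= tol;
}
```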

Monitoring Performance

Query Timing

const start = Date.now();
const results = await col.query({ vector: [...], topK: 10 });
const latency = Date.now() - start;

console.log(`Query took ${latency}ms`);

if (latency > 50) {
  console.warn('Query slower than expected. Consider tuning.');
}

Collection Statistics

const stats = await col.describeIndexStats();

console.log({
  vectorCount: stats.vectorCount,
  dimension: stats.dimension,
  metric: stats.metric,
  estimatedMemory: `${(stats.vectorCount * stats.dimension * 4 / 1024 / 1024).toFixed(2)} MB`
});

Troubleshooting

Problem: Slow Queries

1. Check collection size: if > 1M vectors, consider partitioning.
2. Reduce ef_search: try lowering from 50 to 30.
3. Reduce dimensions: use PCA to reduce from 1536 to 768.
4. Optimize metadata: remove large metadata fields.

Problem: Low Recall

1. Increase ef_search: try 100 or 200.
2. Check data quality: ensure vectors are properly normalized.
3. Increase M: rebuild the index with M=32 (requires reindexing).
4. Verify metric: ensure you're using the right distance metric.

Problem: High Memory Usage

1. Reduce dimensions: use PCA dimensionality reduction.
2. Minimize metadata: store large fields externally.
3. Lower M: rebuild the index with M=8 or M=12.
4. Partition data: split into multiple collections.

Best Practices Summary

Batch Everything

Always batch upserts (1000+ vectors at once). Avoid individual operations.

Start Conservative

Use default parameters first. Tune only if needed.

Monitor Performance

Track query latency and recall. Adjust based on metrics.

Minimize Metadata

Keep metadata < 1 KB per vector. Store large content externally.

Next Steps

Verification

Learn about on-chain verification

Collections

Advanced collection management

TypeScript Guide

Complete TypeScript SDK reference

Python Guide

Complete Python SDK reference
