
Overview

IndexType defines the indexing algorithm used for efficient vector search. Different index types offer various trade-offs between search speed, accuracy, memory usage, and build time.
import zvec

print(zvec.IndexType.HNSW)
# Output: IndexType.HNSW

Available Index Types

UNDEFINED
IndexType
No index specified. Uses the default indexing behavior.
When to use: let Zvec choose the appropriate index automatically.
# Index type not specified - uses default
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768
)
FLAT
IndexType
Flat index (brute-force search). Exhaustive search with 100% recall.
Characteristics:
  • Perfect recall (100%)
  • Linear search time O(n)
  • No index build time
  • Minimal memory overhead
When to use:
  • Small datasets (< 10K vectors)
  • When perfect recall is required
  • Baseline benchmarking
  • Development and testing
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=512,
    index_type=IndexType.FLAT
)
HNSW
IndexType
Hierarchical Navigable Small World. Graph-based approximate nearest neighbor index.
Characteristics:
  • Very fast search (logarithmic)
  • High recall (> 95% typical)
  • Higher memory usage
  • Fast index build
  • Good for general-purpose use
When to use:
  • Medium to large datasets (10K - 100M+ vectors)
  • When fast queries are critical
  • General-purpose vector search
  • Most production deployments
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768,
    index_type=IndexType.HNSW
)
IVF
IndexType
Inverted File Index. Clustering-based approximate search.
Characteristics:
  • Fast search with tunable accuracy
  • Lower memory than HNSW
  • Longer index build time
  • Good for very large datasets
When to use:
  • Very large datasets (100M+ vectors)
  • Memory-constrained environments
  • When recall can be traded for speed
  • Batch processing scenarios
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=1536,
    index_type=IndexType.IVF
)
INVERT
IndexType
Inverted index. Optimized for sparse vectors (e.g., BM25).
Characteristics:
  • Designed for sparse vectors
  • Fast keyword-style search
  • Memory-efficient for sparse data
  • Supports hybrid search
When to use:
  • Sparse vector fields
  • BM25 or TF-IDF embeddings
  • Keyword search
  • Hybrid dense-sparse search
field = Field(
    name="bm25_sparse",
    dtype=DataType.SPARSE_VECTOR_FP32,
    index_type=IndexType.INVERT  # For sparse vectors
)

Index Properties

All IndexType enum members have these properties:
name
str
The name of the index type as a string.
IndexType.HNSW.name  # "HNSW"
value
int
The internal integer value of the index type.
IndexType.HNSW.value  # 1
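Because IndexType behaves like a standard Python Enum, its members can also be enumerated programmatically. The stand-in enum below is a sketch for illustration only: HNSW = 1 comes from the example above, while the other values are hypothetical and may differ from zvec's actual definitions.

```python
from enum import Enum

# Hypothetical stand-in for zvec.IndexType. Only HNSW = 1 is taken
# from the documentation above; the other values are illustrative.
class IndexType(Enum):
    UNDEFINED = 0
    HNSW = 1
    FLAT = 2
    IVF = 3
    INVERT = 4

# Enumerate all members, as with any Python enum
for index_type in IndexType:
    print(f"{index_type.name} = {index_type.value}")
```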

Usage Examples

Basic Index Definition

from zvec import Collection, Field, DataType, IndexType, MetricType

schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    Field(name="title", dtype=DataType.STRING),
    Field(
        name="embedding",
        dtype=DataType.VECTOR_FP32,
        dim=768,
        metric=MetricType.COSINE,
        index_type=IndexType.HNSW  # Graph-based ANN
    )
]

collection = Collection.create(name="documents", schema=schema)

Multi-Field with Different Index Types

schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    
    # Dense vector with HNSW
    Field(
        name="dense_embedding",
        dtype=DataType.VECTOR_FP32,
        dim=768,
        metric=MetricType.COSINE,
        index_type=IndexType.HNSW
    ),
    
    # Sparse vector with inverted index
    Field(
        name="sparse_embedding",
        dtype=DataType.SPARSE_VECTOR_FP32,
        index_type=IndexType.INVERT
    ),
    
    # Small auxiliary vector with flat index
    Field(
        name="thumbnail_vec",
        dtype=DataType.VECTOR_FP32,
        dim=64,
        metric=MetricType.L2,
        index_type=IndexType.FLAT
    )
]

collection = Collection.create(name="hybrid_search", schema=schema)

Index Comparison

Index Type   Build Time   Query Speed     Memory         Recall
FLAT         Instant      Slow (linear)   Minimal        100%
HNSW         Fast         Very fast       High           > 95%
IVF          Slow         Fast            Medium         90-95%
INVERT       Fast         Fast            Low (sparse)   100%
Performance varies based on dataset size, dimensionality, and configuration.

Choosing the Right Index

1. Determine Your Dataset Size

Vectors       Recommended Index
< 10K         FLAT
10K - 10M     HNSW
10M - 100M    HNSW or IVF
> 100M        IVF
Sparse        INVERT
2. Consider Your Requirements

Choose based on priorities:
  • Speed: HNSW > IVF > FLAT
  • Memory: IVF > HNSW > FLAT
  • Accuracy: FLAT > HNSW > IVF
  • Build time: FLAT > HNSW > IVF
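The sizing table and priority rules above can be condensed into a small helper. Note that choose_index_type is a hypothetical illustration of the rules of thumb, not part of the zvec API:

```python
def choose_index_type(num_vectors: int, is_sparse: bool = False) -> str:
    """Rule-of-thumb index selection following the sizing table above.

    Hypothetical helper for illustration -- not part of the zvec API.
    """
    if is_sparse:
        return "INVERT"   # inverted index for sparse (BM25/TF-IDF) vectors
    if num_vectors < 10_000:
        return "FLAT"     # brute force is exact and fast enough
    if num_vectors <= 100_000_000:
        return "HNSW"     # fast queries with high recall
    return "IVF"          # lower memory footprint at extreme scale

print(choose_index_type(5_000))         # FLAT
print(choose_index_type(50_000_000))    # HNSW
print(choose_index_type(500_000_000))   # IVF
```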
3. Test with Your Data

Benchmark different index types:
import time

# create_test_collection, test_vectors, query_vector, and
# ground_truth are placeholder helpers and data defined elsewhere.
for index_type in [IndexType.FLAT, IndexType.HNSW, IndexType.IVF]:
    collection = create_test_collection(index_type)
    
    # Measure index build time
    start = time.time()
    collection.insert_many(test_vectors)
    build_time = time.time() - start
    
    # Measure query time
    start = time.time()
    results = collection.query(query_vector, topn=10)
    query_time = time.time() - start
    
    # Evaluate recall
    recall = compute_recall(results, ground_truth)
    
    print(f"{index_type.name}:")
    print(f"  Build: {build_time:.2f}s")
    print(f"  Query: {query_time*1000:.2f}ms")
    print(f"  Recall: {recall:.2%}")
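The benchmark above calls a compute_recall helper that zvec does not provide. A minimal sketch, assuming results and ground_truth are sequences of document IDs (in practice you would first extract IDs from zvec query results):

```python
def compute_recall(results, ground_truth):
    """Fraction of the true nearest neighbors present in the results.

    Minimal sketch: both arguments are sequences of document IDs.
    """
    true_ids = set(ground_truth)
    if not true_ids:
        return 0.0
    # Count how many true neighbors appear in the returned results
    hits = len(set(results) & true_ids)
    return hits / len(true_ids)
```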

Index Configuration

Each index type can be tuned with additional parameters:

HNSW Parameters

from zvec.model.param import HNSWParam

hnsw_param = HNSWParam(
    m=16,              # Number of connections per node
    ef_construction=200, # Construction-time search depth
    ef_search=64       # Query-time search depth
)

field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768,
    index_type=IndexType.HNSW,
    index_param=hnsw_param
)

IVF Parameters

from zvec.model.param import IVFParam

ivf_param = IVFParam(
    nlist=1024,  # Number of clusters
    nprobe=32    # Number of clusters to search
)

field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=1536,
    index_type=IndexType.IVF,
    index_param=ivf_param
)

Hybrid Search with Multiple Indexes

from zvec import Collection, Field, DataType, IndexType, MetricType

# Schema with dense and sparse vectors
schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    Field(name="content", dtype=DataType.STRING),
    
    # Dense semantic vector (HNSW)
    Field(
        name="dense_vec",
        dtype=DataType.VECTOR_FP32,
        dim=768,
        metric=MetricType.COSINE,
        index_type=IndexType.HNSW
    ),
    
    # Sparse keyword vector (INVERT)
    Field(
        name="sparse_vec",
        dtype=DataType.SPARSE_VECTOR_FP32,
        index_type=IndexType.INVERT
    )
]

collection = Collection.create(name="hybrid_docs", schema=schema)

# Query both dense and sparse vectors
results = collection.query(
    vectors={
        "dense_vec": dense_embedding,
        "sparse_vec": sparse_embedding
    },
    topn=10
)

Best Practices

General Recommendations:
  1. Start with HNSW for most applications (< 100M vectors)
  2. Use FLAT for small datasets or benchmarking
  3. Switch to IVF when memory becomes a constraint
  4. Use INVERT for sparse vectors (BM25, TF-IDF)
  5. Combine with quantization for large-scale deployments
  6. Monitor recall to ensure quality meets requirements
Common Mistakes:
  • Using FLAT for large datasets (> 100K vectors)
  • Using HNSW without enough memory
  • Not tuning IVF parameters for your dataset
  • Using dense indexes for sparse vectors
  • Ignoring recall metrics in production

Performance Tips

  • Increase m for better recall (use 16-64)
  • Increase ef_construction for better index quality (use 100-500)
  • Increase ef_search for better query recall (use 32-128)
  • Monitor memory usage as these increase index size
  • Set nlist = sqrt(num_vectors) as starting point
  • Increase nprobe for better recall (1-50)
  • Use quantization (INT8, INT4) to reduce memory
  • Pre-train clusters on representative data
  • Use only for small datasets (< 10K)
  • Consider GPU acceleration for larger FLAT searches
  • Use as baseline to measure ANN index quality
