
Overview

IndexType defines the indexing algorithm used for efficient vector search. Different index types offer various trade-offs between search speed, accuracy, memory usage, and build time.
import zvec

print(zvec.IndexType.HNSW)
# Output: IndexType.HNSW

Available Index Types

UNDEFINED
IndexType
No index specified. Uses the default indexing behavior.
When to use: let Zvec choose the appropriate index automatically.
# Index type not specified - uses default
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768
)
FLAT
IndexType
Flat index (brute-force search). Exhaustive search with 100% recall.
Characteristics:
  • Perfect recall (100%)
  • Linear search time O(n)
  • No index build time
  • Minimal memory overhead
When to use:
  • Small datasets (< 10K vectors)
  • When perfect recall is required
  • Baseline benchmarking
  • Development and testing
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=512,
    index_type=IndexType.FLAT
)
HNSW
IndexType
Hierarchical Navigable Small World. Graph-based approximate nearest neighbor index.
Characteristics:
  • Very fast search (logarithmic)
  • High recall (> 95% typical)
  • Higher memory usage
  • Fast index build
  • Good for general-purpose use
When to use:
  • Medium to large datasets (10K - 100M+ vectors)
  • When fast queries are critical
  • General-purpose vector search
  • Most production deployments
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768,
    index_type=IndexType.HNSW
)
IVF
IndexType
Inverted File Index. Clustering-based approximate search.
Characteristics:
  • Fast search with tunable accuracy
  • Lower memory than HNSW
  • Longer index build time
  • Good for very large datasets
When to use:
  • Very large datasets (100M+ vectors)
  • Memory-constrained environments
  • When recall can be traded for speed
  • Batch processing scenarios
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=1536,
    index_type=IndexType.IVF
)
INVERT
IndexType
Inverted index. Optimized for sparse vectors (e.g., BM25).
Characteristics:
  • Designed for sparse vectors
  • Fast keyword-style search
  • Memory-efficient for sparse data
  • Supports hybrid search
When to use:
  • Sparse vector fields
  • BM25 or TF-IDF embeddings
  • Keyword search
  • Hybrid dense-sparse search
field = Field(
    name="bm25_sparse",
    dtype=DataType.SPARSE_VECTOR_FP32,
    index_type=IndexType.INVERT  # For sparse vectors
)

Index Properties

All IndexType enum members have these properties:
name
str
The name of the index type as a string.
IndexType.HNSW.name  # "HNSW"
value
int
The internal integer value of the index type.
IndexType.HNSW.value  # 1
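Because IndexType behaves like a standard Python Enum, its members can also be enumerated programmatically. The stand-in enum below is a sketch for illustration only: HNSW = 1 comes from the example above, while the other values are hypothetical and may differ from zvec's actual definitions.

```python
from enum import Enum

# Hypothetical stand-in for zvec.IndexType. Only HNSW = 1 is taken
# from the documentation above; the other values are illustrative.
class IndexType(Enum):
    UNDEFINED = 0
    HNSW = 1
    FLAT = 2
    IVF = 3
    INVERT = 4

# Enumerate all members, as with any Python enum
for index_type in IndexType:
    print(f"{index_type.name} = {index_type.value}")
```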

Usage Examples

Basic Index Definition

from zvec import Collection, Field, DataType, IndexType, MetricType

schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    Field(name="title", dtype=DataType.STRING),
    Field(
        name="embedding",
        dtype=DataType.VECTOR_FP32,
        dim=768,
        metric=MetricType.COSINE,
        index_type=IndexType.HNSW  # Graph-based ANN
    )
]

collection = Collection.create(name="documents", schema=schema)

Multi-Field with Different Index Types

schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    
    # Dense vector with HNSW
    Field(
        name="dense_embedding",
        dtype=DataType.VECTOR_FP32,
        dim=768,
        metric=MetricType.COSINE,
        index_type=IndexType.HNSW
    ),
    
    # Sparse vector with inverted index
    Field(
        name="sparse_embedding",
        dtype=DataType.SPARSE_VECTOR_FP32,
        index_type=IndexType.INVERT
    ),
    
    # Small auxiliary vector with flat index
    Field(
        name="thumbnail_vec",
        dtype=DataType.VECTOR_FP32,
        dim=64,
        metric=MetricType.L2,
        index_type=IndexType.FLAT
    )
]

collection = Collection.create(name="hybrid_search", schema=schema)

Index Comparison

Index Type   Build Time   Query Speed     Memory         Recall
FLAT         Instant      Slow (linear)   Minimal        100%
HNSW         Fast         Very fast       High           > 95%
IVF          Slow         Fast            Medium         90-95%
INVERT       Fast         Fast            Low (sparse)   100%
Performance varies based on dataset size, dimensionality, and configuration.

Choosing the Right Index

1. Determine Your Dataset Size

Vectors       Recommended Index
< 10K         FLAT
10K - 10M     HNSW
10M - 100M    HNSW or IVF
> 100M        IVF
Sparse        INVERT
2. Consider Your Requirements

Choose based on priorities:
  • Speed: HNSW > IVF > FLAT
  • Memory: IVF > HNSW > FLAT
  • Accuracy: FLAT > HNSW > IVF
  • Build time: FLAT > HNSW > IVF
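The sizing table and priority rules above can be condensed into a small helper. Note that choose_index_type is a hypothetical illustration of the rules of thumb, not part of the zvec API:

```python
def choose_index_type(num_vectors: int, is_sparse: bool = False) -> str:
    """Rule-of-thumb index selection following the sizing table above.

    Hypothetical helper for illustration -- not part of the zvec API.
    """
    if is_sparse:
        return "INVERT"   # inverted index for sparse (BM25/TF-IDF) vectors
    if num_vectors < 10_000:
        return "FLAT"     # brute force is exact and fast enough
    if num_vectors <= 100_000_000:
        return "HNSW"     # fast queries with high recall
    return "IVF"          # lower memory footprint at extreme scale

print(choose_index_type(5_000))         # FLAT
print(choose_index_type(50_000_000))    # HNSW
print(choose_index_type(500_000_000))   # IVF
```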
3. Test with Your Data

Benchmark different index types:
import time

# create_test_collection, test_vectors, query_vector, and
# ground_truth are placeholder helpers and data defined elsewhere.
for index_type in [IndexType.FLAT, IndexType.HNSW, IndexType.IVF]:
    collection = create_test_collection(index_type)
    
    # Measure index build time
    start = time.time()
    collection.insert_many(test_vectors)
    build_time = time.time() - start
    
    # Measure query time
    start = time.time()
    results = collection.query(query_vector, topn=10)
    query_time = time.time() - start
    
    # Evaluate recall
    recall = compute_recall(results, ground_truth)
    
    print(f"{index_type.name}:")
    print(f"  Build: {build_time:.2f}s")
    print(f"  Query: {query_time*1000:.2f}ms")
    print(f"  Recall: {recall:.2%}")
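The benchmark above calls a compute_recall helper that zvec does not provide. A minimal sketch, assuming results and ground_truth are sequences of document IDs (in practice you would first extract IDs from zvec query results):

```python
def compute_recall(results, ground_truth):
    """Fraction of the true nearest neighbors present in the results.

    Minimal sketch: both arguments are sequences of document IDs.
    """
    true_ids = set(ground_truth)
    if not true_ids:
        return 0.0
    # Count how many true neighbors appear in the returned results
    hits = len(set(results) & true_ids)
    return hits / len(true_ids)
```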

Index Configuration

Each index type can be tuned with additional parameters:

HNSW Parameters

from zvec.model.param import HNSWParam

hnsw_param = HNSWParam(
    m=16,              # Number of connections per node
    ef_construction=200, # Construction-time search depth
    ef_search=64       # Query-time search depth
)

field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768,
    index_type=IndexType.HNSW,
    index_param=hnsw_param
)

IVF Parameters

from zvec.model.param import IVFParam

ivf_param = IVFParam(
    nlist=1024,  # Number of clusters
    nprobe=32    # Number of clusters to search
)

field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=1536,
    index_type=IndexType.IVF,
    index_param=ivf_param
)

Hybrid Search with Multiple Indexes

from zvec import Collection, Field, DataType, IndexType, MetricType

# Schema with dense and sparse vectors
schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    Field(name="content", dtype=DataType.STRING),
    
    # Dense semantic vector (HNSW)
    Field(
        name="dense_vec",
        dtype=DataType.VECTOR_FP32,
        dim=768,
        metric=MetricType.COSINE,
        index_type=IndexType.HNSW
    ),
    
    # Sparse keyword vector (INVERT)
    Field(
        name="sparse_vec",
        dtype=DataType.SPARSE_VECTOR_FP32,
        index_type=IndexType.INVERT
    )
]

collection = Collection.create(name="hybrid_docs", schema=schema)

# Query both dense and sparse vectors
results = collection.query(
    vectors={
        "dense_vec": dense_embedding,
        "sparse_vec": sparse_embedding
    },
    topn=10
)

Best Practices

General Recommendations:
  1. Start with HNSW for most applications (< 100M vectors)
  2. Use FLAT for small datasets or benchmarking
  3. Switch to IVF when memory becomes a constraint
  4. Use INVERT for sparse vectors (BM25, TF-IDF)
  5. Combine with quantization for large-scale deployments
  6. Monitor recall to ensure quality meets requirements
Common Mistakes:
  • Using FLAT for large datasets (> 100K vectors)
  • Using HNSW without enough memory
  • Not tuning IVF parameters for your dataset
  • Using dense indexes for sparse vectors
  • Ignoring recall metrics in production

Performance Tips

  • Increase m for better recall (use 16-64)
  • Increase ef_construction for better index quality (use 100-500)
  • Increase ef_search for better query recall (use 32-128)
  • Monitor memory usage as these increase index size
  • Set nlist = sqrt(num_vectors) as starting point
  • Increase nprobe for better recall (1-50)
  • Use quantization (INT8, INT4) to reduce memory
  • Pre-train clusters on representative data
  • Use only for small datasets (< 10K)
  • Consider GPU acceleration for larger FLAT searches
  • Use as baseline to measure ANN index quality
