Zvec provides extensive configuration options to tune performance for your specific workload. This guide covers memory management, thread configuration, index tuning, and query optimization.

Initialization Configuration

Configure Zvec before any operations:
import zvec
from zvec import LogLevel, LogType

zvec.init(
    # Resource limits
    memory_limit_mb=4096,        # 4GB memory limit
    query_threads=8,             # Parallel query threads
    optimize_threads=4,          # Background optimization threads
    
    # Query heuristics
    invert_to_forward_scan_ratio=0.9,
    brute_force_by_keys_ratio=0.1,
    
    # Logging
    log_type=LogType.FILE,
    log_level=LogLevel.WARN,
    log_dir="./logs"
)
zvec.init() must be called before creating or opening collections, and can only be called once per process.

Memory Configuration

Memory Limit

Set a soft memory cap to prevent OOM errors:
zvec.init(
    memory_limit_mb=2048  # 2GB limit
)
Guidelines:
| Collection Size | Recommended Memory |
|---|---|
| < 100K vectors | 512 MB |
| 100K - 1M vectors | 1-2 GB |
| 1M - 10M vectors | 4-8 GB |
| 10M+ vectors | 16+ GB |

1. Estimate Memory Needs

Calculate approximate memory requirements:
# Formula:
# memory_mb = (num_docs * vector_dim * 4 bytes) / (1024 * 1024)
#             + (num_docs * scalar_fields * 8 bytes) / (1024 * 1024)
#             + index_overhead

num_docs = 1_000_000
vector_dim = 768
scalar_fields = 5

# Vector data
vector_mb = (num_docs * vector_dim * 4) / (1024 * 1024)

# Scalar data
scalar_mb = (num_docs * scalar_fields * 8) / (1024 * 1024)

# HNSW index overhead (approximately 30-50%)
index_overhead = (vector_mb + scalar_mb) * 0.4

total_mb = vector_mb + scalar_mb + index_overhead
print(f"Estimated memory: {total_mb:.0f} MB")
# Estimated memory: 4155 MB

# Add 20% buffer
recommended = total_mb * 1.2
zvec.init(memory_limit_mb=int(recommended))

2. Monitor Memory Usage

stats = collection.stats
print(f"Documents: {stats.doc_count}")
# Check system memory usage externally
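Since the snippet above leaves memory checks to external tools, here is a minimal sketch of one way to read the current process's peak resident memory from inside Python, using the standard-library resource module (Unix only). The helper name peak_rss_mb is illustrative and not part of Zvec.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"Peak RSS: {peak_rss_mb():.1f} MB")
```

Logging this value next to stats.doc_count during ingestion gives a rough view of how memory grows with collection size.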

3. Set Container Limits

In Docker/Kubernetes, Zvec auto-detects cgroup limits:
# docker-compose.yml
services:
  zvec-app:
    image: my-zvec-app
    deploy:
      resources:
        limits:
          memory: 4G  # Zvec detects this automatically
Or set explicitly:
# Override auto-detection
zvec.init(memory_limit_mb=3200)  # 80% of 4GB
If memory_limit_mb is None, Zvec uses 80% of the cgroup memory limit when running in a container; otherwise it uses the available system memory.
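To see what value the 80% rule would produce in a given container, the sketch below reads the standard Linux cgroup limit files and applies the same percentage. It only illustrates the arithmetic and is not Zvec's internal detection logic; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup v2 and v1 memory-limit files on Linux.
CGROUP_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_cgroup_limit(text: str) -> Optional[int]:
    """Parse a cgroup limit file; 'max' means no limit is set."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_cgroup_limit_bytes() -> Optional[int]:
    """Return the container memory limit in bytes, or None if unavailable."""
    for path in CGROUP_FILES:
        try:
            return parse_cgroup_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None


def eighty_percent_mb(limit_bytes: int) -> int:
    """Return 80% of a byte limit, expressed in whole MB."""
    return limit_bytes * 80 // 100 // (1024 * 1024)


limit = read_cgroup_limit_bytes()
if limit is not None:
    print(f"memory_limit_mb candidate: {eighty_percent_mb(limit)}")
```

Note that cgroup v1 reports an unlimited container as a very large number, so a production version would also cap the result at physical RAM.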

Thread Configuration

Query Threads

Control parallelism for query operations:
zvec.init(
    query_threads=8  # Use 8 threads for queries
)
Guidelines:
  • Default (None): Auto-detects available CPU cores
  • CPU-bound workloads: Set to number of physical cores
  • Mixed workloads: Set to 2x physical cores
  • Memory-constrained: Reduce threads to lower memory usage
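import os

import zvec

# os.cpu_count() returns logical cores and may return None
logical_cores = os.cpu_count() or 1

# Conservative: approximate physical cores (assumes 2 hardware threads per core)
query_threads = max(1, logical_cores // 2)

# Aggressive: use all logical cores instead
# query_threads = logical_cores

# zvec.init() can only be called once per process, so pick one option
zvec.init(query_threads=query_threads)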

Optimize Threads

Background threads for indexing and compaction:
zvec.init(
    optimize_threads=4  # 4 background threads
)
Guidelines:
  • Heavy inserts: Increase optimize threads (4-8)
  • Query-heavy: Reduce optimize threads (1-2)
  • Default: Same as query_threads

1. Balanced Configuration

# General purpose
zvec.init(
    query_threads=8,
    optimize_threads=4
)

2. Insert-Heavy Workload

# Optimize for bulk inserts
zvec.init(
    query_threads=4,      # Fewer query threads
    optimize_threads=8    # More background indexing
)

3. Query-Heavy Workload

# Optimize for queries
zvec.init(
    query_threads=16,     # Max query parallelism
    optimize_threads=2    # Minimal background work
)
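The three configurations above can be kept as named presets and adapted to the machine at hand. The sketch below is one possible approach: the preset values are copied from the examples, while capping each count at the number of logical cores is an assumption of this sketch, not documented Zvec behavior. The names PRESETS and thread_config are hypothetical.

```python
import os

# Thread counts copied from the three example configurations above.
PRESETS = {
    "balanced": {"query_threads": 8, "optimize_threads": 4},
    "insert_heavy": {"query_threads": 4, "optimize_threads": 8},
    "query_heavy": {"query_threads": 16, "optimize_threads": 2},
}


def thread_config(profile: str, logical_cores: int) -> dict:
    """Return a preset with each thread count capped at logical_cores."""
    preset = PRESETS[profile]
    return {name: max(1, min(count, logical_cores)) for name, count in preset.items()}


config = thread_config("query_heavy", os.cpu_count() or 1)
print(config)
# Pass to Zvec once per process: zvec.init(**config)
```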

Index Tuning

HNSW Parameters

HNSW is the default index type, balancing speed and accuracy:
from zvec import VectorSchema, DataType, HnswIndexParam

vector_field = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=HnswIndexParam(
        ef_construction=200,  # Build quality
        m=16                  # Connectivity
    )
)

Parameter Guide

ef_construction (default: 100)
  • Controls index build quality
  • Higher = better recall, slower build, more memory
  • Range: 100-400
# Fast build, good recall
HnswIndexParam(ef_construction=100)

# Balanced (recommended)
HnswIndexParam(ef_construction=200)

# High quality, slow build
HnswIndexParam(ef_construction=400)
m (default: 16)
  • Number of connections per node
  • Higher = better recall, more memory
  • Range: 8-32
# Memory-efficient
HnswIndexParam(m=8)

# Balanced (recommended)
HnswIndexParam(m=16)

# High recall
HnswIndexParam(m=32)
| Use Case | ef_construction | m | Trade-off |
|---|---|---|---|
| Development | 100 | 8 | Fast, lower quality |
| Production (balanced) | 200 | 16 | Good balance |
| High accuracy | 400 | 32 | Best recall, slow |
| Memory-constrained | 100 | 8 | Minimal memory |
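To get a feel for why a higher m costs memory, the sketch below estimates the size of the bottom-layer neighbor lists. It assumes an hnswlib-style layout (up to 2 * m neighbors per vector on the bottom layer, 4-byte neighbor IDs). Zvec's actual storage format is not described in this guide and may differ, so treat the numbers as order-of-magnitude only.

```python
def hnsw_link_mb(num_vectors: int, m: int, id_bytes: int = 4) -> float:
    """Approximate bottom-layer neighbor-list size in MB (assumed layout)."""
    links_per_vector = 2 * m  # assumption: bottom layer stores up to 2*m links
    return num_vectors * links_per_vector * id_bytes / (1024 * 1024)


for m in (8, 16, 32):
    print(f"m={m}: ~{hnsw_link_mb(1_000_000, m):.0f} MB of links per 1M vectors")
```

Under these assumptions the link storage doubles each time m doubles, independent of the vector dimension.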

IVF Parameters

For very large collections (millions of vectors):
from zvec import IVFIndexParam

vector_field = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=IVFIndexParam(
        nlist=1000  # Number of clusters
    )
)
nlist Guidelines:
# Formula: nlist = sqrt(num_vectors)
import math

num_vectors = 1_000_000
nlist = int(math.sqrt(num_vectors))  # 1000

IVFIndexParam(nlist=nlist)
| Collection Size | nlist |
|---|---|
| 100K vectors | 316 |
| 1M vectors | 1000 |
| 10M vectors | 3162 |

Flat Index

Exact search, no approximation:
from zvec import FlatIndexParam

# Guaranteed exact results
vector_field = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=FlatIndexParam()
)
Use Flat index for:
  • Small collections (< 10K vectors)
  • When exact recall is critical
  • Benchmarking other index types
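When a Flat index serves as the benchmark baseline, the usual metric is recall@k: the share of exact top-k results that the approximate index also returned. A minimal sketch follows; the helper name and the sample ID lists are hypothetical, and in practice the IDs would come from querying both collections with the same vector.

```python
def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the exact top-k IDs that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical ID lists: exact from a Flat index, approx from HNSW or IVF.
exact = ["a", "b", "c", "d"]
approx = ["a", "b", "x", "d"]
print(f"recall@4 = {recall_at_k(approx, exact):.2f}")  # recall@4 = 0.75
```

Averaging this value over a few hundred representative queries gives a more stable picture than a single query.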

Query Optimization

Query-Time Parameters

HNSW Query Parameter

from zvec import VectorQuery, HnswQueryParam

results = collection.query(
    VectorQuery(
        field_name="embedding",
        vector=query_vector,
        param=HnswQueryParam(
            ef=200  # Search quality
        )
    ),
    topk=10
)
ef (search parameter):
  • Controls search quality vs speed
  • Must be >= topk
  • Default: 50
# Fast, lower recall
HnswQueryParam(ef=50)

# Balanced
HnswQueryParam(ef=200)

# High recall, slower
HnswQueryParam(ef=500)
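Because ef must be at least topk, a small guard can keep the two values consistent when topk is chosen at request time. This is a sketch with a hypothetical helper name; how Zvec itself reacts to ef < topk is not covered here.

```python
def effective_ef(requested_ef: int, topk: int) -> int:
    """Return an ef value that satisfies the ef >= topk constraint."""
    if topk < 1:
        raise ValueError("topk must be positive")
    return max(requested_ef, topk)


print(effective_ef(50, 10))   # 50
print(effective_ef(50, 200))  # 200
# Usage with Zvec: HnswQueryParam(ef=effective_ef(50, topk))
```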

IVF Query Parameter

from zvec import IVFQueryParam

results = collection.query(
    VectorQuery(
        field_name="embedding",
        vector=query_vector,
        param=IVFQueryParam(
            nprobe=10  # Clusters to search
        )
    ),
    topk=10
)
nprobe:
  • Number of clusters to search
  • Higher = better recall, slower
  • Default: 10
# Fast, lower recall
IVFQueryParam(nprobe=5)

# Balanced
IVFQueryParam(nprobe=10)

# High recall
IVFQueryParam(nprobe=50)
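The cost side of nprobe is easiest to see as a fraction of the collection: an IVF search scans roughly nprobe / nlist of the vectors. The sketch below makes that estimate under the simplifying assumption that all clusters are the same size, which real data rarely satisfies exactly. The function name is hypothetical.

```python
def ivf_scan_estimate(num_vectors: int, nlist: int, nprobe: int) -> tuple:
    """Return (fraction scanned, vectors scanned), assuming equal-size clusters."""
    probes = min(nprobe, nlist)  # cannot probe more clusters than exist
    fraction = probes / nlist
    return fraction, num_vectors * probes // nlist


for nprobe in (5, 10, 50):
    fraction, scanned = ivf_scan_estimate(1_000_000, 1000, nprobe)
    print(f"nprobe={nprobe}: ~{fraction:.1%} of vectors (~{scanned:,})")
```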

Filter Optimization

Optimize filtered queries:

1. Enable Range Optimization

For numeric range filters:
from zvec import FieldSchema, InvertIndexParam

FieldSchema(
    "timestamp",
    DataType.INT64,
    index_param=InvertIndexParam(
        enable_range_optimization=True  # Speed up range queries
    )
)

2. Use Selective Filters

# Good: filters out 90% of documents (selective)
filter="category = 'rare_item'"

# Avoid: matches 90% of documents (non-selective)
filter="category != 'rare_item'"

3. Combine Filters Efficiently

# Good: most selective condition first
filter="rare_category = 'X' and price > 100"

# Condition order doesn't matter, but listing the most selective one first makes the intent clearer

Query Heuristics

Advanced tuning for query execution:
zvec.init(
    invert_to_forward_scan_ratio=0.9,
    brute_force_by_keys_ratio=0.1
)

invert_to_forward_scan_ratio

Threshold to switch from inverted index to full scan:
  • Range: 0.0 - 1.0
  • Higher = more aggressive index skipping
  • Default: 0.9
# Conservative: use index more often
zvec.init(invert_to_forward_scan_ratio=0.7)

# Aggressive: skip index for non-selective filters
zvec.init(invert_to_forward_scan_ratio=0.95)

brute_force_by_keys_ratio

Threshold to use brute-force key lookup:
  • Range: 0.0 - 1.0
  • Lower = prefer index
  • Default: 0.1
# Prefer index
zvec.init(brute_force_by_keys_ratio=0.05)

# Prefer brute force for small result sets
zvec.init(brute_force_by_keys_ratio=0.2)
Default values work well for most cases. Only adjust if profiling shows specific bottlenecks.

Batch Operations

Optimize bulk inserts and queries:

Batch Insert

from zvec import Doc

# Optimal batch size: 100-1000 documents
batch_size = 500

for i in range(0, len(all_docs), batch_size):
    batch = all_docs[i:i + batch_size]
    results = collection.insert(batch)
    
    # Check results
    assert all(r.ok() for r in results)

print(f"Inserted {len(all_docs)} documents")
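The slicing loop above needs the full list in memory. When documents arrive from a generator or a file reader, a small batching helper does the same job lazily. This is a sketch using only the standard library; the helper name batched is chosen here and is not part of Zvec.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
# Usage with Zvec: for batch in batched(doc_source, 500): collection.insert(batch)
```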

Optimize After Bulk Insert

from zvec import OptimizeOption

# Insert large batch
collection.insert(large_batch)

# Trigger optimization
collection.optimize(option=OptimizeOption())

# Check index completeness
stats = collection.stats
print(f"Index completeness: {stats.index_completeness}")

Monitoring and Profiling

Collection Statistics

stats = collection.stats

print(f"Document count: {stats.doc_count}")
print(f"Index completeness: {stats.index_completeness}")
# {'dense': 1.0, 'sparse': 1.0}
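If queries should wait until indexing has caught up, the completeness values can be polled. The sketch below assumes index_completeness is a dict mapping field names to floats, as the sample output above suggests. The helper names are hypothetical, and the getter is passed in as a callable so the example runs without a collection.

```python
import time


def index_is_complete(completeness: dict) -> bool:
    """True when every field reports a completeness of 1.0."""
    return all(value >= 1.0 for value in completeness.values())


def wait_for_index(get_completeness, timeout_s=60.0, poll_s=0.5) -> bool:
    """Poll get_completeness() until all fields are complete or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if index_is_complete(get_completeness()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Stand-in for `lambda: collection.stats.index_completeness`
print(wait_for_index(lambda: {"dense": 1.0, "sparse": 1.0}))  # True
```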

Query Performance

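import time

from zvec import VectorQuery

# perf_counter() is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
results = collection.query(
    VectorQuery(field_name="embedding", vector=query_vec),
    topk=10
)
query_time = time.perf_counter() - start

print(f"Query time: {query_time*1000:.2f}ms")
print(f"Results: {len(results)}")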
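A single timed query is noisy, so tuning decisions are better based on percentiles over many runs. The sketch below times any zero-argument callable and reports nearest-rank p50 and p95 latencies. The helper names are hypothetical, and the stand-in workload should be replaced with a lambda that calls collection.query(...).

```python
import math
import time


def percentile(sorted_samples, q):
    """Nearest-rank percentile of an ascending list (q between 0 and 1)."""
    rank = max(1, math.ceil(q * len(sorted_samples)))
    return sorted_samples[rank - 1]


def measure_latency_ms(run_query, repeats=50, warmup=5) -> dict:
    """Time run_query() repeatedly and summarize latency in milliseconds."""
    for _ in range(warmup):  # discard cold-cache runs
        run_query()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50_ms": percentile(samples, 0.50),
        "p95_ms": percentile(samples, 0.95),
        "max_ms": samples[-1],
    }


# Stand-in workload for illustration only.
stats = measure_latency_ms(lambda: sum(range(10_000)))
print({name: round(value, 3) for name, value in stats.items()})
```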

Logging Configuration

from zvec import LogType, LogLevel

zvec.init(
    log_type=LogType.FILE,
    log_level=LogLevel.DEBUG,  # DEBUG, INFO, WARN, ERROR, FATAL
    log_dir="./logs",
    log_basename="zvec.log",
    log_file_size=2048,        # MB per file
    log_overdue_days=7         # Retention period
)

Common Performance Patterns

High-Throughput Ingest

# Configuration for bulk loading
zvec.init(
    memory_limit_mb=8192,
    query_threads=4,
    optimize_threads=8  # More background threads
)

# Use large batches
batch_size = 1000

# Optimize periodically
for i in range(0, len(all_docs), batch_size * 10):
    # Insert up to 10 batches
    for j in range(10):
        batch = all_docs[i + j*batch_size : i + (j+1)*batch_size]
        if not batch:
            break  # Reached the end of all_docs
        collection.insert(batch)

    # Optimize after every 10,000 docs (10 batches of 1,000)
    collection.optimize()

Low-Latency Queries

# Configuration for fast queries
zvec.init(
    memory_limit_mb=16384,  # Large memory
    query_threads=16,       # Max parallelism
    optimize_threads=2
)

# Use aggressive HNSW build
HnswIndexParam(
    ef_construction=400,
    m=32
)

# Moderate query parameters
HnswQueryParam(ef=100)

Memory-Constrained

# Configuration for limited memory
zvec.init(
    memory_limit_mb=1024,
    query_threads=2,
    optimize_threads=1
)

# Use minimal HNSW parameters
HnswIndexParam(
    ef_construction=100,
    m=8
)

# Small batch sizes
batch_size = 100

Troubleshooting

Slow Queries

1. Check Index Parameters

# Query time parameter too high?
HnswQueryParam(ef=50)  # Instead of 500

2. Verify Index Built

stats = collection.stats
print(stats.index_completeness)
# Should be 1.0 for all fields

3. Optimize Collection

collection.optimize()

High Memory Usage

1. Reduce Index Parameters

# Lower m and ef_construction
HnswIndexParam(ef_construction=100, m=8)

2. Use INT8 Quantization

VectorSchema("emb", DataType.VECTOR_INT8, dimension=768)
# About 4x less vector memory than VECTOR_FP32
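The 4x figure follows from element size: an FP32 component takes 4 bytes and an INT8 component takes 1. The sketch below works through the raw vector storage for one million 768-dimensional vectors, excluding index overhead and scalar fields. The function name is hypothetical; the type names mirror the DataType members used in this guide.

```python
BYTES_PER_ELEMENT = {"VECTOR_FP32": 4, "VECTOR_INT8": 1}


def vector_storage_mb(num_docs: int, dimension: int, data_type: str) -> float:
    """Raw vector storage in MB, excluding index overhead."""
    return num_docs * dimension * BYTES_PER_ELEMENT[data_type] / (1024 * 1024)


fp32 = vector_storage_mb(1_000_000, 768, "VECTOR_FP32")
int8 = vector_storage_mb(1_000_000, 768, "VECTOR_INT8")
print(f"FP32: {fp32:.0f} MB, INT8: {int8:.0f} MB, ratio: {fp32 / int8:.0f}x")
# FP32: 2930 MB, INT8: 732 MB, ratio: 4x
```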

3. Limit Thread Count

zvec.init(
    query_threads=4,
    optimize_threads=2
)
