Problem: ERROR: Package 'zvec' requires a different Python: 3.9.x not in '>=3.10,<3.13'
Solution: Zvec requires Python 3.10, 3.11, or 3.12. Check your Python version:
python --version
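When several Python installations coexist, `python --version` may not report the interpreter that actually installs or imports Zvec. The check can also be run from inside that interpreter. This is a minimal sketch: the helper name `is_supported_python` is illustrative, and the bounds are copied from the error message above.

```python
import sys

# Bounds taken from the pip error message: '>=3.10,<3.13'.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version falls in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    # Show which interpreter is running and whether it qualifies.
    print(sys.executable, sys.version.split()[0], is_supported_python())
```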
If you’re on an older version, install a compatible Python:
# Ensure the vector is a list of floats, not a NumPy array
import numpy as np
import zvec

vector = np.random.rand(768)
doc = zvec.Doc(id="1", vectors={"embedding": vector.tolist()})  # Convert to list
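If vectors arrive from several sources (NumPy arrays, tuples, lists of integers), a small conversion helper keeps the input consistent. The function below is a hypothetical convenience, not part of Zvec, and it assumes one-dimensional input.

```python
def to_float_list(vector):
    """Convert a 1-D NumPy array or any iterable of numbers to a list of floats."""
    # NumPy arrays expose tolist(); plain lists and tuples do not.
    if hasattr(vector, "tolist"):
        vector = vector.tolist()
    # float() also turns ints and NumPy scalars into plain Python floats.
    return [float(x) for x in vector]
```

The result can be passed wherever the example above uses `vector.tolist()`.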
Verify document ID is unique:
# Document IDs must be unique within a collection
# Use update() if you want to modify an existing document
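Duplicate IDs inside a single batch can be caught before anything is sent to the collection. The sketch below only inspects the IDs you pass in; it does not check what is already stored, and `find_duplicate_ids` is an illustrative name rather than a Zvec function.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)
```

An empty result means the batch is internally consistent. A non-empty one lists the IDs to deduplicate or route through `update()` instead.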
Check field names and types:
# Field names must match the schema
doc = zvec.Doc(
    id="1",
    vectors={"embedding": vector},
    fields={"title": "Text", "count": 42}  # Match schema field names
)
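A mismatch between field names or types and the schema is easier to spot with a local check before building the document. The sketch below compares a `fields` dictionary against a hand-written mapping of expected names to Python types. Both `check_fields` and `EXPECTED` are hypothetical and independent of Zvec's own schema objects, and the check treats every expected field as required, which may be stricter than your schema.

```python
def check_fields(fields, expected):
    """Return a list of problems; an empty list means the fields match."""
    problems = []
    # Names present in the document but absent from the expected mapping.
    for name in fields:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    # Names that are missing or carry a value of the wrong type.
    for name, expected_type in expected.items():
        if name not in fields:
            problems.append(f"missing field: {name}")
        elif not isinstance(fields[name], expected_type):
            problems.append(
                f"field {name}: expected {expected_type.__name__}, "
                f"got {type(fields[name]).__name__}"
            )
    return problems


# Mirrors the example above: a text title and an integer count.
EXPECTED = {"title": str, "count": int}
```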
Batch inserts are more efficient than single inserts. Insert multiple documents at once when possible.
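One way to batch is to split the documents into fixed-size chunks and insert one chunk at a time. The generator below is a generic sketch using only the standard library. The batch size is yours to choose, and the insert call itself is omitted because its exact signature is not shown here.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list can then be passed to the collection's insert call in a single request.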
Problem: Failed to open collection, Lock file exists, or Corrupted data
Solutions:
Check for running processes:
# Find processes using the collection
lsof /path/to/collection
Close collection properly:
# Always close the collection or use a context manager
collection.close()

# Or use a with statement
with zvec.open("./data") as collection:
    # Operations here
    pass
# Automatically closed
Remove stale lock files (if no process is running):
rm /path/to/collection/*.lock
Restore from backup:
# If data is corrupted, restore from backup
rm -rf ./corrupted_collection
cp -r ./backup/collection ./recovered_collection
Only remove lock files if you’re certain no other process is using the collection.
# Consolidate segments after bulk inserts
collection.optimize()
Tune HNSW ef_search (recall vs. speed tradeoff):
# Lower ef_search = faster but lower recall
params = zvec.HnswQueryParams(ef_search=50)  # Default is often 100+
results = collection.query(
    zvec.VectorQuery("embedding", vector=query_vector, params=params),
    topk=10
)
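To see what a lower `ef_search` costs, compare the IDs it returns against an exact result for the same query (for example from a brute-force search) and compute recall. The helper below is a generic sketch. How you extract IDs from the query results depends on the result objects and is left out.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k IDs that also appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)
```

Averaging this value over a sample of representative queries at several `ef_search` settings shows where recall starts to drop for your data.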
Check index parameters (set during schema creation):
# For faster queries, reduce M or increase ef_construction
from zvec import HnswIndexParams, MetricType

index_params = HnswIndexParams(
    metric_type=MetricType.IP,
    m=16,  # Reduce from default 32 for faster queries
    ef_construction=200
)
Use appropriate metric type:
# IP (Inner Product) is fastest for normalized vectors
# Normalize vectors before insertion:
import numpy as np

def normalize(v):
    return (np.array(v) / np.linalg.norm(v)).tolist()
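Normalization matters here because the inner product of two unit-length vectors equals their cosine similarity. The pure-Python sketch below illustrates that identity and adds a guard for zero vectors, which cannot be normalized. It is an illustration, not a replacement for the NumPy version above.

```python
import math


def normalize(v):
    """Scale v to unit length; reject zero vectors instead of dividing by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```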
Profile query patterns:
import time

start = time.time()
results = collection.query(...)
print(f"Query took {time.time() - start:.3f}s")
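A single timing can be misleading because the first query often includes warm-up costs. The sketch below repeats a call several times and reports the median and the slowest run. It uses `time.perf_counter`, which is intended for measuring short durations. `time_call` is an illustrative helper, and `fn` stands in for whatever query you want to measure.

```python
import statistics
import time


def time_call(fn, repeats=20):
    """Call fn() repeatedly and return (median_seconds, max_seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), max(durations)
```

Wrap the query in a zero-argument function or `lambda` and pass it as `fn`.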
# Build with Flat index, then convert to HNSW
import zvec
from zvec import FlatIndexParams

# Start with Flat for fast ingestion
schema = zvec.CollectionSchema(
    name="temp",
    vectors=zvec.VectorSchema(
        "embedding",
        zvec.DataType.VECTOR_FP32,
        768,
        index_params=FlatIndexParams()
    )
)
# Use FP16 instead of FP32 (half the memory)
zvec.DataType.VECTOR_FP16

# Or use quantized INT8 (1/4 the memory)
zvec.DataType.VECTOR_INT8
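The savings follow directly from bytes per element: 4 for FP32, 2 for FP16, 1 for INT8. The sketch below estimates raw vector storage only; index structures add overhead on top. The one-million-vector count is an arbitrary example, and the string keys simply reuse the data type names above.

```python
# Bytes used by one vector component for each data type.
BYTES_PER_ELEMENT = {"VECTOR_FP32": 4, "VECTOR_FP16": 2, "VECTOR_INT8": 1}


def raw_vector_bytes(num_vectors, dim, data_type):
    """Raw storage for the vectors alone, excluding any index overhead."""
    return num_vectors * dim * BYTES_PER_ELEMENT[data_type]


if __name__ == "__main__":
    for name in BYTES_PER_ELEMENT:
        size_gb = raw_vector_bytes(1_000_000, 768, name) / 1e9
        print(f"{name}: {size_gb:.3f} GB")
```

For one million 768-dimensional vectors this works out to about 3.07 GB in FP32, 1.54 GB in FP16, and 0.77 GB in INT8.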
Use IVF instead of HNSW:
from zvec import IVFIndexParams, MetricType

# IVF uses significantly less memory
index_params = IVFIndexParams(
    metric_type=MetricType.L2,
    nlist=100  # Number of clusters
)
Enable memory-mapped storage:
# Let the OS manage memory
collection_options = zvec.CollectionOptions(
    use_mmap=True  # Use memory-mapped files
)
Reduce HNSW M parameter:
from zvec import HnswIndexParams, MetricType

# Lower M = less memory, but slower queries
index_params = HnswIndexParams(
    metric_type=MetricType.IP,
    m=8  # Default is often 16-32
)