A collection is the primary container for storing and querying documents in Zvec. Each collection has a fixed schema that defines its structure, including scalar fields and vector fields.
What is a Collection?
A collection in Zvec is similar to a table in traditional databases. It:
Stores documents with a consistent schema
Persists data to disk for durability
Supports CRUD operations (Create, Read, Update, Delete)
Enables vector similarity search with optional filtering
Manages indexes for efficient querying
Collection Lifecycle
Creating a New Collection
Use create_and_open() to create a new collection with a defined schema:
import zvec
from zvec import CollectionSchema, FieldSchema, VectorSchema, DataType
# Initialize Zvec
zvec.init()
# Define the schema
schema = CollectionSchema(
    name="my_collection",
    fields=[
        FieldSchema("id", DataType.INT64, nullable=False),
        FieldSchema("title", DataType.STRING, nullable=False),
        FieldSchema("category", DataType.STRING, nullable=True),
    ],
    vectors=VectorSchema(
        name="embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=768,
    ),
)
# Create and open the collection
collection = zvec.create_and_open(
    path="./data/my_collection",
    schema=schema,
)
Opening an Existing Collection
Use open() to access a previously created collection:
import zvec
zvec.init()
# Open existing collection
collection = zvec.open("./data/my_collection")
# Access collection properties
print(f"Collection path: {collection.path}")
print(f"Collection schema: {collection.schema}")
print(f"Document count: {collection.stats.doc_count}")
The collection must have been previously created with create_and_open(). Opening a non-existent collection will raise an error.
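Because open() fails on a missing collection, applications often want an "open if present, otherwise create" step at startup. The sketch below shows one way to do that. It assumes that a collection's directory exists on disk only once the collection has been created; the helper name open_or_create and its callable parameters are hypothetical and are not part of Zvec.

```python
import os
from typing import Any, Callable


def open_or_create(
    path: str,
    open_fn: Callable[[str], Any],
    create_fn: Callable[[str], Any],
) -> Any:
    """Open the collection at `path` if its directory exists, else create it.

    Hypothetical helper: the open and create steps are passed in as
    callables, so this logic makes no assumption about Zvec signatures.
    """
    if os.path.isdir(path):
        # Assumption: an existing directory means the collection was created.
        return open_fn(path)
    return create_fn(path)
```

With the calls shown in this section, it could be invoked as open_or_create("./data/my_collection", zvec.open, lambda p: zvec.create_and_open(path=p, schema=schema)).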
Collection Properties
The Collection class exposes several read-only properties:
Property  Type              Description
path      str               Filesystem path where collection data is stored
schema    CollectionSchema  The schema defining the collection structure
stats     CollectionStats   Runtime statistics (document count, size, etc.)
option    CollectionOption  Configuration options used to open the collection
Core Collection Operations
Data Manipulation (DML)
Collections support standard CRUD operations:
from zvec import Doc
# Insert a new document
doc = Doc(
    id="doc1",
    fields={"title": "Introduction to Vectors", "category": "tutorial"},
    vectors={"embedding": [0.1, 0.2, 0.3, ...]},  # 768-dim vector
)
status = collection.insert(doc)
# Insert multiple documents (doc1, doc2, doc3 are Doc instances built as above)
docs = [doc1, doc2, doc3]
statuses = collection.insert(docs)
# Update existing documents
updated_doc = Doc(id="doc1", fields={"category": "guide"})
collection.update(updated_doc)
# Upsert (insert or update)
collection.upsert(doc)
# Delete by ID
collection.delete("doc1")
collection.delete(["doc2", "doc3"])
# Delete by filter expression
collection.delete_by_filter("category == 'outdated'")
Data Retrieval (DQL)
# Fetch documents by ID
docs = collection.fetch(["doc1", "doc2"])
for doc_id, doc in docs.items():
    print(f"ID: {doc_id}, Title: {doc.field('title')}")
# Vector similarity search
from zvec import VectorQuery
results = collection.query(
    vectors=VectorQuery(
        field_name="embedding",
        vector=[0.1, 0.2, 0.3, ...],
    ),
    topk=10,
    filter="category == 'tutorial'",
    output_fields=["title", "category"],
)
for doc in results:
    print(f"Score: {doc.score}, Title: {doc.field('title')}")
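To make the roles of topk and filter concrete, here is a brute-force sketch in plain Python of what a filtered similarity query computes conceptually. It assumes cosine similarity as the metric and models the filter as a Python predicate; neither is confirmed as Zvec's behavior, and brute_force_query is a hypothetical function, not a Zvec API. An index such as HNSW approximates this kind of result without scanning every document.

```python
import math
from typing import Callable, Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_query(
    docs: Dict[str, dict],
    query_vector: List[float],
    topk: int,
    predicate: Callable[[dict], bool] = lambda fields: True,
) -> List[Tuple[str, float]]:
    """Score every document that passes the predicate, return the best topk."""
    scored = [
        (doc_id, cosine_similarity(doc["embedding"], query_vector))
        for doc_id, doc in docs.items()
        if predicate(doc["fields"])  # filter first, then rank
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topk]


# Toy data: plain dicts standing in for stored documents.
docs = {
    "doc1": {"fields": {"category": "tutorial"}, "embedding": [1.0, 0.0]},
    "doc2": {"fields": {"category": "tutorial"}, "embedding": [0.0, 1.0]},
    "doc3": {"fields": {"category": "outdated"}, "embedding": [1.0, 0.0]},
}
hits = brute_force_query(
    docs,
    query_vector=[1.0, 0.0],
    topk=2,
    predicate=lambda fields: fields["category"] == "tutorial",
)
print(hits)
```

Here doc3 matches the query vector exactly but is excluded by the predicate, so only doc1 and doc2 are ranked.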
Schema Modification (DDL)
Collections support dynamic schema changes:
# Add a new column
new_field = FieldSchema("author", DataType.STRING, nullable=True)
collection.add_column(new_field, expression="'Unknown'")
# Drop a column
collection.drop_column("category")
# Rename a column
collection.alter_column(old_name="id", new_name="doc_id")
# Create an index
from zvec import HnswIndexParam
collection.create_index(
    field_name="embedding",
    index_param=HnswIndexParam(m=16, ef_construction=200),
)
# Drop an index
collection.drop_index("embedding")
Schema modification operations like alter_column() only support numeric scalar fields (INT32, INT64, UINT32, UINT64, FLOAT, DOUBLE).
Persistence and Durability
Flushing Data
By default, Zvec buffers writes in memory for performance. Use flush() to ensure data is persisted to disk:
# Insert documents
collection.insert(docs)
# Force write to disk
collection.flush()
Periodically optimize the collection to merge segments and rebuild indexes:
from zvec import OptimizeOption
collection.optimize(option=OptimizeOption())
Destroying a Collection
This operation is irreversible and will permanently delete all data.
# Permanently delete the collection
collection.destroy()
Multi-Collection Workflows
You can work with multiple collections simultaneously:
import zvec
from zvec import VectorQuery
zvec.init()
# Open multiple collections
products = zvec.open("./data/products")
reviews = zvec.open("./data/reviews")
users = zvec.open("./data/users")
# Perform operations on each
product_results = products.query(vectors=VectorQuery(...))
review_results = reviews.query(vectors=VectorQuery(...))
Best Practices
Design your schema carefully
Collection schemas are defined at creation time, and later changes are limited (see Schema Modification (DDL) above). Plan your fields and vector dimensions in advance, and use nullable=True for fields that may not always have values.
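One way to catch schema mismatches early is to check each document's fields before inserting it. The sketch below models the schema as a plain dict mapping field names to a nullable flag instead of using Zvec's schema classes. The helper missing_required_fields is hypothetical, and whether Zvec performs an equivalent check itself is not covered here.

```python
from typing import Any, Dict, List


def missing_required_fields(
    nullable_by_field: Dict[str, bool],
    fields: Dict[str, Any],
) -> List[str]:
    """Return the non-nullable field names that are absent or None in `fields`."""
    return [
        name
        for name, nullable in nullable_by_field.items()
        if not nullable and fields.get(name) is None
    ]


# Mirrors the nullable settings of the example schema in this section.
schema_nullable = {"id": False, "title": False, "category": True}

complete = missing_required_fields(schema_nullable, {"id": 1, "title": "Intro"})
incomplete = missing_required_fields(schema_nullable, {"id": 1, "category": "guide"})
print(complete)    # category is nullable, so nothing is missing
print(incomplete)  # title is required but absent
```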
Batch operations when possible
Inserting multiple documents at once is more efficient than individual inserts:
# Good: batch insert
collection.insert([doc1, doc2, doc3, ...])
# Less efficient: individual inserts
collection.insert(doc1)
collection.insert(doc2)
collection.insert(doc3)
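When the document set is too large to pass in one call, a common compromise is to insert in fixed-size batches. This is a minimal sketch that assumes only the insert() method shown above, called with a list. The helpers batched and insert_in_batches, and the default batch size, are illustrative choices and not Zvec recommendations.

```python
from typing import Any, Iterable, Iterator, List


def batched(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def insert_in_batches(collection: Any, docs: Iterable[Any], batch_size: int = 1000) -> int:
    """Insert `docs` batch by batch and return the number of insert() calls made."""
    calls = 0
    for batch in batched(docs, batch_size):
        collection.insert(batch)
        calls += 1
    return calls
```

Because batched() consumes any iterable lazily, the documents can come from a generator and never need to be held in memory all at once.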
Flush periodically for durability
If your application requires strong durability guarantees, call flush() after critical writes. However, excessive flushing can impact performance.
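One way to balance durability against throughput is to flush after a fixed number of written documents instead of after every write. The wrapper below sketches that policy. PeriodicFlusher is a hypothetical class that relies only on the insert() and flush() methods shown earlier, and the threshold value is arbitrary.

```python
from typing import Any


class PeriodicFlusher:
    """Wrap a collection and call flush() once enough documents are written."""

    def __init__(self, collection: Any, flush_every: int = 1000) -> None:
        if flush_every <= 0:
            raise ValueError("flush_every must be positive")
        self._collection = collection
        self._flush_every = flush_every
        self._pending = 0  # documents written since the last flush

    def insert(self, docs: Any) -> Any:
        """Forward to collection.insert() and flush when the threshold is hit."""
        count = len(docs) if isinstance(docs, list) else 1
        result = self._collection.insert(docs)
        self._pending += count
        if self._pending >= self._flush_every:
            self.flush()
        return result

    def flush(self) -> None:
        """Flush the underlying collection and reset the pending counter."""
        self._collection.flush()
        self._pending = 0
```

Call flush() on the wrapper once more before shutdown so that any documents below the threshold are also persisted.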
Create indexes before large-scale queries
Build appropriate indexes (HNSW, IVF) on vector fields before running similarity searches at scale. See Indexing for details.
Next Steps
Schemas: Learn how to define collection schemas with fields and vectors
Vectors: Understand dense and sparse vector types
Indexing: Optimize search performance with indexes
Querying: Execute vector similarity searches with filters