A schema defines the structure of a collection in Zvec. Every collection has a fixed schema that specifies:
  • The collection name
  • Scalar fields (e.g., ID, title, timestamp)
  • Vector fields for similarity search
  • Data types, dimensions, and constraints

Schema Components

Zvec schemas consist of three main classes:
  • CollectionSchema: Top-level container for the entire schema
  • FieldSchema: Defines scalar (non-vector) fields
  • VectorSchema: Defines vector fields for embeddings

CollectionSchema

The CollectionSchema class defines the overall structure of a collection:
from zvec import CollectionSchema, FieldSchema, VectorSchema, DataType

schema = CollectionSchema(
    name="my_collection",
    fields=[field1, field2, ...],    # Scalar fields
    vectors=[vector1, vector2, ...]   # Vector fields
)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name | str | Name of the collection (required) |
| fields | FieldSchema or list[FieldSchema] | One or more scalar field definitions |
| vectors | VectorSchema or list[VectorSchema] | One or more vector field definitions |

Field names must be unique across both scalar and vector fields. Duplicate names will raise a ValueError.
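
Duplicate names can be caught before the schema is built. The following is a minimal sketch in plain Python that does not call Zvec; `find_duplicate_names` is a hypothetical helper, not part of the library, and it only mirrors the uniqueness rule described above.

```python
from collections import Counter


def find_duplicate_names(scalar_names, vector_names):
    """Return names that occur more than once across both lists, sorted."""
    counts = Counter(list(scalar_names) + list(vector_names))
    return sorted(name for name, n in counts.items() if n > 1)


# "embedding" is used for both a scalar field and a vector field.
duplicates = find_duplicate_names(
    ["id", "title", "embedding"],
    ["embedding", "image_embedding"],
)
print(duplicates)  # ['embedding']
```

Running a check like this on the planned field names gives a readable list of conflicts instead of a single exception at construction time.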

Accessing Schema Information

# Get collection name
print(schema.name)  # "my_collection"

# List all scalar fields
for field in schema.fields:
    print(f"{field.name}: {field.data_type}")

# List all vector fields
for vector in schema.vectors:
    print(f"{vector.name}: {vector.dimension}D {vector.data_type}")

# Retrieve specific field by name
id_field = schema.field("id")
if id_field:
    print(f"ID field type: {id_field.data_type}")

# Retrieve specific vector by name
emb_field = schema.vector("embedding")
if emb_field:
    print(f"Embedding dimension: {emb_field.dimension}")

FieldSchema

The FieldSchema class defines scalar (non-vector) fields:
from zvec import FieldSchema, DataType, InvertIndexParam

# Simple field
id_field = FieldSchema(
    name="id",
    data_type=DataType.INT64,
    nullable=False
)

# Field with inverted index
category_field = FieldSchema(
    name="category",
    data_type=DataType.STRING,
    nullable=True,
    index_param=InvertIndexParam(enable_range_optimization=True)
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | - | Field name (must be unique) |
| data_type | DataType | - | Data type (see below) |
| nullable | bool | False | Whether field can contain null values |
| index_param | InvertIndexParam | None | Inverted index configuration for filtering |

Supported Scalar Data Types

Zvec supports the following scalar data types:

Numeric Types

DataType.INT32      # 32-bit signed integer
DataType.INT64      # 64-bit signed integer
DataType.UINT32     # 32-bit unsigned integer
DataType.UINT64     # 64-bit unsigned integer
DataType.FLOAT      # 32-bit floating point
DataType.DOUBLE     # 64-bit floating point
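
When choosing between the integer types, the deciding factor is usually the value range. The sketch below is plain Python and does not use Zvec; the type names are plain strings and `fits` is a hypothetical helper. The ranges follow directly from the bit widths listed above.

```python
# Value ranges implied by the bit widths of each integer type.
INTEGER_RANGES = {
    "INT32": (-2**31, 2**31 - 1),
    "INT64": (-2**63, 2**63 - 1),
    "UINT32": (0, 2**32 - 1),
    "UINT64": (0, 2**64 - 1),
}


def fits(type_name, value):
    """Return True if an int value is representable in the named type."""
    low, high = INTEGER_RANGES[type_name]
    return low <= value <= high


print(fits("INT32", 2_147_483_647))      # True
print(fits("INT32", 2_147_483_648))      # False
print(fits("UINT32", -1))                # False
print(fits("INT64", 1_700_000_000_000))  # True
```

The last call shows why a millisecond timestamp needs a 64-bit field: the value is far beyond the INT32 maximum.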

String and Boolean

DataType.STRING     # UTF-8 string
DataType.BOOL       # Boolean (true/false)

Array Types

DataType.ARRAY_INT32      # Array of 32-bit integers
DataType.ARRAY_INT64      # Array of 64-bit integers
DataType.ARRAY_UINT32     # Array of unsigned 32-bit integers
DataType.ARRAY_UINT64     # Array of unsigned 64-bit integers
DataType.ARRAY_FLOAT      # Array of floats
DataType.ARRAY_DOUBLE     # Array of doubles
DataType.ARRAY_STRING     # Array of strings
DataType.ARRAY_BOOL       # Array of booleans

Example: Multiple Scalar Fields

from zvec import FieldSchema, DataType

fields = [
    FieldSchema("id", DataType.INT64, nullable=False),
    FieldSchema("title", DataType.STRING, nullable=False),
    FieldSchema("timestamp", DataType.INT64, nullable=False),
    FieldSchema("price", DataType.FLOAT, nullable=True),
    FieldSchema("tags", DataType.ARRAY_STRING, nullable=True),
    FieldSchema("views", DataType.INT32, nullable=False)
]

VectorSchema

The VectorSchema class defines vector fields for similarity search:
from zvec import VectorSchema, DataType, HnswIndexParam, FlatIndexParam

# Dense vector with HNSW index
embedding = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=HnswIndexParam(m=16, ef_construction=200)
)

# Sparse vector with default index
sparse_embedding = VectorSchema(
    name="sparse_embedding",
    data_type=DataType.SPARSE_VECTOR_FP32,
    dimension=0,  # Dimension not required for sparse vectors
    index_param=FlatIndexParam()
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | - | Vector field name (must be unique) |
| data_type | DataType | - | Vector data type (see below) |
| dimension | int | 0 | Vector dimensionality (must be > 0 for dense vectors) |
| index_param | HnswIndexParam, IVFIndexParam, FlatIndexParam | FlatIndexParam() | Index configuration |

Supported Vector Data Types

Dense Vectors

DataType.VECTOR_FP16    # 16-bit float (half precision)
DataType.VECTOR_FP32    # 32-bit float (single precision)
DataType.VECTOR_FP64    # 64-bit float (double precision)
DataType.VECTOR_INT8    # 8-bit integer (quantized)

Sparse Vectors

DataType.SPARSE_VECTOR_FP16    # Sparse 16-bit float
DataType.SPARSE_VECTOR_FP32    # Sparse 32-bit float
Dense vectors are stored as fixed-length arrays. Sparse vectors are stored as dictionaries mapping indices to values (see Vectors).
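
To make the two shapes concrete, here is a small sketch in plain Python that converts between a fixed-length list and an index-to-value dictionary. It only illustrates the representations described above; the helper names are hypothetical, and the exact input format Zvec accepts is covered on the Vectors page.

```python
def dense_to_sparse(dense):
    """Keep only non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_to_dense(sparse, dimension):
    """Expand an index-to-value mapping into a fixed-length list."""
    dense = [0.0] * dimension
    for index, value in sparse.items():
        dense[index] = value
    return dense


dense = [0.0, 0.5, 0.0, 0.0, 1.25, 0.0]
sparse = dense_to_sparse(dense)
print(sparse)  # {1: 0.5, 4: 1.25}

# The round trip restores the original list.
assert sparse_to_dense(sparse, len(dense)) == dense
```

The dictionary form stores two entries instead of six, which is why sparse types suit vectors where most positions are zero.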

Example: Multiple Vector Fields

from zvec import VectorSchema, DataType, HnswIndexParam, FlatIndexParam

vectors = [
    # Text embedding
    VectorSchema(
        name="text_embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=768,
        index_param=HnswIndexParam(m=16, ef_construction=200)
    ),
    # Image embedding
    VectorSchema(
        name="image_embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=512,
        index_param=HnswIndexParam(m=16, ef_construction=200)
    ),
    # Sparse keyword embedding
    VectorSchema(
        name="keyword_embedding",
        data_type=DataType.SPARSE_VECTOR_FP32,
        dimension=0,
        index_param=FlatIndexParam()
    )
]

Complete Schema Example

Here’s a complete example combining all schema components:
import zvec
from zvec import (
    CollectionSchema,
    FieldSchema,
    VectorSchema,
    DataType,
    HnswIndexParam,
    InvertIndexParam
)

# Initialize Zvec
zvec.init()

# Define scalar fields
fields = [
    FieldSchema(
        name="id",
        data_type=DataType.INT64,
        nullable=False
    ),
    FieldSchema(
        name="title",
        data_type=DataType.STRING,
        nullable=False
    ),
    FieldSchema(
        name="category",
        data_type=DataType.STRING,
        nullable=True,
        index_param=InvertIndexParam(enable_range_optimization=True)
    ),
    FieldSchema(
        name="price",
        data_type=DataType.FLOAT,
        nullable=True
    ),
    FieldSchema(
        name="tags",
        data_type=DataType.ARRAY_STRING,
        nullable=True
    )
]

# Define vector fields
vectors = [
    VectorSchema(
        name="text_embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=768,
        index_param=HnswIndexParam(m=16, ef_construction=200)
    ),
    VectorSchema(
        name="sparse_embedding",
        data_type=DataType.SPARSE_VECTOR_FP32,
        dimension=0
    )
]

# Create collection schema
schema = CollectionSchema(
    name="product_catalog",
    fields=fields,
    vectors=vectors
)

# Create collection
collection = zvec.create_and_open(
    path="./data/product_catalog",
    schema=schema
)

print(f"Created collection: {collection.schema.name}")
print(f"Scalar fields: {[f.name for f in collection.schema.fields]}")
print(f"Vector fields: {[v.name for v in collection.schema.vectors]}")

Schema Validation

Zvec performs automatic validation when creating schemas:
Field names must be unique across both scalar and vector fields:
# This will raise ValueError
schema = CollectionSchema(
    name="test",
    fields=FieldSchema("embedding", DataType.STRING),
    vectors=VectorSchema("embedding", DataType.VECTOR_FP32, dimension=128)
)
# Error: duplicate field name 'embedding'
Only supported data types are allowed:
# FieldSchema only accepts scalar types
field = FieldSchema("vec", DataType.VECTOR_FP32)  # ValueError

# VectorSchema only accepts vector types
vector = VectorSchema("id", DataType.INT64, dimension=128)  # ValueError
Dense vectors require positive dimensions:
# This will raise ValueError
vector = VectorSchema("emb", DataType.VECTOR_FP32, dimension=0)
# Error: dimension must be > 0 for dense vectors

# Sparse vectors can have dimension=0
sparse = VectorSchema("sparse", DataType.SPARSE_VECTOR_FP32, dimension=0)  # OK
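
The type and dimension rules above can be mirrored in a pre-flight check, for example when schema definitions are loaded from configuration. This is a sketch in plain Python, not Zvec's own validation: the type names are plain strings, `check_vector_spec` is a hypothetical helper, and the messages are illustrative rather than the library's actual error text.

```python
DENSE_TYPES = {"VECTOR_FP16", "VECTOR_FP32", "VECTOR_FP64", "VECTOR_INT8"}
SPARSE_TYPES = {"SPARSE_VECTOR_FP16", "SPARSE_VECTOR_FP32"}


def check_vector_spec(type_name, dimension):
    """Return an error message for an invalid spec, or None if it passes."""
    if type_name not in DENSE_TYPES | SPARSE_TYPES:
        return f"{type_name} is not a vector type"
    if type_name in DENSE_TYPES and dimension <= 0:
        return "dimension must be > 0 for dense vectors"
    return None


print(check_vector_spec("VECTOR_FP32", 128))       # None
print(check_vector_spec("VECTOR_FP32", 0))         # dimension must be > 0 for dense vectors
print(check_vector_spec("SPARSE_VECTOR_FP32", 0))  # None
print(check_vector_spec("INT64", 128))             # INT64 is not a vector type
```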

Schema Best Practices

1. Plan your schema in advance

Collection schemas are fixed at creation time. Choose appropriate data types and dimensions before creating the collection.

2. Use nullable fields for optional data

If a field may not always have a value, set nullable=True to avoid insertion errors.
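
One way to find such fields is to scan sample records before fixing the schema. The sketch below is plain Python and does not call Zvec; `missing_required` and the `nullable_by_field` mapping are hypothetical and only model the nullable flag described above.

```python
def missing_required(record, nullable_by_field):
    """List non-nullable fields that are absent or None in the record."""
    return sorted(
        name
        for name, nullable in nullable_by_field.items()
        if not nullable and record.get(name) is None
    )


# Nullability flags as they might be declared in a schema.
nullable_by_field = {"id": False, "title": False, "price": True, "tags": True}

print(missing_required({"id": 1, "title": "Lamp"}, nullable_by_field))  # []
print(missing_required({"id": 2, "price": 9.5}, nullable_by_field))     # ['title']
```

A field that shows up in the result for real data is a candidate for nullable=True, or for a default value supplied by the application.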

3. Choose appropriate vector data types

  • VECTOR_FP32: Most common, good balance of precision and performance
  • VECTOR_FP16: Reduced memory usage, slightly lower precision
  • VECTOR_INT8: Quantized vectors for extreme memory efficiency
  • SPARSE_VECTOR_FP32: For high-dimensional sparse data (e.g., BM25 term weights)
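
The memory trade-off between the dense types can be estimated from element width alone. The sketch below is plain Python, with type names as strings and a hypothetical `raw_vector_bytes` helper; it counts raw vector data only and ignores index structures and any storage overhead Zvec adds.

```python
# Bytes per element for each dense type, from its bit width.
BYTES_PER_ELEMENT = {
    "VECTOR_FP64": 8,
    "VECTOR_FP32": 4,
    "VECTOR_FP16": 2,
    "VECTOR_INT8": 1,
}


def raw_vector_bytes(type_name, dimension, count):
    """Raw storage for `count` dense vectors, excluding index overhead."""
    return BYTES_PER_ELEMENT[type_name] * dimension * count


for type_name in ("VECTOR_FP32", "VECTOR_FP16", "VECTOR_INT8"):
    size = raw_vector_bytes(type_name, dimension=768, count=1_000_000)
    print(f"{type_name}: {size / 2**30:.2f} GiB")
# VECTOR_FP32: 2.86 GiB
# VECTOR_FP16: 1.43 GiB
# VECTOR_INT8: 0.72 GiB
```

For one million 768-dimensional vectors, each step down in precision halves the raw footprint.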

4. Add inverted indexes to filtered fields

If you plan to filter by a field frequently, add an inverted index:
FieldSchema(
    name="category",
    data_type=DataType.STRING,
    index_param=InvertIndexParam()
)

5. Configure vector indexes at schema creation

Set index parameters during schema definition to avoid rebuilding indexes later:
VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=HnswIndexParam(m=16, ef_construction=200)
)

Next Steps

  • Vectors: Learn about dense and sparse vector types
  • Indexing: Understand index types and performance tuning
  • Collections: Work with collections and data operations
  • Querying: Execute vector similarity searches
