Overview

DataType is an enumeration that defines all supported data types in Zvec, including scalar types, dense/sparse vector types, and array types.
import zvec

print(zvec.DataType.VECTOR_FP32)
# Output: DataType.VECTOR_FP32

print(zvec.DataType.FLOAT)
# Output: DataType.FLOAT

Scalar Types

Basic data types for single values.
STRING
DataType
String/text data type. Stores text values of variable length.
field = Field(name="title", dtype=DataType.STRING)
BOOL
DataType
Boolean data type. Stores True or False values.
field = Field(name="is_active", dtype=DataType.BOOL)
INT32
DataType
32-bit signed integer (-2,147,483,648 to 2,147,483,647).
field = Field(name="count", dtype=DataType.INT32)
INT64
DataType
64-bit signed integer (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
field = Field(name="timestamp", dtype=DataType.INT64)
UINT32
DataType
32-bit unsigned integer (0 to 4,294,967,295).
field = Field(name="id", dtype=DataType.UINT32)
UINT64
DataType
64-bit unsigned integer (0 to 18,446,744,073,709,551,615).
field = Field(name="large_id", dtype=DataType.UINT64)
FLOAT
DataType
32-bit floating point number (single precision).
field = Field(name="score", dtype=DataType.FLOAT)
DOUBLE
DataType
64-bit floating point number (double precision).
field = Field(name="price", dtype=DataType.DOUBLE)
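
The integer ranges listed above can guide which type to pick for a field. The following is a minimal sketch, not part of Zvec: `smallest_integer_type` is a hypothetical helper, and the type names are plain strings rather than `DataType` members so the snippet runs without Zvec installed.

```python
# Value ranges copied from the type descriptions above.
# Keys are plain strings, not zvec.DataType members (assumption for illustration).
INTEGER_RANGES = {
    "INT32": (-2**31, 2**31 - 1),
    "UINT32": (0, 2**32 - 1),
    "INT64": (-2**63, 2**63 - 1),
    "UINT64": (0, 2**64 - 1),
}


def smallest_integer_type(lo, hi):
    """Return the name of the first listed type that holds every value in [lo, hi]."""
    for name, (type_lo, type_hi) in INTEGER_RANGES.items():
        if type_lo <= lo and hi <= type_hi:
            return name
    raise ValueError(f"no listed integer type covers [{lo}, {hi}]")


print(smallest_integer_type(0, 5))              # small counts
print(smallest_integer_type(0, 3_000_000_000))  # exceeds the INT32 maximum
print(smallest_integer_type(-1, 10**12))        # negative and large
```

The dictionary is ordered from narrowest to widest, so the first match is the most compact type that still covers the range.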

Dense Vector Types

Fixed-dimensional dense vectors for embeddings.
VECTOR_FP16
DataType
Dense vector with 16-bit floating point elements (half precision). Uses half the memory of FP32 at the cost of some precision.
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP16,
    dim=768
)
VECTOR_FP32
DataType
Dense vector with 32-bit floating point elements (single precision). Most common vector type for embeddings.
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=1536
)
VECTOR_FP64
DataType
Dense vector with 64-bit floating point elements (double precision). Highest precision, at twice the memory of FP32.
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP64,
    dim=512
)
VECTOR_INT8
DataType
Dense vector with 8-bit signed integer elements. Used for quantized embeddings.
field = Field(
    name="quantized_embedding",
    dtype=DataType.VECTOR_INT8,
    dim=384
)
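
VECTOR_INT8 stores quantized embeddings, so float embeddings need to be converted to 8-bit integers before insertion. This page does not say how Zvec expects that conversion to be done. The sketch below shows one common approach, symmetric scalar quantization, in plain Python; `quantize_int8` and `dequantize_int8` are hypothetical helpers, not Zvec functions.

```python
def quantize_int8(vector):
    """Symmetric scalar quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    # Avoid dividing by zero when every element is 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(x / scale) for x in vector]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Approximate the original floats from the quantized integers."""
    return [q * scale for q in quantized]


embedding = [0.12, -0.5, 0.33, 0.0]
q, scale = quantize_int8(embedding)
restored = dequantize_int8(q, scale)

print(q)
print(max(abs(a - b) for a, b in zip(embedding, restored)))  # worst-case error
```

The scale factor has to be kept alongside the vector if the original magnitudes need to be recovered later. The rounding error per element is at most half the scale.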

Sparse Vector Types

Sparse vectors for high-dimensional spaces where most elements are zero.
SPARSE_VECTOR_FP16
DataType
Sparse vector with 16-bit floating point values. Stores only non-zero elements.
field = Field(
    name="sparse_embedding",
    dtype=DataType.SPARSE_VECTOR_FP16
)
SPARSE_VECTOR_FP32
DataType
Sparse vector with 32-bit floating point values. Most common sparse vector type.
field = Field(
    name="bm25_vector",
    dtype=DataType.SPARSE_VECTOR_FP32
)
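
To illustrate what "stores only non-zero elements" means, the sketch below converts a dense list into an index-to-value mapping and back. The dictionary layout is an assumption chosen for illustration; this page does not state the input format Zvec accepts for sparse fields, and `to_sparse` and `to_dense` are hypothetical helpers.

```python
def to_sparse(dense):
    """Keep only the non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def to_dense(sparse, dim):
    """Rebuild the full vector, filling missing indices with 0.0."""
    return [sparse.get(i, 0.0) for i in range(dim)]


dense = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 2.25, 0.0]
sparse = to_sparse(dense)

print(sparse)  # {2: 1.5, 6: 2.25}
print(f"stored {len(sparse)} of {len(dense)} elements")
```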

Array Types

Variable-length arrays of scalar values.
ARRAY_STRING
DataType
Array of strings. Stores multiple text values.
field = Field(name="tags", dtype=DataType.ARRAY_STRING)
# Example: ["python", "tutorial", "beginner"]
ARRAY_BOOL
DataType
Array of boolean values.
field = Field(name="flags", dtype=DataType.ARRAY_BOOL)
# Example: [True, False, True]
ARRAY_INT32
DataType
Array of 32-bit signed integers.
field = Field(name="ratings", dtype=DataType.ARRAY_INT32)
# Example: [5, 4, 3, 5]
ARRAY_INT64
DataType
Array of 64-bit signed integers.
field = Field(name="timestamps", dtype=DataType.ARRAY_INT64)
ARRAY_UINT32
DataType
Array of 32-bit unsigned integers.
field = Field(name="ids", dtype=DataType.ARRAY_UINT32)
ARRAY_UINT64
DataType
Array of 64-bit unsigned integers.
field = Field(name="large_ids", dtype=DataType.ARRAY_UINT64)
ARRAY_FLOAT
DataType
Array of 32-bit floating point numbers.
field = Field(name="scores", dtype=DataType.ARRAY_FLOAT)
# Example: [0.95, 0.87, 0.92]
ARRAY_DOUBLE
DataType
Array of 64-bit floating point numbers.
field = Field(name="coordinates", dtype=DataType.ARRAY_DOUBLE)
# Example: [40.7128, -74.0060]

Usage Examples

Defining Schema with Data Types

from zvec import Collection, Field, DataType

schema = [
    Field(name="id", dtype=DataType.STRING, is_primary=True),
    Field(name="title", dtype=DataType.STRING),
    Field(name="views", dtype=DataType.INT64),
    Field(name="rating", dtype=DataType.FLOAT),
    Field(name="is_published", dtype=DataType.BOOL),
    Field(name="tags", dtype=DataType.ARRAY_STRING),
    Field(
        name="title_embedding",
        dtype=DataType.VECTOR_FP32,
        dim=768
    ),
    Field(
        name="content_embedding",
        dtype=DataType.VECTOR_FP16,
        dim=1536
    ),
    Field(
        name="bm25_sparse",
        dtype=DataType.SPARSE_VECTOR_FP32
    )
]

collection = Collection.create(
    name="articles",
    schema=schema
)
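
Because dense vector fields are fixed-dimensional, a document's vectors must match the `dim` declared in the schema. The sketch below is a client-side pre-check under stated assumptions: the dimensions are copied from the schema above into a plain dictionary, `dimension_errors` is a hypothetical helper, and documents are modeled as plain dicts. How Zvec itself reports a mismatch is not covered on this page.

```python
# Dimensions copied from the schema above; a plain dict, so no zvec import is needed.
DECLARED_DIMS = {"title_embedding": 768, "content_embedding": 1536}


def dimension_errors(doc):
    """Return messages for vectors that are missing or have the wrong length."""
    errors = []
    for name, dim in DECLARED_DIMS.items():
        vector = doc.get(name)
        if vector is None:
            errors.append(f"{name}: missing")
        elif len(vector) != dim:
            errors.append(f"{name}: expected {dim} values, got {len(vector)}")
    return errors


# content_embedding is deliberately too short here.
doc = {"title_embedding": [0.0] * 768, "content_embedding": [0.0] * 768}
print(dimension_errors(doc))
```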

Checking Data Type

from zvec import Field, DataType

field = Field(name="vec", dtype=DataType.VECTOR_FP32, dim=384)

print(field.dtype)  # DataType.VECTOR_FP32
print(field.dtype.name)  # "VECTOR_FP32"
print(field.dtype.value)  # 23

if field.dtype == DataType.VECTOR_FP32:
    print("This is a 32-bit float vector")

Vector Type Comparison

from zvec import DataType

# Raw element storage for one 1536-dimensional vector (excludes any index overhead)
vector_types = [
    (DataType.VECTOR_FP64, 1536 * 8),   # 12,288 bytes
    (DataType.VECTOR_FP32, 1536 * 4),   # 6,144 bytes
    (DataType.VECTOR_FP16, 1536 * 2),   # 3,072 bytes
    (DataType.VECTOR_INT8, 1536 * 1),   # 1,536 bytes
]

for dtype, bytes_per_vec in vector_types:
    print(f"{dtype.name}: {bytes_per_vec:,} bytes per vector")

Type Properties

All DataType enum members have these properties:
name
str
The name of the data type as a string.
DataType.VECTOR_FP32.name  # "VECTOR_FP32"
value
int
The internal integer value of the data type.
DataType.VECTOR_FP32.value  # 23
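
Since `name` returns the member name as a string, the naming prefixes used on this page (`VECTOR_`, `SPARSE_VECTOR_`, `ARRAY_`) are enough to tell the four type families apart. The sketch below assumes that convention holds for every member; `category_of` is a hypothetical helper that takes a plain string such as the result of `field.dtype.name`.

```python
def category_of(type_name):
    """Classify a type name by the prefixes used on this page."""
    # Check the sparse prefix first so it is not confused with other names.
    if type_name.startswith("SPARSE_VECTOR_"):
        return "sparse vector"
    if type_name.startswith("VECTOR_"):
        return "dense vector"
    if type_name.startswith("ARRAY_"):
        return "array"
    return "scalar"


for name in ["STRING", "VECTOR_FP16", "SPARSE_VECTOR_FP32", "ARRAY_INT64"]:
    print(f"{name}: {category_of(name)}")
```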

Choosing the Right Data Type

For Vectors

Vector Type Selection:
  • VECTOR_FP32: Default choice, balanced precision and performance
  • VECTOR_FP16: 50% memory savings vs. FP32, slight accuracy loss
  • VECTOR_INT8: 75% memory savings vs. FP32, requires quantization
  • VECTOR_FP64: Maximum precision, rarely needed
  • SPARSE_VECTOR_FP32: High-dimensional sparse data (e.g., BM25)
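
The accuracy loss of half precision can be measured before committing to VECTOR_FP16. The sketch below uses only Python's standard `struct` module to round-trip a single value through the IEEE 754 half and single precision formats. It says nothing about how Zvec converts values internally; `roundtrip` is a hypothetical helper.

```python
import struct


def roundtrip(value, fmt):
    """Pack a Python float into the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1234567
as_fp16 = roundtrip(value, "<e")  # half precision, 2 bytes
as_fp32 = roundtrip(value, "<f")  # single precision, 4 bytes

print(f"FP16 error: {abs(value - as_fp16):.2e}")
print(f"FP32 error: {abs(value - as_fp32):.2e}")

# Per-element size ratio behind the 50% figure in the list above.
saving = 1 - struct.calcsize("<e") / struct.calcsize("<f")
print(f"FP16 saves {saving:.0%} per element vs. FP32")
```

Running the same round trip over a sample of real embeddings, then comparing search results, gives a more realistic picture than a single value.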

For Scalars

Scalar Type Selection:
  • STRING: Text, IDs, names
  • INT64: Timestamps, large counts
  • INT32: Counts, small integers
  • FLOAT: Scores, ratings, percentages
  • DOUBLE: High-precision measurements
  • BOOL: Flags, binary states

For Arrays

Array Type Selection:
  • ARRAY_STRING: Tags, categories, keywords
  • ARRAY_INT32/INT64: Multiple IDs, lists of counts
  • ARRAY_FLOAT/DOUBLE: Multiple scores, coordinates
