Overview

DataType is an enumeration that defines all supported data types for fields and vectors in Zvec. Choose the appropriate data type based on your data characteristics and performance requirements.

Import

from zvec import DataType
from zvec.typing import DataType  # Alternative import

Vector Data Types

Vector data types are used for similarity search fields.

Dense Vectors

VECTOR_FP32
DataType
32-bit floating point vectors. Most commonly used for dense embeddings from models like OpenAI, BERT, or Sentence Transformers.
Use when: Standard precision embeddings, most ML models
Memory: 4 bytes per dimension

VECTOR_FP16
DataType
16-bit floating point vectors. Provides 50% memory savings with minimal accuracy loss.
Use when: Memory constraints, large-scale deployments
Memory: 2 bytes per dimension

VECTOR_FP64
DataType
64-bit floating point vectors. High precision for scientific applications.
Use when: Extreme precision requirements
Memory: 8 bytes per dimension

VECTOR_INT8
DataType
8-bit integer vectors. Quantized vectors for maximum memory efficiency.
Use when: Extreme memory constraints, post-quantization
Memory: 1 byte per dimension
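
The per-dimension sizes listed for the dense vector types translate directly into raw storage. The following is a minimal sketch of that arithmetic in plain Python; `BYTES_PER_DIMENSION` and `raw_vector_bytes` are hypothetical helpers, not part of Zvec, and the result covers vector values only (index structures and metadata add overhead that is not estimated here).

```python
# Bytes per dimension, copied from the "Memory" notes for each dense type.
BYTES_PER_DIMENSION = {
    "VECTOR_FP64": 8,
    "VECTOR_FP32": 4,
    "VECTOR_FP16": 2,
    "VECTOR_INT8": 1,
}


def raw_vector_bytes(type_name: str, dimension: int, num_vectors: int) -> int:
    """Raw storage for the vector values only (hypothetical helper)."""
    return BYTES_PER_DIMENSION[type_name] * dimension * num_vectors


# One million 768-dimensional embeddings in each dense type.
for type_name in BYTES_PER_DIMENSION:
    size = raw_vector_bytes(type_name, dimension=768, num_vectors=1_000_000)
    print(f"{type_name}: {size:,} bytes")
```

For one million 768-dimensional vectors this gives 3,072,000,000 bytes for VECTOR_FP32 and half that for VECTOR_FP16.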

Sparse Vectors

SPARSE_VECTOR_FP32
DataType
32-bit sparse vectors. Stores only non-zero values with their indices.
Use when: BM25, TF-IDF, or sparse embeddings (e.g., SPLADE)
Memory: Depends on sparsity

SPARSE_VECTOR_FP16
DataType
16-bit sparse vectors. Memory-efficient sparse representation.
Use when: Sparse embeddings with memory constraints
Memory: Depends on sparsity
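
To illustrate what "stores only non-zero values with their indices" means, here is a small sketch that converts a dense list into an index-to-value mapping. The dictionary layout and the `to_sparse` helper are assumptions chosen for illustration; this page does not specify the input format Zvec expects for sparse vectors.

```python
def to_sparse(dense, tol=0.0):
    """Keep only entries whose magnitude exceeds tol, keyed by position.

    Hypothetical helper: the {index: value} layout is an illustrative
    choice, not a documented Zvec format.
    """
    return {i: v for i, v in enumerate(dense) if abs(v) > tol}


dense = [0.0, 0.0, 1.2, 0.0, 0.0, 0.7, 0.0, 0.0]
sparse = to_sparse(dense)

print(sparse)                   # {2: 1.2, 5: 0.7}
print(len(sparse), len(dense))  # 2 8
```

Only two of the eight positions are stored, which is why the memory cost depends on sparsity rather than on the nominal dimension.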

Scalar Data Types

Scalar data types are used for metadata fields and filtering.

Integer Types

INT32
DataType
32-bit signed integer.
Range: -2,147,483,648 to 2,147,483,647
Use when: Counters, small IDs, enum values

INT64
DataType
64-bit signed integer.
Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
Use when: Large IDs, timestamps, large counters

UINT32
DataType
32-bit unsigned integer.
Range: 0 to 4,294,967,295
Use when: Always-positive values, IDs

UINT64
DataType
64-bit unsigned integer.
Range: 0 to 18,446,744,073,709,551,615
Use when: Large always-positive values, hash values
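
The ranges above can be turned into a quick check for which integer type fits a field's expected values. This is a sketch in plain Python; `INTEGER_RANGES` and `smallest_integer_type` are hypothetical names, and the function returns the type name as a string rather than a `DataType` member so that it runs without Zvec installed.

```python
# Inclusive ranges from the list above, ordered from narrowest to widest.
INTEGER_RANGES = {
    "INT32": (-2**31, 2**31 - 1),
    "UINT32": (0, 2**32 - 1),
    "INT64": (-2**63, 2**63 - 1),
    "UINT64": (0, 2**64 - 1),
}


def smallest_integer_type(lowest: int, highest: int) -> str:
    """Return the first listed type whose range covers [lowest, highest]."""
    for name, (type_min, type_max) in INTEGER_RANGES.items():
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError(f"no listed integer type covers [{lowest}, {highest}]")


print(smallest_integer_type(0, 1_000))           # INT32
print(smallest_integer_type(0, 3_000_000_000))   # UINT32
print(smallest_integer_type(-1, 3_000_000_000))  # INT64
```

When the expected range is uncertain, picking the wider type avoids a later schema change at the cost of a few bytes per value.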

Floating Point Types

FLOAT
DataType
32-bit floating point number.
Use when: Prices, scores, measurements

DOUBLE
DataType
64-bit floating point number.
Use when: High-precision calculations, scientific data
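
The practical difference between FLOAT and DOUBLE is how many significant digits survive storage. The sketch below uses Python's standard `struct` module to round-trip values through 32-bit storage. It assumes FLOAT follows IEEE 754 single precision, which this page does not state explicitly, and `as_float32` is a hypothetical helper.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit IEEE 754 storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


price = 19.99
stored = as_float32(price)
print(stored == price)            # False: the nearest 32-bit value differs
print(abs(stored - price) < 1e-5)  # True: the error is small for a price

# Whole numbers above 2**24 are no longer all representable in 32 bits.
print(as_float32(16_777_217.0))   # 16777216.0
```

For prices and scores the error is usually negligible; for large counts or values that are compared for exact equality, DOUBLE or an integer type is the safer choice.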

Text and Boolean

STRING
DataType
Variable-length UTF-8 string.
Use when: Text fields, IDs, categories, descriptions

BOOL
DataType
Boolean value (true or false).
Use when: Flags, binary states

Array Data Types

Array types store multiple values in a single field.

ARRAY_INT32
DataType
Array of 32-bit signed integers.
Use when: Multiple small integer values per document

ARRAY_INT64
DataType
Array of 64-bit signed integers.
Use when: Lists of large IDs, multiple timestamps

ARRAY_UINT32
DataType
Array of 32-bit unsigned integers.

ARRAY_UINT64
DataType
Array of 64-bit unsigned integers.

ARRAY_FLOAT
DataType
Array of 32-bit floating point numbers.
Use when: Multiple scores, ratings, measurements

ARRAY_DOUBLE
DataType
Array of 64-bit floating point numbers.

ARRAY_STRING
DataType
Array of strings.
Use when: Tags, categories, multiple text values

ARRAY_BOOL
DataType
Array of boolean values.

Examples

Vector field types

from zvec import VectorSchema, DataType

# Standard embeddings (OpenAI, BERT, etc.)
vector_fp32 = VectorSchema(
    "text_embedding",
    DataType.VECTOR_FP32,
    dimension=768
)

# Memory-efficient embeddings
vector_fp16 = VectorSchema(
    "image_embedding",
    DataType.VECTOR_FP16,
    dimension=512
)

# Sparse vectors (BM25, TF-IDF)
sparse_vector = VectorSchema(
    "bm25_embedding",
    DataType.SPARSE_VECTOR_FP32
)

Scalar field types

from zvec import FieldSchema, DataType

# Integer fields
id_field = FieldSchema("id", DataType.INT64)
count_field = FieldSchema("view_count", DataType.INT32)
timestamp_field = FieldSchema("created_at", DataType.INT64)

# Float fields
price_field = FieldSchema("price", DataType.FLOAT)
score_field = FieldSchema("relevance_score", DataType.DOUBLE)

# Text fields
title_field = FieldSchema("title", DataType.STRING)
category_field = FieldSchema("category", DataType.STRING)

# Boolean field
is_active_field = FieldSchema("is_active", DataType.BOOL)
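
The `created_at` field above is declared as INT64, and this page suggests Unix timestamps for such fields. A minimal sketch of producing that integer with the standard library follows; `to_unix_seconds` is a hypothetical helper, and storing whole seconds in UTC is an assumption about the convention you choose, not a Zvec requirement.

```python
from datetime import datetime, timezone


def to_unix_seconds(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if dt.tzinfo is None:
        # Naive datetimes are interpreted in local time, which makes the
        # stored value depend on the machine; reject them instead.
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


created_at = to_unix_seconds(datetime(2024, 3, 5, 8, 0, 0, tzinfo=timezone.utc))
print(created_at)  # 1709625600
```

Second-resolution timestamps still fit in INT32 today, but they exceed its maximum of 2,147,483,647 in January 2038, which is one reason to prefer INT64.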

Array field types

from zvec import FieldSchema, DataType

# String arrays
tags_field = FieldSchema("tags", DataType.ARRAY_STRING)
categories_field = FieldSchema("categories", DataType.ARRAY_STRING)

# Numeric arrays
ratings_field = FieldSchema("ratings", DataType.ARRAY_FLOAT)
related_ids_field = FieldSchema("related_ids", DataType.ARRAY_INT64)

# Boolean array
features_field = FieldSchema("features_enabled", DataType.ARRAY_BOOL)

Complete schema example

from zvec import CollectionSchema, FieldSchema, VectorSchema, DataType

schema = CollectionSchema(
    name="documents",
    fields=[
        # IDs and identifiers
        FieldSchema("doc_id", DataType.STRING),
        FieldSchema("user_id", DataType.INT64),
        
        # Metadata
        FieldSchema("title", DataType.STRING),
        FieldSchema("author", DataType.STRING, nullable=True),
        FieldSchema("created_at", DataType.INT64),
        
        # Metrics
        FieldSchema("view_count", DataType.INT32),
        FieldSchema("rating", DataType.FLOAT),
        
        # Arrays
        FieldSchema("tags", DataType.ARRAY_STRING),
        FieldSchema("related_doc_ids", DataType.ARRAY_STRING),
        
        # Flags
        FieldSchema("is_published", DataType.BOOL)
    ],
    vectors=[
        # Dense vectors
        VectorSchema("text_embedding", DataType.VECTOR_FP32, dimension=768),
        VectorSchema("image_embedding", DataType.VECTOR_FP16, dimension=512),
        
        # Sparse vectors
        VectorSchema("bm25_embedding", DataType.SPARSE_VECTOR_FP32)
    ]
)

Type Selection Guide

For Vector Embeddings

| Model / Use Case | Recommended Type | Notes |
|------------------|------------------|-------|
| OpenAI embeddings | VECTOR_FP32 | Standard precision |
| Sentence Transformers | VECTOR_FP32 | Standard precision |
| CLIP image embeddings | VECTOR_FP32 or VECTOR_FP16 | FP16 for memory savings |
| BM25 / TF-IDF | SPARSE_VECTOR_FP32 | Sparse representation |
| SPLADE | SPARSE_VECTOR_FP32 | Learned sparse embeddings |
| Memory-constrained | VECTOR_FP16 or VECTOR_INT8 | Trade accuracy for memory |

For Metadata Fields

| Data | Recommended Type | Example |
|------|------------------|---------|
| Document IDs | STRING or INT64 | "doc_123" or 123456789 |
| UUIDs | STRING | "550e8400-e29b-41d4-a716-446655440000" |
| Timestamps | INT64 | 1709625600 (Unix timestamp) |
| Categories | STRING | "technology", "sports" |
| Tags | ARRAY_STRING | ["python", "machine-learning"] |
| Prices | FLOAT | 19.99 |
| View counts | INT32 or INT64 | 1000 |
| Flags | BOOL | true, false |
| Ratings | FLOAT or ARRAY_FLOAT | 4.5 or [4.0, 5.0, 3.5] |

Memory vs Precision Trade-offs:
  • VECTOR_FP32 → VECTOR_FP16: 50% memory reduction, <1% accuracy loss
  • VECTOR_FP32 → VECTOR_INT8: 75% memory reduction, 1-3% accuracy loss
  • For most applications, VECTOR_FP16 provides the best balance

Sparse Vectors: Sparse vector types (SPARSE_VECTOR_FP32, SPARSE_VECTOR_FP16) automatically handle sparse representations. You don't need to specify dimensions upfront.
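
The trade-off note on moving from VECTOR_FP32 to VECTOR_INT8 presumes the embeddings have been quantized. One generic way to do that is symmetric scalar quantization, sketched below in plain Python. This is an illustration of the general technique, not Zvec's method: the page does not say whether Zvec quantizes internally or expects pre-quantized input, and `quantize_int8` and `dequantize_int8` are hypothetical helpers.

```python
def quantize_int8(vector):
    """Symmetric scalar quantization: map floats to integers in [-127, 127].

    Returns the integer codes plus the scale needed to approximate the
    original values again.
    """
    if not vector:
        return [], 1.0
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate the original floats from the integer codes."""
    return [c * scale for c in codes]


vector = [0.12, -0.50, 0.33, 0.05]
codes, scale = quantize_int8(vector)
restored = dequantize_int8(codes, scale)

# Each value is recovered to within half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vector, restored))
print(codes)
print(max_error <= scale / 2 + 1e-12)  # True
```

Each dimension now needs one byte instead of four, and the per-value error is bounded by half of `scale`. How much that affects search accuracy depends on the data and should be measured on your own embeddings.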

Checking Data Types

from zvec import DataType, FieldSchema

# Group the vector and scalar types so membership can be checked
vector_types = [
    DataType.VECTOR_FP32,
    DataType.VECTOR_FP16,
    DataType.VECTOR_FP64,
    DataType.VECTOR_INT8,
    DataType.SPARSE_VECTOR_FP32,
    DataType.SPARSE_VECTOR_FP16
]

scalar_types = [
    DataType.INT32, DataType.INT64,
    DataType.UINT32, DataType.UINT64,
    DataType.FLOAT, DataType.DOUBLE,
    DataType.STRING, DataType.BOOL
]

# Check if a DataType is a vector type
data_type = DataType.VECTOR_FP32
print(data_type in vector_types)  # True

# Access enum name and value
print(data_type.name)   # "VECTOR_FP32"
print(data_type.value)  # Numeric value

# Compare data types (an example field is defined here so the check runs)
field = FieldSchema("title", DataType.STRING)
if field.data_type == DataType.STRING:
    print("This is a string field")
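
Because the member names on this page follow a consistent prefix pattern, a type can also be classified from its `name` instead of from hand-maintained lists. The sketch below is an assumption-heavy illustration: `classify` is a hypothetical helper, and the `DataType` class in it is a stand-in enum with placeholder values so the example runs without Zvec installed. With Zvec available, the real `zvec.DataType` members would be passed instead.

```python
from enum import Enum, auto


class DataType(Enum):
    """Stand-in for zvec.DataType (names from this page, values are placeholders)."""
    VECTOR_FP32 = auto()
    SPARSE_VECTOR_FP16 = auto()
    INT64 = auto()
    STRING = auto()
    ARRAY_STRING = auto()


def classify(data_type) -> str:
    """Classify a type by the prefix of its enum name (hypothetical helper)."""
    name = data_type.name
    if name.startswith("SPARSE_VECTOR_"):
        return "sparse vector"
    if name.startswith("VECTOR_"):
        return "dense vector"
    if name.startswith("ARRAY_"):
        return "array"
    return "scalar"


for member in DataType:
    print(f"{member.name}: {classify(member)}")
```

This relies only on the naming convention, so it would need revisiting if a future type breaks the pattern.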
