Overview
DataType is an enumeration that defines all supported data types for fields and vectors in Zvec. Choose the appropriate data type based on your data characteristics and performance requirements.
Import
Vector Data Types
Vector data types are used for similarity search fields.
Dense Vectors
32-bit floating point vectors. Most commonly used for dense embeddings from models like OpenAI, BERT, or Sentence Transformers. Use when: Standard precision embeddings, most ML models. Memory: 4 bytes per dimension.
16-bit floating point vectors. Uses half the memory of 32-bit vectors with minimal accuracy loss. Use when: Memory constraints, large-scale deployments. Memory: 2 bytes per dimension.
64-bit floating point vectors. High precision for scientific applications. Use when: Extreme precision requirements. Memory: 8 bytes per dimension.
8-bit integer vectors. Quantized vectors for maximum memory efficiency. Use when: Extreme memory constraints, post-quantization. Memory: 1 byte per dimension.
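The per-dimension sizes above translate directly into raw storage cost. The following is a minimal sketch of that arithmetic in plain Python; it does not call Zvec, the dictionary keys are descriptive labels rather than Zvec identifiers, and the result ignores index and metadata overhead.

```python
# Bytes per dimension, taken from the "Memory" notes above.
# Keys are descriptive labels chosen for this sketch, not Zvec identifiers.
BYTES_PER_DIMENSION = {
    "fp64": 8,
    "fp32": 4,
    "fp16": 2,
    "int8": 1,
}


def estimate_vector_bytes(element_type: str, dimension: int, num_vectors: int) -> int:
    """Return the raw vector payload in bytes, ignoring index and metadata overhead."""
    if dimension <= 0 or num_vectors < 0:
        raise ValueError("dimension must be positive and num_vectors non-negative")
    return BYTES_PER_DIMENSION[element_type] * dimension * num_vectors


if __name__ == "__main__":
    # One million 768-dimensional embeddings stored in each element type.
    for label in BYTES_PER_DIMENSION:
        size = estimate_vector_bytes(label, 768, 1_000_000)
        print(f"{label}: {size / 1e9:.2f} GB")
```

Running the estimate for a planned collection size gives a quick lower bound on memory before committing to an element type.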
Sparse Vectors
32-bit sparse vectors. Stores only non-zero values with their indices. Use when: BM25, TF-IDF, or sparse embeddings (e.g., SPLADE). Memory: Depends on sparsity.
16-bit sparse vectors. Memory-efficient sparse representation. Use when: Sparse embeddings with memory constraints. Memory: Depends on sparsity.
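To illustrate what "stores only non-zero values with their indices" means, here is a small sketch in plain Python. The index-to-value dictionary is an assumption made for illustration; it is a common way to represent sparse vectors, not a statement about Zvec's storage or input format.

```python
from typing import Dict, List


def to_sparse(dense: List[float]) -> Dict[int, float]:
    """Keep only the non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Inner product over the indices that both vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[i] for i, v in a.items() if i in b)


if __name__ == "__main__":
    doc = to_sparse([0.0, 1.5, 0.0, 0.0, 2.0, 0.0])
    query = to_sparse([0.0, 2.0, 0.0, 1.0, 0.0, 0.0])
    print(doc)
    print(sparse_dot(doc, query))
```

Memory in this form grows with the number of non-zero entries rather than with the nominal dimension, which is why the footprint depends on sparsity.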
Scalar Data Types
Scalar data types are used for metadata fields and filtering.
Integer Types
32-bit signed integer. Range: -2,147,483,648 to 2,147,483,647. Use when: Counters, small IDs, enum values.
64-bit signed integer. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Use when: Large IDs, timestamps, large counters.
32-bit unsigned integer. Range: 0 to 4,294,967,295. Use when: Non-negative values, IDs.
64-bit unsigned integer. Range: 0 to 18,446,744,073,709,551,615. Use when: Large non-negative values, hash values.
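The ranges above can be checked against real data before choosing a field type. The sketch below is plain Python and independent of Zvec; the labels are descriptive strings invented for this example, and the preference order (32-bit before 64-bit, signed before unsigned) is an arbitrary choice.

```python
# Inclusive (min, max) ranges from the descriptions above.
# Labels are descriptive strings for this sketch, not Zvec identifiers.
INTEGER_RANGES = [
    ("32-bit signed", -2**31, 2**31 - 1),
    ("32-bit unsigned", 0, 2**32 - 1),
    ("64-bit signed", -2**63, 2**63 - 1),
    ("64-bit unsigned", 0, 2**64 - 1),
]


def smallest_integer_type(values):
    """Return the first listed type whose range covers every value."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    lo, hi = min(values), max(values)
    for label, type_min, type_max in INTEGER_RANGES:
        if type_min <= lo and hi <= type_max:
            return label
    raise OverflowError("values do not fit in any 64-bit integer type")


if __name__ == "__main__":
    print(smallest_integer_type([0, 1000]))           # view counts
    print(smallest_integer_type([1709625600 * 1000]))  # millisecond timestamp
```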
Floating Point Types
32-bit floating point number. Use when: Prices, scores, measurements.
64-bit floating point number. Use when: High-precision calculations, scientific data.
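The practical difference between the two widths is how many significant digits survive storage. The sketch below uses Python's standard `struct` module to round-trip a value through IEEE 754 single and double precision; it assumes the 32-bit and 64-bit types follow IEEE 754, which the source does not state explicitly.

```python
import struct


def round_trip_float32(x: float) -> float:
    """Store x as a 32-bit IEEE 754 float and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_trip_float64(x: float) -> float:
    """Store x as a 64-bit IEEE 754 float and read it back."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


if __name__ == "__main__":
    # 2**24 + 1 is the first positive integer a 32-bit float cannot hold exactly.
    value = 16777217.0
    print(round_trip_float32(value))
    print(round_trip_float64(value))
```

Values that need more than about seven significant decimal digits are the ones that call for the 64-bit type.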
Text and Boolean
Variable-length UTF-8 string. Use when: Text fields, IDs, categories, descriptions.
Boolean value (true or false). Use when: Flags, binary states.
Array Data Types
Array types store multiple values in a single field.
Array of 32-bit signed integers. Use when: Multiple small integer values per document.
Array of 64-bit signed integers. Use when: Lists of large IDs, multiple timestamps.
Array of 32-bit unsigned integers.
Array of 64-bit unsigned integers.
Array of 32-bit floating point numbers. Use when: Multiple scores, ratings, measurements.
Array of 64-bit floating point numbers.
Array of strings. Use when: Tags, categories, multiple text values.
Array of boolean values.
Examples
Vector field types
Scalar field types
Array field types
Complete schema example
Type Selection Guide
For Vector Embeddings
| Model / Use Case | Recommended Type | Notes |
|---|---|---|
| OpenAI embeddings | VECTOR_FP32 | Standard precision |
| Sentence Transformers | VECTOR_FP32 | Standard precision |
| CLIP image embeddings | VECTOR_FP32 or VECTOR_FP16 | FP16 for memory savings |
| BM25 / TF-IDF | SPARSE_VECTOR_FP32 | Sparse representation |
| SPLADE | SPARSE_VECTOR_FP32 | Learned sparse embeddings |
| Memory-constrained | VECTOR_FP16 or VECTOR_INT8 | Trade accuracy for memory |
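The "trade accuracy for memory" row can be made concrete by measuring how much a vector changes when it is stored at lower precision. The sketch below is plain Python using only the standard library: half precision is simulated with the `struct` module's `e` format, and the 8-bit path uses simple symmetric scaling. The quantization scheme is an assumption chosen for illustration; any quantization Zvec applies or expects may differ.

```python
import struct
from typing import List, Tuple


def to_fp16(values: List[float]) -> List[float]:
    """Round each value to IEEE 754 half precision and back."""
    fmt = f"<{len(values)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from 8-bit codes."""
    return [c * scale for c in codes]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest per-component difference between two vectors."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    vec = [0.12, -0.5, 0.33, 0.9, -0.07]
    codes, scale = quantize_int8(vec)
    print("fp16 error:", max_abs_error(vec, to_fp16(vec)))
    print("int8 error:", max_abs_error(vec, dequantize_int8(codes, scale)))
```

Comparing these errors on a sample of real embeddings, alongside recall on a held-out query set, is a reasonable way to decide whether the smaller type is acceptable.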
For Metadata Fields
| Data | Recommended Type | Example |
|------|------------------|---------|
| Document IDs | STRING or INT64 | "doc_123" or 123456789 |
| UUIDs | STRING | "550e8400-e29b-41d4-a716-446655440000" |
| Timestamps | INT64 | 1709625600 (Unix timestamp) |
| Categories | STRING | "technology", "sports" |
| Tags | ARRAY_STRING | ["python", "machine-learning"] |
| Prices | FLOAT | 19.99 |
| View counts | INT32 or INT64 | 1000 |
| Flags | BOOL | true, false |
| Ratings | FLOAT or ARRAY_FLOAT | 4.5 or [4.0, 5.0, 3.5] |
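As a way to apply the table above before inserting data, the sketch below checks a document's metadata against a mapping of field names to type names. It is plain Python and does not use Zvec: the `CHECKS` table, the `validate` helper, and the field names are all hypothetical, and only the type names (STRING, INT64, FLOAT, BOOL, ARRAY_STRING) come from the table.

```python
def _is_int64(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and -2**63 <= value <= 2**63 - 1
    )


# Hypothetical mapping from type names in the table to Python-side checks.
# This is not Zvec's validation logic.
CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "INT64": _is_int64,
    "FLOAT": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "BOOL": lambda v: isinstance(v, bool),
    "ARRAY_STRING": lambda v: isinstance(v, list) and all(isinstance(s, str) for s in v),
}


def validate(doc: dict, field_types: dict) -> list:
    """Return the names of fields that are missing or do not match their declared type."""
    return [
        name
        for name, type_name in field_types.items()
        if name not in doc or not CHECKS[type_name](doc[name])
    ]


if __name__ == "__main__":
    field_types = {
        "doc_id": "STRING",
        "created_at": "INT64",
        "tags": "ARRAY_STRING",
        "price": "FLOAT",
        "published": "BOOL",
    }
    doc = {
        "doc_id": "doc_123",
        "created_at": 1709625600,
        "tags": ["python", "machine-learning"],
        "price": 19.99,
        "published": True,
    }
    print(validate(doc, field_types))
```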
Sparse Vectors: Sparse vector types (SPARSE_VECTOR_FP32, SPARSE_VECTOR_FP16) automatically handle sparse representations. You don’t need to specify dimensions upfront.
Checking Data Types
See Also
- FieldSchema - Using scalar data types
- VectorSchema - Using vector data types
- CollectionSchema - Combining fields and vectors
- Performance Guide - Optimizing data type selection