Overview
When creating a collection, you define a schema that specifies:
- Field types (text, int, float, vectors, etc.)
- Required/optional fields
- Indexes for search and retrieval
Fields not defined in the schema can still be upserted, but they won’t have type validation or indexes.
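The validation behavior described above can be modeled with a small sketch. This is an illustrative model in plain Python, not TopK's implementation: the `SCHEMA` dictionary layout and the `validate` helper are hypothetical names introduced only for this example. It shows declared fields being checked for presence and type, while undeclared fields pass through unchecked.

```python
# Illustrative model only: this is not how TopK implements validation.
# A hypothetical schema maps field names to (expected Python type, required flag).
SCHEMA = {
    "title": (str, True),   # declared and required
    "year": (int, False),   # declared and optional
}

def validate(doc: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in doc:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(doc[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    # Fields absent from the schema are deliberately not inspected.
    return problems

ok = validate({"title": "Dune", "extra": object()}, SCHEMA)
missing = validate({"year": 1965}, SCHEMA)
bad_type = validate({"title": "Dune", "year": "1965"}, SCHEMA)
```

Here `ok` is empty even though the document carries an undeclared `extra` field, which mirrors the statement that such fields are accepted without type validation.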
FieldSpec
The FieldSpec class represents a field specification. It’s created by data type functions like text(), int(), float(), etc.
required()
Mark a field as required. All fields are optional by default.
Returns: The field specification with the required constraint.
optional()
Explicitly mark a field as optional (this is the default).
Returns: The field specification marked as optional.
index()
Create an index on a field for efficient searching.
Parameter: The index to create. Can be semantic_index(), keyword_index(), vector_index(), or multi_vector_index().
Returns: The field specification with the index attached.
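Because required(), optional(), and index() each return a field specification, the calls can be chained. The sketch below imitates that pattern with a hypothetical `ToyFieldSpec` dataclass and a `toy_text()` helper. Both are stand-ins written for this example and are not the SDK's classes or functions. The index is represented by a plain string for simplicity.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ToyFieldSpec:
    """Hypothetical stand-in for a field specification; not the SDK class."""
    data_type: str
    is_required: bool = False         # fields are optional by default
    index_kind: Optional[str] = None  # no index unless one is attached

    def required(self) -> "ToyFieldSpec":
        # Return a copy marked as required.
        return replace(self, is_required=True)

    def optional(self) -> "ToyFieldSpec":
        # Return a copy marked as optional (the default state).
        return replace(self, is_required=False)

    def index(self, kind: str) -> "ToyFieldSpec":
        # Return a copy with an index attached.
        return replace(self, index_kind=kind)

def toy_text() -> ToyFieldSpec:
    # Hypothetical analogue of a data type function such as text().
    return ToyFieldSpec("text")

title = toy_text().required().index("keyword")
summary = toy_text()
```

Each method returns a new specification instead of mutating the old one, so `summary` stays optional and unindexed no matter how `title` is configured.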
Data Types
text()
Create a field specification for text values.
Returns: A field specification for text values.
int()
Create a field specification for integer values.
Returns: A field specification for integer values.
float()
Create a field specification for floating-point values.
Returns: A field specification for float values.
bool()
Create a field specification for boolean values.
Returns: A field specification for boolean values.
bytes()
Create a field specification for binary data.
Returns: A field specification for bytes values.
list()
Create a field specification for list values.
Parameter: The type of values in the list. Must be one of: "text", "integer", or "float".
Returns: A field specification for list values.
Vector Types
f8_vector()
Create a field specification for 8-bit float vectors.
Parameter: The dimensionality of the vector.
Returns: A field specification for 8-bit float vectors.
f16_vector()
Create a field specification for 16-bit float vectors.
Parameter: The dimensionality of the vector.
Returns: A field specification for 16-bit float vectors.
f32_vector()
Create a field specification for 32-bit float vectors.
Parameter: The dimensionality of the vector.
Returns: A field specification for 32-bit float vectors.
u8_vector()
Create a field specification for 8-bit unsigned integer vectors.
Parameter: The dimensionality of the vector.
Returns: A field specification for 8-bit unsigned integer vectors.
i8_vector()
Create a field specification for 8-bit signed integer vectors.
Parameter: The dimensionality of the vector.
Returns: A field specification for 8-bit signed integer vectors.
binary_vector()
Create a field specification for binary vectors.
Parameter: The dimensionality of the binary vector.
Returns: A field specification for binary vectors.
f32_sparse_vector()
Create a field specification for 32-bit float sparse vectors.
Sparse vectors use u32 dimension indices to support dictionaries of up to 2^32 - 1 terms.
Returns: A field specification for 32-bit float sparse vectors.
u8_sparse_vector()
Create a field specification for 8-bit unsigned integer sparse vectors.
Returns: A field specification for 8-bit unsigned integer sparse vectors.
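To make the sparse representation concrete, here is a minimal sketch that models a sparse vector as a mapping from dimension index to value and computes a dot product over the shared indices. The dictionary representation and the `sparse_dot` helper are assumptions made for illustration. The document does not specify how the SDK expects sparse vectors to be passed or how they are stored.

```python
U32_MAX = 2**32 - 1  # largest value an unsigned 32-bit index can hold

def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors given as {index: value} mappings."""
    for idx in list(a) + list(b):
        if not 0 <= idx <= U32_MAX:
            raise ValueError(f"index {idx} does not fit in u32")
    # Iterate over the smaller mapping; only shared indices contribute.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[idx] for idx, value in a.items() if idx in b)

query = {3: 0.5, 4_000_000_000: 2.0}
doc = {3: 4.0, 7: 1.0, 4_000_000_000: 0.25}
score = sparse_dot(query, doc)  # 0.5 * 4.0 + 2.0 * 0.25
```

Only the non-zero entries are stored, so very large index spaces remain cheap to represent as long as each vector has few active terms.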
matrix()
Create a field specification for matrix values (multi-vector fields).
Parameter: The dimensionality of each vector in the matrix.
Parameter: The data type for matrix values. Must be one of: "f32", "f16", "f8", "u8", or "i8".
Returns: A field specification for matrix values.
Indexes
vector_index()
Create a vector index for similarity search on vector fields.
Parameter: The distance metric to use:
- "cosine": Cosine similarity (dense vectors only)
- "euclidean": Euclidean distance (dense vectors only)
- "dot_product": Dot product (dense and sparse vectors)
- "hamming": Hamming distance (binary vectors only)
Returns: A vector index configuration.
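For reference, the four metrics can be written out in a few lines of plain Python. These are the textbook definitions, given as a sketch. How TopK normalizes, scales, or orders scores is not stated here, and representing a binary vector as packed `bytes` is an assumption made for this example.

```python
import math

def dot_product(a, b):
    # Sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a: bytes, b: bytes) -> int:
    # Number of bit positions in which the two packed vectors differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

u = [3.0, 4.0]
v = [4.0, 3.0]
dp = dot_product(u, v)
cos = cosine_similarity(u, v)
dist = euclidean_distance(u, v)
ham = hamming_distance(b"\x0f", b"\xff")
```

Cosine similarity and dot product grow as vectors become more alike, while Euclidean and Hamming distance shrink, which is worth keeping in mind when comparing scores across metrics.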
keyword_index()
Create a keyword index for full-text search on text fields.
Returns: A keyword index configuration.
semantic_index()
Create a semantic index for automatic embeddings and semantic search.
TopK automatically generates embeddings for fields with semantic indexes. You don’t need to manage embeddings manually.
Parameter: The embedding model to use. Supported models:
- "cohere/embed-english-v3": English-only embeddings
- "cohere/embed-multilingual-v3": Multilingual embeddings
- "cohere/embed-v4": Latest Cohere model (default)
Parameter: The embedding type to generate. One of: float32, uint8, or binary.
Returns: A semantic index configuration.
multi_vector_index()
Create a multi-vector index for matrix fields (e.g., ColBERT-style retrieval).
Parameter: The distance metric to use. Currently only "maxsim" (Maximum Similarity) is supported.
Parameter: Number of bits for sketching. Used for approximate search optimization.
Parameter: Quantization strategy for compression:
- "1bit": 1-bit quantization
- "2bit": 2-bit quantization
- "scalar": Scalar quantization
Returns: A multi-vector index configuration.
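As a rough illustration of the "maxsim" metric, the sketch below follows the usual ColBERT-style definition: for each query vector, take its best dot product against any document vector, then sum those maxima. This is an exact, unoptimized version in plain Python. It does not model the sketching or quantization options, and TopK's internal scoring may differ in detail.

```python
def dot(a, b):
    # Sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best match among document vectors."""
    return sum(
        max(dot(q, d) for d in doc_vectors)
        for q in query_vectors
    )

# Each matrix is a list of vectors with the same dimensionality.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]]
score = maxsim(query, doc)  # 1.0 (first query vector) + 0.5 (second)
```

The document matrix may hold a different number of vectors than the query matrix. Only the per-vector dimensionality has to match.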