Overview
QuantizeType defines the quantization methods available for compressing vector embeddings. Quantization reduces memory usage and can improve search speed at the cost of some accuracy.
Available Quantization Types
UNDEFINED

No quantization. Vectors are stored in their original precision.

- Memory: 100% (baseline)
- Accuracy: 100%
- When to use: when accuracy is critical and memory is not a constraint.

FP16

16-bit floating-point quantization. Reduces precision from 32-bit to 16-bit floats.

- Memory: ~50% of original (half precision)
- Accuracy: ~99.5% (minimal loss for most use cases)
- When to use: general-purpose compression with negligible quality loss.

INT8

8-bit integer quantization. Converts floating-point values to 8-bit signed integers.

- Memory: ~25% of original (75% reduction)
- Accuracy: ~95-98% (noticeable but acceptable loss)
- When to use: when memory reduction is important and slight accuracy loss is acceptable.

INT4

4-bit integer quantization. Converts floating-point values to 4-bit integers.

- Memory: ~12.5% of original (87.5% reduction)
- Accuracy: ~90-95% (significant loss, use with caution)
- When to use: extreme memory constraints, large-scale deployments, when a recall drop is acceptable.
Quantization Properties
All QuantizeType enum members have these properties:

- name: the name of the quantization type as a string.
- value: the internal integer value of the quantization type.
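In Python terms, these properties behave like those of a standard Enum. The sketch below illustrates this; the member values shown are assumptions for illustration, not the library's actual values:

```python
from enum import Enum

class QuantizeType(Enum):
    """Sketch of a QuantizeType-style enum (member values are illustrative)."""
    UNDEFINED = 0  # no quantization, full FP32 precision
    FP16 = 1       # half-precision floats
    INT8 = 2       # 8-bit signed integers
    INT4 = 3       # 4-bit integers

# Each member exposes .name (string) and .value (integer):
qt = QuantizeType.FP16
print(qt.name, qt.value)  # FP16 1
```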
Usage Examples
Basic Quantization
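No client API is assumed here; as a stand-in, the following NumPy sketch shows what FP16 quantization does to a single embedding vector (the random vector and its dimensionality are illustrative):

```python
import numpy as np

# A 768-dimensional FP32 embedding (random stand-in for a real one).
rng = np.random.default_rng(0)
vec_fp32 = rng.standard_normal(768).astype(np.float32)

# FP16 quantization: halve the storage by casting to half precision.
vec_fp16 = vec_fp32.astype(np.float16)

print(vec_fp32.nbytes)  # 3072 bytes
print(vec_fp16.nbytes)  # 1536 bytes (50% of original)

# The round-trip error is tiny for typical embedding values.
max_err = np.abs(vec_fp32 - vec_fp16.astype(np.float32)).max()
print(max_err < 1e-2)  # True
```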
Comparing Quantization Levels
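One way to compare levels concretely is to simulate each on the same vector and measure memory and reconstruction error. This is a self-contained sketch using symmetric linear integer quantization, not the library's internal scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
vec = rng.standard_normal(768).astype(np.float32)

def quantize_int(v, bits):
    """Symmetric linear quantization to signed integers with `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(v).max() / qmax
    q = np.round(v / scale).clip(-qmax, qmax)
    return (q * scale).astype(np.float32)  # dequantized reconstruction

for label, recon, bytes_per_dim in [
    ("FP16", vec.astype(np.float16).astype(np.float32), 2),
    ("INT8", quantize_int(vec, 8), 1),
    ("INT4", quantize_int(vec, 4), 0.5),
]:
    err = np.abs(vec - recon).mean()
    print(f"{label}: {bytes_per_dim * 768:.0f} bytes, mean abs error {err:.5f}")
```

The error grows as the bit width shrinks, which is the accuracy trade-off the table above describes.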
Multi-Field Schema with Different Quantization
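The idea here is assigning different quantization levels to different vector fields in one schema. Since the actual schema API is not shown, the following is a hypothetical dataclass-based sketch (all names are ours, not the library's):

```python
from dataclasses import dataclass

@dataclass
class VectorField:
    """Hypothetical field definition; the real schema API will differ."""
    name: str
    dim: int
    quantize: str  # e.g. "FP16", "INT8"

# Keep the heavily queried field at higher precision and
# compress the bulkier one harder.
schema = [
    VectorField(name="title_embedding", dim=384, quantize="FP16"),
    VectorField(name="body_embedding", dim=768, quantize="INT8"),
]
for f in schema:
    print(f.name, f.quantize)
```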
Choosing the Right Quantization
Decision Matrix
FP16
Best balance for most use cases.

- ✅ 50% memory savings
- ✅ ~99.5% accuracy retained
- ✅ Minimal quality loss
- ✅ Good for production

Use for: text embeddings, semantic search, general applications.
INT8
Good compression with acceptable quality.

- ✅ 75% memory savings
- ⚠️ ~95-98% accuracy
- ⚠️ Noticeable but acceptable loss
- ✅ Faster search

Use for: large-scale systems, cost-sensitive deployments, when a quality drop is acceptable.
INT4
Extreme compression for specific needs.

- ✅ 87.5% memory savings
- ❌ ~90-95% accuracy
- ❌ Significant quality loss
- ✅ Very fast search

Use for: massive scale (billions of vectors), memory-critical environments, when a recall drop is acceptable.
UNDEFINED (No Quantization)
Maximum quality, baseline.

- ✅ 100% accuracy
- ❌ 100% memory usage
- ❌ Slower search

Use for: critical accuracy requirements, small datasets, benchmarking.
Quantization Trade-offs
Quantization is a three-way trade-off among memory, accuracy, and search speed.
| Quantization | Memory per 768-dim vector | Savings |
|---|---|---|
| UNDEFINED (FP32) | 3,072 bytes | 0% |
| FP16 | 1,536 bytes | 50% |
| INT8 | 768 bytes | 75% |
| INT4 | 384 bytes | 87.5% |
For an index of 1 million 768-dimensional vectors, that works out to roughly:

- FP32: ~3 GB
- FP16: ~1.5 GB
- INT8: ~768 MB
- INT4: ~384 MB
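These figures follow directly from the bytes-per-dimension of each type. A small helper (the names are ours, not the library's) reproduces them:

```python
BYTES_PER_DIM = {"UNDEFINED": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def index_memory_bytes(num_vectors: int, dim: int, quantize: str) -> float:
    """Raw vector storage only; index overhead (graph links, etc.) is extra."""
    return num_vectors * dim * BYTES_PER_DIM[quantize]

# 1M 768-dim vectors, matching the figures above:
for q in ("UNDEFINED", "FP16", "INT8", "INT4"):
    gb = index_memory_bytes(1_000_000, 768, q) / 1e9
    print(f"{q}: {gb:.2f} GB")
```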
Best Practices
Start with FP16

Begin with FP16 quantization for most applications. It provides excellent memory savings with minimal quality loss.

Consider Your Scale
Choose quantization based on dataset size:
- < 1M vectors: FP16 or UNDEFINED
- 1-10M vectors: FP16 or INT8
- 10-100M vectors: INT8
- > 100M vectors: INT8 or INT4
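The sizing guidance above can be encoded as a simple helper. The thresholds are the rules of thumb from this list, and the function name is ours:

```python
def suggest_quantization(num_vectors: int) -> str:
    """Maps dataset size to a starting quantization level (rules of thumb)."""
    if num_vectors < 1_000_000:
        return "FP16"   # or UNDEFINED if accuracy is critical
    if num_vectors < 10_000_000:
        return "FP16"   # or INT8 if memory is tight
    if num_vectors < 100_000_000:
        return "INT8"
    return "INT4"       # or INT8 if quality matters more than memory

print(suggest_quantization(500_000))     # FP16
print(suggest_quantization(50_000_000))  # INT8
```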
Quantization and Vector Types
Advanced Considerations
Re-ranking with Quantized Vectors
For critical applications, use quantized vectors for the initial retrieval, then re-rank the top candidates with full precision.
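A self-contained NumPy sketch of this two-stage pattern follows; the corpus, the INT8 scheme, and the candidate counts are illustrative, not the library's API:

```python
import numpy as np

rng = np.random.default_rng(2)
corpus = rng.standard_normal((10_000, 64)).astype(np.float32)
query = rng.standard_normal(64).astype(np.float32)

# Stage 1: coarse retrieval on INT8-quantized vectors (cheap, approximate).
scale = np.abs(corpus).max() / 127
corpus_q = np.round(corpus / scale).astype(np.int8)
query_q = np.round(query / scale).astype(np.int8)
coarse_scores = corpus_q.astype(np.int32) @ query_q.astype(np.int32)
candidates = np.argsort(coarse_scores)[::-1][:100]  # top-100 candidates

# Stage 2: exact re-ranking of the candidates in full FP32 precision.
exact_scores = corpus[candidates] @ query
top10 = candidates[np.argsort(exact_scores)[::-1][:10]]
print(top10)
```

The quantized pass keeps the expensive full-precision scoring confined to a small candidate set.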
Calibration

Some quantization methods, particularly the integer types, benefit from calibration on representative data: fitting quantization parameters (such as scale factors) to the actual value distribution reduces quantization error.

See Also
- DataType - Vector data types
- Field Definition - Schema field configuration
- IndexType - Index types for vector search
- Performance Tuning - Optimization strategies