## Available models

### FLUX.2-klein

A 4B-parameter model with 4-step generation. Fast inference with INT8 quantization support.

### Z-Image-Turbo

A 6B-parameter Single-Stream DiT. 9-step turbo inference with 4-bit quantization for minimal memory use.

### Qwen-Image

High-quality text-to-image generation with classifier-free guidance. Available in BF16, 8-bit, and 4-bit variants.
## Model comparison
| Feature | FLUX.2-klein | Z-Image-Turbo | Qwen-Image |
|---|---|---|---|
| Parameters | 4B | 6B | Large |
| Steps | 4 | 9 | 20-50 |
| Architecture | Double + Single blocks | S3-DiT | MM-DiT |
| Text encoder | Qwen3-4B | Qwen3-4B (layer 34) | Qwen-VL |
| Quantization | INT8 (~8GB) | 4-bit (~3GB) | 4-bit (~26GB) |
| Speed (512x512) | ~5s | ~3s | ~15-30s |
| Best for | Fast generation | Memory efficiency | Highest quality |
## Common features

All models share these capabilities:

- Apple Silicon optimized: Native MLX backend for M-series chips
- Quantization support: Reduce memory usage with minimal quality loss
- Rectified flow: Modern denoising schedule for efficient sampling
- Rotary embeddings (RoPE): Advanced position encoding
- AutoencoderKL VAE: High-quality latent decoding
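The rectified-flow schedule above can be sketched as a straight-line ODE integrated with Euler steps. This is a toy NumPy illustration, not the library's actual sampler; `denoiser` is a hypothetical stand-in for the transformer:

```python
import numpy as np

def euler_rectified_flow(denoiser, noise, num_steps):
    """Integrate the rectified-flow ODE from pure noise (t=1) to data (t=0).

    `denoiser(x, t)` predicts the velocity v ≈ (noise - data); each Euler
    step moves the sample along a straight line toward the data manifold.
    """
    x = noise
    timesteps = np.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = denoiser(x, t)           # predicted velocity at time t
        x = x + (t_next - t) * v     # Euler step (t_next < t, so dt < 0)
    return x

# Toy denoiser: when the true target is known, the ideal velocity is
# v = noise - target, so Euler integration recovers the target exactly.
target = np.full((4, 4), 0.5)
noise = np.random.default_rng(0).standard_normal((4, 4))

def toy_denoiser(x, t):
    return noise - target  # ideal straight-line velocity

sample = euler_rectified_flow(toy_denoiser, noise, num_steps=4)
```

Because rectified-flow trajectories are (near-)straight lines, very few Euler steps suffice, which is why the turbo models above get away with 4–9 steps.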
## Quick start

### Choose your model

Select based on your needs:
- FLUX.2-klein: Fastest generation, good for iteration
- Z-Image-Turbo: Best memory efficiency, still very fast
- Qwen-Image: Highest quality, flexible resolution
### Download weights

Models download automatically from Hugging Face on first run. See the individual model pages for manual download instructions.
## Architecture overview

All three models follow the latent diffusion pipeline:

- Text encoding: Prompt → embeddings via Qwen3 or Qwen-VL
- Denoising: Noise → latents via transformer with timestep conditioning
- Decoding: Latents → RGB image via VAE decoder
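The three stages above can be sketched end to end. All function names here are hypothetical stubs standing in for the real components (text encoder, transformer, VAE), just to show how data flows through the pipeline:

```python
import numpy as np

def encode_text(prompt):
    """Prompt -> embeddings (stands in for Qwen3 / Qwen-VL)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal((77, 64))      # (tokens, dim)

def denoise(embeddings, num_steps, latent_shape=(4, 8, 8)):
    """Noise -> latents via iterative transformer steps (stubbed)."""
    x = np.random.default_rng(0).standard_normal(latent_shape)
    for _ in range(num_steps):
        x = x * 0.9                            # stand-in for one denoising step
    return x

def decode_latents(latents):
    """Latents -> RGB image via the VAE decoder (stubbed)."""
    rgb = latents[:3]                          # pretend 3 channels are RGB
    return np.clip((rgb + 1) / 2, 0, 1)        # map to [0, 1]

def generate(prompt, num_steps=4):
    emb = encode_text(prompt)
    latents = denoise(emb, num_steps)
    return decode_latents(latents)

image = generate("a red fox in the snow", num_steps=4)
```

The real pipelines differ per model (step counts, guidance, latent shapes), but all three follow this encode → denoise → decode structure.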
## Performance tips

All performance numbers are from an Apple M3 Max with 128GB RAM.

### Memory optimization
- Use quantization when memory is limited
- FLUX.2-klein INT8: 13GB → 8GB
- Z-Image 4-bit: 12GB → 3GB
- Qwen-Image 4-bit: 57GB → 26GB
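A rough weights-only estimate follows directly from parameter count × bits per weight. This ignores activations and any layers left unquantized, so real peak usage is higher than the result:

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

# Z-Image-Turbo's 6B transformer at 4-bit: ~3 GB, matching the table above.
print(weight_memory_gb(6, 4))   # -> 3.0

# FLUX.2-klein at INT8: ~8 GB, assuming the 4B transformer plus the
# Qwen3-4B text encoder are both quantized (an assumption, not a spec).
print(weight_memory_gb(8, 8))   # -> 8.0
```

This back-of-the-envelope check is useful when deciding which quantization level fits your machine before downloading weights.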
### Speed optimization
- Reduce inference steps (quality vs speed trade-off)
- Use smaller resolutions for iteration
- Z-Image-Turbo is fastest for production
- FLUX.2-klein best for rapid prototyping
## Next steps

- FLUX.2-klein guide: Learn about the 4-step fast generation model
- Z-Image guide: Explore the memory-efficient turbo model
- Qwen-Image guide: Master high-quality image generation
- API reference: Browse the complete API documentation