The image generation suite provides three state-of-the-art diffusion models optimized for Apple Silicon, each offering different trade-offs between speed, quality, and memory usage.

Available models

FLUX.2-klein

4B parameter model with 4-step generation. Fast inference with INT8 quantization support.

Z-Image-Turbo

6B parameter Single-Stream DiT. 9-step turbo inference with 4-bit quantization for minimal memory.
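To make the 4-bit option concrete, here is an illustrative group-wise 4-bit quantization round trip, a sketch of the general technique rather than the actual MLX kernel: each group of weights shares one f32 scale, and two signed 4-bit codes are packed per byte, cutting weight memory roughly 8x versus f32.

```rust
/// Quantize weights to signed 4-bit codes with one scale per group.
fn quantize_4bit(weights: &[f32], group_size: usize) -> (Vec<u8>, Vec<f32>) {
    let mut codes = Vec::new();
    let mut scales = Vec::new();
    for group in weights.chunks(group_size) {
        let max_abs = group.iter().fold(0.0f32, |m, w| m.max(w.abs()));
        let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 }; // codes in -8..=7
        scales.push(scale);
        let q = |w: f32| ((w / scale).round() as i32).clamp(-8, 7) as u8 & 0x0F;
        for pair in group.chunks(2) {
            let lo = q(pair[0]);
            let hi = pair.get(1).map(|&w| q(w)).unwrap_or(0);
            codes.push(lo | (hi << 4)); // two 4-bit codes per byte
        }
    }
    (codes, scales)
}

/// Recover approximate weights from packed codes and per-group scales.
fn dequantize_4bit(codes: &[u8], scales: &[f32], group_size: usize, n: usize) -> Vec<f32> {
    let bytes_per_group = (group_size + 1) / 2;
    let mut out = Vec::with_capacity(n);
    for (g, &scale) in scales.iter().enumerate() {
        let end = ((g + 1) * bytes_per_group).min(codes.len());
        for byte in &codes[g * bytes_per_group..end] {
            for nib in [byte & 0x0F, byte >> 4] {
                if out.len() < n {
                    let code = ((nib << 4) as i8) >> 4; // sign-extend the 4-bit code
                    out.push(code as f32 * scale);
                }
            }
        }
    }
    out
}
```

The per-element error is bounded by half a scale step, which is why 4-bit weights remain usable for diffusion transformers.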

Qwen-Image

High-quality text-to-image with classifier-free guidance. Available in BF16, 8-bit, and 4-bit variants.
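Classifier-free guidance combines two denoiser passes, one with the prompt and one without, then extrapolates toward the conditional prediction. A minimal sketch of the standard formulation (not the crate's actual API):

```rust
// Classifier-free guidance: guidance_scale > 1 strengthens prompt adherence
// at the cost of diversity; scale = 1 reduces to the conditional prediction.
fn apply_cfg(uncond: &[f32], cond: &[f32], guidance_scale: f32) -> Vec<f32> {
    uncond
        .iter()
        .zip(cond)
        .map(|(u, c)| u + guidance_scale * (c - u))
        .collect()
}
```

Note that CFG doubles the transformer work per step, which is part of why Qwen-Image is the slowest of the three models.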

Model comparison

Feature           FLUX.2-klein             Z-Image-Turbo          Qwen-Image
Parameters        4B                       6B                     Large
Steps             4                        9                      20-50
Architecture      Double + Single blocks   S3-DiT                 MM-DiT
Text encoder      Qwen3-4B                 Qwen3-4B (layer 34)    Qwen-VL
Quantization      INT8 (~8GB)              4-bit (~3GB)           4-bit (~26GB)
Speed (512x512)   ~5s                      ~3s                    ~15-30s
Best for          Fast generation          Memory efficiency      Highest quality

Common features

All models share these capabilities:
  • Apple Silicon optimized: Native MLX backend for M-series chips
  • Quantization support: Reduce memory usage with minimal quality loss
  • Rectified flow: Modern denoising schedule for efficient sampling
  • Rotary embeddings (RoPE): Advanced position encoding
  • AutoencoderKL VAE: High-quality latent decoding
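The rectified-flow schedule can be sketched in a few lines. This is an illustrative Euler sampler under the usual rectified-flow assumptions, not the crate's implementation: the model predicts velocity v ≈ noise − data along the straight path x_t = (1 − t)·data + t·noise, and integrating dx/dt = v from t = 1 down to t = 0 turns noise into a sample.

```rust
// Euler integration of the rectified-flow ODE from pure noise (t = 1) to
// data (t = 0). `velocity` stands in for the diffusion transformer.
fn euler_sample<F>(mut x: Vec<f32>, steps: usize, velocity: F) -> Vec<f32>
where
    F: Fn(&[f32], f32) -> Vec<f32>,
{
    let dt = 1.0 / steps as f32;
    for i in 0..steps {
        let t = 1.0 - i as f32 * dt; // current time, starting at pure noise
        let v = velocity(&x, t);
        for (xi, vi) in x.iter_mut().zip(&v) {
            *xi -= dt * vi; // one Euler step toward t = 0
        }
    }
    x
}
```

Because rectified-flow paths are nearly straight, a handful of Euler steps suffices, which is what enables the 4-step and 9-step budgets above.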

Quick start

1. Choose your model

Select based on your needs:
  • FLUX.2-klein: Fastest generation, good for iteration
  • Z-Image-Turbo: Best memory efficiency, still very fast
  • Qwen-Image: Highest quality, flexible resolution
2. Download weights

Models download automatically from HuggingFace on first run. See individual model pages for manual download instructions.
3. Generate your first image

# FLUX.2-klein
cargo run --example generate_klein --release -- "a cat on a windowsill"

# Z-Image-Turbo
cargo run --example generate_zimage_quantized --release -- "a cat on a windowsill"

# Qwen-Image
cargo run --example generate_qwen_image --release -- -p "a cat on a windowsill"

Architecture overview

All three models follow the latent diffusion pipeline:
┌──────────────────────────────────────────────────────────┐
│                 Latent Diffusion Pipeline                 │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  ┌──────────────┐    ┌─────────────┐    ┌────────────┐  │
│  │ Text Encoder │───▶│ Transformer │───▶│    VAE     │  │
│  │ (Qwen/Qwen3) │    │   (DiT)     │    │  Decoder   │  │
│  └──────────────┘    └─────────────┘    └────────────┘  │
│         │                   │                   │        │
│     [B,L,D]          [B,N,hidden]          [B,H,W,3]     │
│                                                           │
└──────────────────────────────────────────────────────────┘
  1. Text encoding: Prompt → embeddings via Qwen3 or Qwen-VL
  2. Denoising: Noise → latents via transformer with timestep conditioning
  3. Decoding: Latents → RGB image via VAE decoder
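The three stages compose as a simple function pipeline. The sketch below uses toy closures as stand-ins for the real modules; the actual crate types and signatures will differ:

```rust
// Shape-level view of the shared pipeline.
struct Pipeline<E, D, V> {
    encode: E,  // text encoder: prompt -> embeddings [B, L, D]
    denoise: D, // DiT: (noisy latents, embeddings) -> clean latents [B, N, hidden]
    decode: V,  // VAE decoder: latents -> pixels [B, H, W, 3]
}

impl<E, D, V> Pipeline<E, D, V>
where
    E: Fn(&str) -> Vec<f32>,
    D: Fn(&[f32], &[f32]) -> Vec<f32>,
    V: Fn(&[f32]) -> Vec<f32>,
{
    fn generate(&self, prompt: &str, noise: &[f32]) -> Vec<f32> {
        let emb = (self.encode)(prompt);
        let latents = (self.denoise)(noise, &emb);
        (self.decode)(&latents)
    }
}
```

Swapping models means swapping the three components while the orchestration stays the same, which is why the quick-start commands differ only in the example name.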

Performance tips

All performance numbers are from Apple M3 Max with 128GB RAM.

Memory optimization

  • Use quantization when memory is limited
  • FLUX.2-klein INT8: 13GB → 8GB
  • Z-Image 4-bit: 12GB → 3GB
  • Qwen-Image 4-bit: 57GB → 26GB
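These savings follow from a back-of-the-envelope rule: weight memory in GB is roughly parameters (in billions) × bits per weight ÷ 8. A sketch, counting transformer weights only; the real peaks above also include the text encoder, VAE, activations, and quantization scales:

```rust
// Rough transformer weight footprint: 1e9 * params_billions * bits / 8 bytes,
// expressed directly in GB.
fn weight_gb(params_billions: f64, bits_per_weight: u32) -> f64 {
    params_billions * bits_per_weight as f64 / 8.0
}
```

For example, the 6B Z-Image transformer at 4 bits is about 3 GB of weights, matching the quantized figure above.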

Speed optimization

  • Reduce inference steps (quality vs speed trade-off)
  • Use smaller resolutions for iteration
  • Z-Image-Turbo is fastest for production
  • FLUX.2-klein best for rapid prototyping

Next steps

FLUX.2-klein guide

Learn about the 4-step fast generation model

Z-Image guide

Explore the memory-efficient turbo model

Qwen-Image guide

Master high-quality image generation

API reference

Browse the complete API documentation
