The model is approximately 8GB. Download time depends on your internet connection.
2. Build the Qwen3 crate
If you haven’t already, build the Qwen3 crate:
cargo build --release -p qwen3-mlx
3. Run text generation
Generate text with a simple prompt:
```bash
cargo run --release -p qwen3-mlx --example generate_qwen3 -- \
    ./models/Qwen3-4B "Explain quantum computing in simple terms:"
```
You should see output like:
```text
Loading model from: ./models/Qwen3-4B
Model loaded in 2.3s
Prompt (8 tokens): Explain quantum computing in simple terms:
---
Quantum computing is a type of computing that uses quantum mechanics...
---
Generated 100 tokens in 2.2s (45.5 tok/s)
```
4. Try interactive chat (optional)
For a more interactive experience, run the chat example:
```bash
cargo run --release -p qwen3-mlx --example chat_qwen3 -- \
    ./models/Qwen3-4B
```
This opens an interactive REPL where you can have a conversation with the model.
You can also drive the generation loop programmatically with the library API:

```rust
use qwen3_mlx::{load_model, load_tokenizer, Generate, KVCache};
use mlx_rs::ops::indexing::{IndexOp, NewAxis};
use mlx_rs::transforms::eval;
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model and tokenizer
    let mut model = load_model("./models/Qwen3-4B")?;
    let tokenizer = load_tokenizer("./models/Qwen3-4B")?;

    // Tokenize prompt and add a leading batch dimension
    let prompt = "Hello, how are you?";
    let encoding = tokenizer.encode(prompt, true)?;
    let prompt_tokens = mlx_rs::Array::from(encoding.get_ids()).index(NewAxis);

    // Generate tokens
    let mut cache = Vec::new();
    let temperature = 0.7;
    let generator = Generate::<KVCache>::new(
        &mut model,
        &mut cache,
        temperature,
        &prompt_tokens,
    );

    let mut tokens = Vec::new();
    for token in generator.take(100) {
        let token = token?;
        tokens.push(token.clone());

        // Decode and print every 10 tokens
        if tokens.len() % 10 == 0 {
            eval(&tokens)?;
            let slice: Vec<u32> = tokens.drain(..)
                .map(|t| t.item::<u32>())
                .collect();
            let text = tokenizer.decode(&slice, true)?;
            print!("{}", text);
            std::io::stdout().flush()?;
        }
    }

    Ok(())
}
```

Decoding in chunks of 10 amortizes the cost of `eval`, which forces MLX's lazy arrays to materialize, while still streaming output as it is produced.
For better performance on smaller devices, use 4-bit or 8-bit quantized models like Qwen3-4B-8bit-mlx.
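Assuming a quantized checkpoint keeps the same directory layout, loading one is just a path change; the local path below is illustrative, not a path this guide has created:

```rust
// Sketch: the same loading calls, pointed at an assumed local directory
// holding the 8-bit quantized weights.
let mut model = load_model("./models/Qwen3-4B-8bit-mlx")?;
let tokenizer = load_tokenizer("./models/Qwen3-4B-8bit-mlx")?;
```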
FLUX.2-klein requires approximately 13GB of RAM. Ensure you have sufficient memory.
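On macOS (where MLX runs), one quick way to sanity-check total RAM before starting:

```bash
# Prints total physical memory in bytes (13 GB is roughly 13,958,643,712 bytes)
sysctl -n hw.memsize
```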
2. Build the FLUX crate
```bash
cargo build --release -p flux-klein-mlx
```
3. Generate an image
```bash
cargo run --release -p flux-klein-mlx --example generate_klein -- \
    "a beautiful sunset over mountains"
```
The generated image is saved as output_klein.ppm. Generation defaults to 4 denoising steps, which keeps it fast; you can also pass the step count explicitly:
```bash
cargo run --release -p flux-klein-mlx --example generate_klein -- \
    --steps 4 "a cat sitting on a couch"
```
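More steps generally trade speed for detail. Assuming the example accepts larger values, a higher-quality run might look like:

```bash
# Assumption: more denoising steps improve detail at the cost of generation time
cargo run --release -p flux-klein-mlx --example generate_klein -- \
    --steps 8 "a cat sitting on a couch"
```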
4. Convert to PNG (optional)
Convert the PPM output to PNG:
```bash
# Using ImageMagick
convert output_klein.ppm output_klein.png

# Or using Python
python -c "from PIL import Image; Image.open('output_klein.ppm').save('output_klein.png')"
```
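If you prefer to stay in Rust, a minimal sketch using the third-party `image` crate (an assumption, not part of this repo; add `image` to your Cargo.toml) does the same conversion:

```rust
// Decodes the PPM (PNM family) and re-encodes as PNG, with the output
// format inferred from the file extension.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img = image::open("output_klein.ppm")?;
    img.save("output_klein.png")?;
    Ok(())
}
```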
Use the --quantize flag to reduce memory usage:
```bash
cargo run --release -p flux-klein-mlx --example generate_klein -- \
    --quantize "a beautiful sunset"
```
To stream tokens as they are generated and stop at the end-of-sequence token:

```rust
use std::io::Write;

// Assumes `generator`, `tokenizer`, and `eos_token_id` are set up as in
// the full example above.
for token in generator {
    let token = token?;
    let token_id = token.item::<u32>();

    // Stop at the end-of-sequence token
    if token_id == eos_token_id {
        break;
    }

    // Decode and print each token immediately
    let text = tokenizer.decode(&[token_id], true)?;
    print!("{}", text);
    std::io::stdout().flush()?;
}
```
To tokenize several prompts up front for batch processing:

```rust
let prompts = vec!["Hello", "How are you?", "Goodbye"];
let encodings: Vec<_> = prompts.iter()
    .map(|p| tokenizer.encode(*p, true))
    .collect::<Result<_, _>>()?;
// Process in batch...
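```

One hedged way to continue from here, assuming `mlx_rs::Array::from_slice` is available and using a hypothetical pad token id: left-pad every sequence to a common length, then stack the ids into a `[batch, seq_len]` array for a single forward pass.

```rust
// Assumptions: pad_token_id is hypothetical (use the tokenizer's real pad id),
// and sequences are left-padded so generation continues from the last token.
let pad_token_id: u32 = 0;
let max_len = encodings.iter().map(|e| e.get_ids().len()).max().unwrap_or(0);
let mut flat: Vec<u32> = Vec::with_capacity(encodings.len() * max_len);
for enc in &encodings {
    let ids = enc.get_ids();
    flat.extend(std::iter::repeat(pad_token_id).take(max_len - ids.len()));
    flat.extend_from_slice(ids);
}
let batch = mlx_rs::Array::from_slice(&flat, &[encodings.len() as i32, max_len as i32]);
```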