lora.py script backed by the MLX trainer, making fine-tuning fast and memory-efficient on Apple Silicon Macs.
Approaches
LoRA (Low-Rank Adaptation)
LoRA freezes the original model weights and injects small trainable rank-decomposition matrices into the attention layers. You train only these adapter parameters — typically less than 1% of total weights — and save the result as a lightweight .safetensors adapter file. This is the default and recommended approach for most use cases.
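The idea can be sketched in a few lines. Below is a minimal, illustrative LoRA linear layer in plain NumPy (not the mlx-vlm implementation): the frozen base weight W is augmented with a trainable low-rank update B @ A, scaled by alpha / rank. Because B starts at zero, the adapter contributes nothing before training, and the trainable parameter count is a small fraction of the base layer's.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-adapted linear layer (not the mlx-vlm code)."""

    def __init__(self, in_features, out_features, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pretrained weight: never updated during fine-tuning.
        self.W = rng.standard_normal((out_features, in_features)) * 0.02
        # Trainable adapters: A is randomly initialized, B starts at zero,
        # so the adapter output is zero at initialization.
        self.A = rng.standard_normal((rank, in_features)) * 0.01
        self.B = np.zeros((out_features, rank))
        self.scaling = alpha / rank

    def __call__(self, x):
        base = x @ self.W.T                              # frozen path
        adapter = (x @ self.A.T) @ self.B.T * self.scaling  # trainable path
        return base + adapter

    def trainable_params(self):
        return self.A.size + self.B.size

layer = LoRALinear(in_features=1024, out_features=1024, rank=8)
x = np.ones((2, 1024))
y = layer(x)
# Fraction of parameters that are actually trained (~1.5% at rank 8 here).
frac = layer.trainable_params() / (layer.W.size + layer.trainable_params())
```

Only A and B would be optimized and saved to the adapter file; W stays untouched, which is why the resulting .safetensors artifact is so small.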
QLoRA (Quantized LoRA)
QLoRA combines LoRA with a quantized base model (e.g., a 4-bit checkpoint). The base model weights stay quantized and frozen; the LoRA adapters are trained in full precision. This significantly reduces memory usage, making it practical to fine-tune larger models on Mac hardware with limited unified memory. To use QLoRA, point --model-path at a quantized model checkpoint and run the training script normally.
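An invocation might look like the following sketch. The model path and the dataset flag are illustrative — check the script's --help output for the exact option names supported by your version:

```shell
# QLoRA: the only change vs. plain LoRA is pointing --model-path at a
# quantized (e.g., 4-bit) checkpoint. Model name below is illustrative.
python lora.py \
  --model-path mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --dataset <hf-dataset-id-or-local-path>
```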
Full fine-tuning
Pass --full-finetune to update all model weights instead of inserting LoRA adapters. This requires substantially more memory and is slower, but gives the model maximum capacity to adapt. You can optionally add --train-vision to also update the vision encoder weights.
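For example (a sketch — the model path and dataset flag are illustrative; only --full-finetune and --train-vision are the flags described above):

```shell
# Full fine-tuning: updates all language-model weights, and with
# --train-vision the vision encoder as well. Expect much higher memory use.
python lora.py \
  --model-path mlx-community/Qwen2-VL-2B-Instruct \
  --dataset <hf-dataset-id-or-local-path> \
  --full-finetune \
  --train-vision
```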
MLX trainer backend
The training script uses the MLX trainer, which provides:
- Efficient execution on Apple Silicon — MLX automatically maps operations to the M-series GPU/Neural Engine.
- Automatic mixed precision — reduces memory footprint without sacrificing training quality.
- Gradient checkpointing — recomputes activations during the backward pass to trade compute for memory; enable with --grad-checkpoint.
- Gradient accumulation — simulate larger batch sizes by accumulating gradients over multiple steps before updating weights; set with --gradient-accumulation-steps.
- Hugging Face dataset integration — load any dataset directly by its Hub identifier or local path.
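Gradient accumulation is simple to picture. The following is an illustrative sketch (not the MLX trainer's code): gradients from several micro-batches are summed and averaged, and the optimizer step is applied only once per accumulation window, so eight micro-batches with accumulation over four steps behave like two large-batch updates.

```python
def train_with_accumulation(grads_per_step, accum_steps, lr=0.1):
    """Toy loop showing how gradient accumulation defers weight updates.

    grads_per_step: one scalar gradient per micro-batch (illustrative).
    """
    weight = 0.0
    accumulated = 0.0
    n_updates = 0
    for step, g in enumerate(grads_per_step, start=1):
        accumulated += g  # gradients pile up; no update yet
        if step % accum_steps == 0:
            # One optimizer step per window, using the averaged gradient.
            weight -= lr * (accumulated / accum_steps)
            accumulated = 0.0
            n_updates += 1
    return weight, n_updates

# Eight micro-batch gradients of 1.0, accumulated over 4 steps:
# two updates of lr * 1.0 each.
w, n_updates = train_with_accumulation([1.0] * 8, accum_steps=4)
```

The effective batch size is the per-step batch size multiplied by the accumulation steps, at the cost of proportionally fewer optimizer updates per epoch.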
Supported models
Fine-tuning is supported for all models except Gemma3n and Qwen3 Omni. This includes:
- Qwen2-VL, Qwen2.5-VL, Qwen3-VL
- LLaVA and LLaVA-Next variants
- Deepseek-VL and Deepseek-VL-V2
- Mllama (Llama-3.2-Vision)
- Pixtral
- Idefics3
- SmolVLM
Requirements
- Python 3.7+
- mlx-vlm
- mlx
- numpy
- transformers
- datasets
- PIL (Pillow)
Next steps
LoRA & QLoRA training
Complete CLI reference, training examples, and Python API for running LoRA and QLoRA jobs.
Dataset preparation
Required dataset format, per-model message structures, and how to build a dataset programmatically.