LoRA (Low-Rank Adaptation) injects small trainable matrices into the model’s attention layers while keeping the original weights frozen. Because only the adapter parameters are updated, training requires far less memory and time than full fine-tuning, and the resulting adapter file is a fraction of the full model size. QLoRA extends this by loading the base model in a quantized format (e.g., 4-bit), keeping it frozen in low precision while training the LoRA adapters in full precision — trading a small amount of accuracy for a large reduction in memory usage.
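The low-rank update described above can be sketched in a few lines. This is an illustration only, not mlx_vlm's internals: a frozen weight W is augmented with trainable factors A and B scaled by alpha / rank (matching the --lora-alpha and --lora-rank defaults below).

```python
# Minimal LoRA sketch (illustration, not the mlx_vlm implementation):
#   W_effective = W + (alpha / r) * (B @ A)
import numpy as np

d_out, d_in, r, alpha = 64, 64, 8, 16      # r, alpha match the script defaults
W = np.random.randn(d_out, d_in)           # frozen base weight (never updated)
A = np.random.randn(r, d_in) * 0.01        # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init so training starts at W

def lora_forward(x):
    # Only A and B receive gradients; W stays frozen.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = np.random.randn(2, d_in)
y = lora_forward(x)
# With B zero-initialized, the adapter contributes nothing at step 0:
assert np.allclose(y, x @ W.T)
```

Because only A and B (r × d_in + d_out × r parameters per layer) are trained, the optimizer state and gradients are a small fraction of a full fine-tune's.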

Basic usage

1. Prepare your dataset

Create or identify a Hugging Face dataset with images and messages columns in the format your target model expects. See Dataset preparation for details.

2. Run the training script

Call lora.py with at minimum --model-path and --dataset:

python lora.py \
    --model-path mlx-community/Qwen3-VL-2B-Instruct-bf16 \
    --dataset your-huggingface-dataset-id

3. Use your adapter

After training, load the saved adapter alongside the base model for inference:

mlx_vlm.generate \
    --model mlx-community/Qwen3-VL-2B-Instruct-bf16 \
    --adapter-path ./adapters.safetensors \
    --prompt "Describe this image" \
    --image /path/to/image.jpg

CLI reference

Model arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--model-path` | `mlx-community/Qwen2-VL-2B-Instruct-bf16` | Path or Hub ID of the base model to fine-tune. |
| `--full-finetune` | off | Update all model weights instead of using LoRA adapters. |
| `--train-vision` | off | Unfreeze and train the vision encoder alongside the language model. |

Dataset arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--dataset` | (required) | Local path or Hugging Face dataset identifier. |
| `--split` | `train` | Dataset split to use. |
| `--dataset-config` | (none) | Dataset configuration name (for datasets with multiple configs). |
| `--image-resize-shape` | (none) | Resize all images to a fixed shape, e.g. `768 768`. |
| `--custom-prompt-format` | (none) | JSON template for datasets with question/answer columns instead of `messages`. |

Training arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--learning-rate` | `2e-5` | Optimizer learning rate. |
| `--batch-size` | `4` | Number of samples per training step. |
| `--iters` | `1000` | Total training iterations. Ignored if `--epochs` is set. |
| `--epochs` | (none) | Number of full passes over the dataset. Overrides `--iters`. |
| `--steps-per-report` | `10` | Log loss and throughput every N steps. |
| `--steps-per-eval` | `200` | Run validation every N steps. |
| `--steps-per-save` | `100` | Save a checkpoint every N steps. |
| `--val-batches` | `25` | Number of batches used for each validation run. |
| `--max-seq-length` | `2048` | Maximum token sequence length; longer sequences are truncated. |
| `--grad-checkpoint` | off | Enable gradient checkpointing to reduce peak memory (slightly slower). |
| `--grad-clip` | (none) | Clip gradients to this maximum norm. |
| `--train-on-completions` | off | Compute loss only on assistant responses, not on the prompt. |
| `--gradient-accumulation-steps` | `1` | Accumulate gradients over N batches before updating weights. |
| `--assistant-id` | `77091` | Token ID used to identify the start of assistant turns (for completion masking). |
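Conceptually, --train-on-completions zeroes the loss on every token up to and including the assistant marker identified by --assistant-id, so only response tokens drive the gradient. A hedged sketch (token IDs here are made up for illustration; this is not the script's actual masking code):

```python
# Sketch of completion masking: zero loss weight on prompt tokens,
# full weight on everything after the assistant marker.
import numpy as np

ASSISTANT_ID = 77091                          # matches the --assistant-id default
tokens = np.array([5, 9, ASSISTANT_ID, 42, 17, 3])   # toy sequence

start = int(np.argmax(tokens == ASSISTANT_ID)) + 1   # first response token
mask = np.zeros(len(tokens))
mask[start:] = 1.0

per_token_loss = np.ones(len(tokens))         # stand-in for cross-entropy values
masked_loss = (per_token_loss * mask).sum() / mask.sum()
print(mask)                                   # [0. 0. 0. 1. 1. 1.]
```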

LoRA arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--lora-rank` | `8` | Rank of the LoRA decomposition matrices. Higher values increase adapter expressiveness. |
| `--lora-alpha` | `16` | Scaling factor applied to the LoRA updates. Effective learning rate scales with `lora-alpha / lora-rank`. |
| `--lora-dropout` | `0.0` | Dropout probability applied to LoRA layers during training. |
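The alpha / rank interaction is easy to overlook when tuning: raising the rank without raising alpha shrinks the update scale. A quick check with the defaults:

```python
# The adapter update is scaled by alpha / rank, so doubling --lora-rank
# without changing --lora-alpha halves the update magnitude.
rank, alpha = 8, 16                 # script defaults
print(alpha / rank)                 # 2.0

# Keeping alpha = 2 * rank preserves the default scale at higher ranks:
for r in (8, 16, 32):
    assert (2 * r) / r == 2.0
```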

Output arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--output-path` | `adapters.safetensors` | File path where the trained adapter is saved. |
| `--adapter-path` | (none) | Path to an existing adapter to resume training from. |

Training examples

python lora.py \
    --model-path mlx-community/Qwen3-VL-2B-Instruct-bf16 \
    --dataset your-huggingface-dataset-id \
    --batch-size 2 \
    --epochs 2 \
    --learning-rate 2e-5 \
    --output-path ./qwen3-lora-adapter.safetensors

Python API

You can drive training programmatically by constructing an argparse.Namespace and calling main:
import argparse
from mlx_vlm.lora import main

args = argparse.Namespace(
    model_path="mlx-community/Qwen3-VL-2B-Instruct-bf16",
    dataset="your-huggingface-dataset-id",
    split="train",
    dataset_config=None,
    batch_size=2,
    epochs=2,
    learning_rate=2e-5,
    iters=1000,
    steps_per_report=10,
    steps_per_eval=200,
    steps_per_save=100,
    val_batches=25,
    max_seq_length=2048,
    lora_rank=8,
    lora_alpha=16,
    lora_dropout=0.0,
    output_path="./qwen3-lora-adapter.safetensors",
    adapter_path=None,
    full_finetune=False,
    train_vision=False,
    grad_checkpoint=False,
    grad_clip=None,
    train_on_completions=False,
    gradient_accumulation_steps=1,
    assistant_id=77091,
    image_resize_shape=None,
    custom_prompt_format=None,
)

main(args)

Training output

The script logs progress at the interval set by --steps-per-report. Each report includes:
  • Current step and total steps
  • Loss at the current step
  • Running average loss
  • Throughput in tokens/sec
  • Estimated time remaining
After training completes, the adapter is saved to --output-path.

Training tips

  • Enable --grad-checkpoint to reduce peak memory at the cost of slightly longer training time.
  • Reduce --batch-size to 1 or 2 if you run out of memory.
  • Use --gradient-accumulation-steps to maintain an equivalent effective batch size without holding more activations in memory (e.g., --batch-size 1 --gradient-accumulation-steps 8 approximates a batch size of 8).
  • Use QLoRA with a 4-bit model checkpoint for the lowest memory footprint.
  • Start with learning rates in the range 1e-5 to 2e-5 for LoRA. QLoRA often benefits from substantially higher rates (around 2e-4) because the base model is already compressed.
  • Increase --lora-rank to 16 or 32 for more expressive adapters on complex tasks. Higher rank increases parameter count and memory use.
  • Use --train-on-completions to mask the prompt tokens from the loss — this focuses training on the model’s output quality and often improves convergence on instruction-following tasks.
  • Add --train-vision only when the task requires the model to understand visual features it wasn’t trained on (e.g., domain-specific imagery like medical scans or satellite data).
  • Monitor --steps-per-eval validation loss to detect overfitting early.
  • On Apple Silicon, MLX automatically utilizes the unified memory architecture and maps operations to the GPU and Neural Engine. No additional configuration is needed.
  • For models larger than 11B parameters, always enable --grad-checkpoint.
  • Use --image-resize-shape to cap image resolution and reduce the sequence length fed into the vision encoder, which directly reduces memory and speeds up training.
  • Larger batch sizes improve GPU utilization up to a point; if you have memory headroom, increasing --batch-size is often more effective than increasing --gradient-accumulation-steps.
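The gradient-accumulation tip above can be sketched as a loop (a generic illustration, not the lora.py internals): gradients from N micro-batches are averaged before a single optimizer update, so the effective batch size is N times the micro-batch size while only one micro-batch of activations is live at a time.

```python
# Sketch: --batch-size 1 --gradient-accumulation-steps 8
# approximates a batch size of 8.
accumulation_steps = 8
grad_buffer = 0.0

def micro_batch_grad(step):
    return float(step)              # stand-in for a real gradient

for step in range(accumulation_steps):
    grad_buffer += micro_batch_grad(step) / accumulation_steps

# One optimizer update is then applied with the averaged gradient:
print(grad_buffer)                  # 3.5 = mean of gradients 0..7
```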
