MLX-VLM works with models stored in MLX format. To use a model from Hugging Face that hasn’t been converted yet, run the mlx_vlm.convert command. Conversion downloads the model weights, casts them to the target dtype, optionally quantizes them, and writes the result to a local directory.
The mlx-community organization on Hugging Face hosts many pre-converted models. Check there before converting a model yourself.

Basic conversion

1. Install mlx-vlm

pip install -U mlx-vlm
2. Convert the model

mlx_vlm.convert --hf-path mistral-community/pixtral-12b --mlx-path ./pixtral-12b-mlx
This downloads the model from Hugging Face and saves the converted weights to ./pixtral-12b-mlx.
3. Use the converted model

mlx_vlm.generate \
  --model ./pixtral-12b-mlx \
  --prompt "What is in this image?" \
  --image /path/to/image.jpg

CLI reference

mlx_vlm.convert [OPTIONS]
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `--hf-path`, `--model` | string | (required) | Hugging Face repo ID or local path to the source model |
| `--mlx-path` | string | `mlx_model` | Directory to write the converted MLX model |
| `-q`, `--quantize` | flag | false | Quantize the model weights |
| `--q-bits` | int | 4 | Bits per weight for quantization |
| `--q-group-size` | int | 64 | Group size for quantization |
| `--q-mode` | string | `affine` | Quantization mode: `affine`, `mxfp4`, `nvfp4`, `mxfp8` |
| `--quant-predicate` | string | (none) | Mixed-bit quantization recipe (see Mixed quantization) |
| `--dtype` | string | from config | Cast weights to `float16`, `bfloat16`, or `float32` |
| `--upload-repo` | string | (none) | Hugging Face repo to upload the converted model to |
| `--revision` | string | (none) | Branch, tag, or commit to use from the Hugging Face Hub |
| `-d`, `--dequantize` | flag | false | Dequantize a previously quantized model |
| `--trust-remote-code` | flag | false | Allow running custom model code from the repository |
--quantize and --dequantize are mutually exclusive. Using both at once raises an error.
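To build intuition for what the default `affine` mode with a group size does, here is a minimal sketch of group-wise affine quantization: each group of weights shares a scale and a minimum (zero-point), and each weight is stored as a small integer. This is an illustration of the general technique only, not MLX's actual kernel or storage layout.

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Illustrative group-wise affine quantization: each group of
    `group_size` weights shares a scale and a minimum, and each weight
    is stored as an integer in [0, 2**bits - 1]."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant groups
    q = np.round((w - lo) / scale).astype(np.int64)
    return q, scale, lo

def affine_dequantize(q, scale, lo):
    """Reconstruct approximate weights from the stored integers."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
q, scale, lo = affine_quantize(w, bits=4, group_size=64)
w_hat = affine_dequantize(q, scale, lo).reshape(w.shape)
max_err = float(np.abs(w - w_hat).max())
```

Round-to-nearest bounds the per-weight error by half a scale step, which is why smaller groups (or more bits) trade storage for accuracy.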

Common examples

mlx_vlm.convert \
  --hf-path mlx-community/Qwen2-VL-7B-Instruct \
  --mlx-path ./qwen2-vl-7b-4bit \
  --quantize \
  --q-bits 4 \
  --q-group-size 64
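A quick back-of-the-envelope for the resulting model size: each weight costs `q_bits`, plus a per-group scale and zero-point amortized across the group. The 16-bit-per-parameter metadata figure below is an assumption for illustration, not something stated on this page.

```python
def quantized_bits_per_weight(q_bits=4, group_size=64, meta_bits=16):
    """Effective storage per weight: the q_bits payload plus one scale
    and one zero-point (assumed meta_bits each) shared by the group."""
    return q_bits + 2 * meta_bits / group_size

params = 7e9  # roughly 7B parameters, e.g. a Qwen2-VL-7B-class model
fp16_gb = params * 16 / 8 / 1e9
q4_gb = params * quantized_bits_per_weight(4, 64) / 8 / 1e9
# 4-bit with group size 64 costs about 4.5 effective bits per weight,
# so the 4-bit model is roughly 3.6x smaller than fp16.
```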

Python API

You can also run conversion from Python:
from mlx_vlm import convert

convert(
    hf_path="mlx-community/Qwen2-VL-7B-Instruct",
    mlx_path="./qwen2-vl-7b-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
All parameters from the CLI map directly to keyword arguments in convert().
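The flag-to-keyword mapping follows the usual convention: drop the leading dashes and turn the remaining dashes into underscores. A one-line sketch of that convention (illustrative, not a function exported by mlx-vlm):

```python
def flag_to_kwarg(flag: str) -> str:
    """Map a CLI flag to its Python keyword argument name by stripping
    leading dashes and replacing interior dashes with underscores."""
    return flag.lstrip("-").replace("-", "_")

# e.g. "--q-bits" becomes q_bits, "--hf-path" becomes hf_path
```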

Mixed quantization

Mixed quantization assigns different bit widths to different layers in the model. Layers near the input and output (where precision matters most) receive more bits; middle layers receive fewer. This follows the same strategy as formats like Q4_K_M in llama.cpp. The --quant-predicate flag accepts one of the following recipes:
| Recipe | Low bits | High bits |
| --- | --- | --- |
| `mixed_2_6` | 2 | 6 |
| `mixed_3_4` | 3 | 4 |
| `mixed_3_5` | 3 | 5 |
| `mixed_3_6` | 3 | 6 |
| `mixed_3_8` | 3 | 8 |
| `mixed_4_6` | 4 | 6 |
| `mixed_4_8` | 4 | 8 |
The high-bit setting applies to v_proj and down_proj layers in the first and last eighth of the model, as well as lm_head and embed_tokens. All other quantizable layers use the low-bit setting.
By default, the vision encoder is excluded from quantization. The skip_multimodal_module predicate skips any path containing vision_model, vision_tower, vl_connector, audio_model, or audio_tower.
mlx_vlm.convert \
  --hf-path mlx-community/Qwen2-VL-7B-Instruct \
  --mlx-path ./qwen2-vl-7b-mixed-4-8 \
  --quant-predicate mixed_4_8
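The recipe and skip rules described above can be sketched as plain predicates. The function names and signatures here are illustrative, not the library's actual internals:

```python
def mixed_bits(path: str, layer_idx: int, num_layers: int,
               low: int = 4, high: int = 8) -> int:
    """Sketch of a mixed_4_8-style recipe: high bits for v_proj and
    down_proj layers in the first and last eighth of the model, and for
    lm_head and embed_tokens; low bits everywhere else."""
    if any(name in path for name in ("lm_head", "embed_tokens")):
        return high
    eighth = num_layers / 8
    edge = layer_idx < eighth or layer_idx >= num_layers - eighth
    if edge and any(name in path for name in ("v_proj", "down_proj")):
        return high
    return low

def skip_multimodal_module(path: str) -> bool:
    """Exclude vision and audio submodules from quantization."""
    skipped = ("vision_model", "vision_tower", "vl_connector",
               "audio_model", "audio_tower")
    return any(name in path for name in skipped)
```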

Uploading to Hugging Face Hub

After conversion, you can push the model directly to your Hugging Face account:
mlx_vlm.convert \
  --hf-path mlx-community/Qwen2-VL-7B-Instruct \
  --mlx-path ./qwen2-vl-7b-4bit \
  --quantize \
  --q-bits 4 \
  --upload-repo your-username/Qwen2-VL-7B-Instruct-4bit-mlx
The --upload-repo value should be the target Hugging Face repo in owner/name format. The CLI will upload all files in --mlx-path to that repository after conversion completes.
