
Requirements

  • Python 3.10 or later
  • macOS with Apple Silicon (M1 / M2 / M3 / M4) for native MLX acceleration
  • pip (included with Python)

MLX-VLM also runs on Intel Macs and Linux via the mlx-cpu backend, and on NVIDIA GPUs via the mlx-cuda backend. Performance and feature coverage are best on Apple Silicon.

Install
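
Install the latest release from PyPI (the extras below use the same package name):

```shell
pip install -U mlx-vlm
```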

Optional extras

Install extras alongside the base package using the bracket syntax:
pip install -U "mlx-vlm[ui]"        # Gradio chat UI
pip install -U "mlx-vlm[cuda]"      # NVIDIA GPU support via MLX CUDA
pip install -U "mlx-vlm[cpu]"       # CPU-only backend (no Metal/CUDA)
Extra   Package added      When to use
ui      gradio>=5.19.0     You want the built-in mlx_vlm.chat_ui web interface
cuda    mlx-cuda           You are running on a machine with an NVIDIA GPU
cpu     mlx-cpu            You need a pure-CPU fallback (Intel Mac, Linux without GPU)
The cuda and cpu extras replace the default Metal-accelerated mlx backend. Do not install both cuda and cpu in the same environment.

MLX CUDA support

MLX-VLM can run on NVIDIA GPUs through the experimental MLX CUDA backend.
pip install -U "mlx-vlm[cuda]"
Models quantized with mxfp8 or nvfp4 modes require activation quantization when running on CUDA. Pass the -qa flag to mlx_vlm.generate, or set quantize_activations=True in the Python API:
mlx_vlm.generate \
  --model /path/to/mxfp8-model \
  --prompt "Describe this image." \
  --image /path/to/image.jpg \
  -qa
On Apple Silicon the -qa flag is not required — Metal-backed MLX handles mxfp8 and nvfp4 models natively.
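
The Python-API equivalent can be sketched as follows. This is a hypothetical example: the model path is a placeholder, and the exact placement of quantize_activations is an assumption — check the load/generate signatures in your installed mlx_vlm version.

```python
# Sketch of activation quantization from Python (equivalent of the -qa CLI flag).
# The model path and keyword placement below are assumptions, not verified API.
from mlx_vlm import load, generate

model, processor = load(
    "/path/to/mxfp8-model",       # local mxfp8- or nvfp4-quantized model
    quantize_activations=True,    # required for mxfp8/nvfp4 models on CUDA
)
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="/path/to/image.jpg",
)
print(output)
```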

Installed CLI commands

After installation the following commands are available on your PATH:
Command            Description
mlx_vlm.generate   Run inference from the command line
mlx_vlm.chat_ui    Launch the Gradio chat interface ([ui] extra required)
mlx_vlm.server     Start the FastAPI REST server
mlx_vlm.convert    Convert and quantize Hugging Face models to MLX format
