MLX-VLM lets you run state-of-the-art Vision Language Models locally on your Mac. Built on Apple’s MLX framework, it delivers fast, memory-efficient inference on Apple Silicon — no cloud account or GPU server required.

What is MLX-VLM?

MLX-VLM is a Python library that wraps the MLX framework to make it easy to load, run, and fine-tune multi-modal models. It supports models that understand images, audio, and video alongside text, and covers workflows ranging from quick CLI one-liners to a production-ready OpenAI-compatible REST server. The library is designed to feel familiar: if you’ve used the Hugging Face transformers API or OpenAI’s chat completions format, MLX-VLM follows the same conventions.

Key capabilities

  • CLI inference — generate text from images, audio, and video directly from your terminal with mlx_vlm.generate
  • Python API — a simple load / generate interface for embedding inference in your own code
  • OpenAI-compatible REST server — serve models over HTTP with streaming support via mlx_vlm.server
  • LoRA and QLoRA fine-tuning — adapt any supported model to your own dataset on-device
  • Model conversion and quantization — convert Hugging Face checkpoints to 4-bit, 8-bit, and mixed-precision MLX format
  • Omni model support — process images, audio clips, and video frames in a single prompt with supported models
  • Multi-image reasoning — pass multiple images in one request for comparison or analysis tasks
  • Thinking model support — configurable token budgets for reasoning models like Qwen3.5
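Because the REST server speaks the OpenAI chat completions format, a client request is just a JSON body in that shape. The sketch below shows what such a payload might look like; the exact set of supported fields (and the image-content encoding) is an assumption here, so check the REST server page for the authoritative schema.

```python
import json

# Sketch of an OpenAI-style chat completions request body for mlx_vlm.server.
# Field support (stream, max_tokens, image_url content parts) is assumed, not
# confirmed against a specific server version.
payload = {
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
    "stream": True,       # token-by-token streaming, as supported by the server
    "max_tokens": 256,
}

body = json.dumps(payload)  # ready to POST to the server's chat completions endpoint
```

Any OpenAI-compatible client library should be able to produce this shape for you by pointing its base URL at the local server.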

Supported model families

MLX-VLM includes implementations for over 50 model architectures. The table below lists the currently supported families.
| Family | Notes |
| --- | --- |
| Qwen2 VL / Qwen2.5 VL | Multi-image and video support |
| Qwen3 VL / Qwen3.5 / Qwen3 Omni | Thinking budget support |
| LLaVA / LLaVA-Next / LLaVA-Bunny | Classic open-source VLMs |
| Gemma 3 / Gemma 3n | Google’s vision-language models with audio support |
| Phi-3 Vision / Phi-4 Multimodal / Phi-4 Reasoning Vision | Microsoft Phi series |
| DeepSeek VL v2 / DeepSeek OCR / DeepSeek OCR-2 | DeepSeek visual understanding |
| Mllama | Meta’s multi-modal Llama |
| Llama 4 | Meta Llama 4 series |
| Mistral 3 / Mistral 4 / Pixtral | Mistral vision models |
| Idefics2 / Idefics3 | HuggingFace IDEFICS series |
| Paligemma | Google PaLI-Gemma |
| MiniCPM-o | Omni model with image and audio |
| Molmo / Molmo 2 / MolmoPoint | AI2 Molmo family |
| InternVL Chat | InternLM visual models |
| Florence2 | Microsoft Florence |
| Moondream3 | Efficient edge VLM |
| SmolVLM | Small, fast vision-language model |
| FastVLM | High-throughput VLM |
| Kimi VL | Moonshot visual model |
| Jina VLM | Jina AI visual model |
| GLM-4V / GLM-4V MoE / GLM-OCR | Zhipu AI vision models |
| Aya Vision | Cohere Aya vision model |
| HunyuanVL | Tencent Hunyuan VL |
| DOTS OCR / DOTS MOCR | Document OCR models |
| PaddleOCR VL | PaddlePaddle OCR |
| SAM3 / SAM3.1 | Segment Anything Model 3 |
| Ernie 4.5 MoE VL | Baidu Ernie vision model |
| LFM2 VL | LFM-2 visual model |
New architectures are added regularly. Check the mlx_vlm/models directory in the repository for the most up-to-date list.

The mlx-community organization

Pre-quantized, ready-to-use models are published to the mlx-community organization on Hugging Face. When you pass a model identifier like mlx-community/Qwen2-VL-2B-Instruct-4bit, MLX-VLM downloads it automatically on first use and caches it locally. Using mlx-community models means you skip the conversion step entirely — just install the library and start generating.
Model names in mlx-community follow the pattern <ModelName>-<Size>-<Quantization>, for example Qwen2-VL-2B-Instruct-4bit (2B parameters, 4-bit quantized). Smaller quantizations use less memory and run faster at some cost to quality.
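Since the quantization level is encoded as the final suffix of the repo name, you can read it off programmatically. The helper below is a hypothetical illustration of the naming pattern, not part of the MLX-VLM API:

```python
def quantization_of(model_id: str) -> str:
    """Return the quantization suffix of an mlx-community model id.

    Hypothetical helper: assumes the <ModelName>-<Size>-<Quantization>
    pattern described above, where the quantization suffix ends in "bit".
    Returns "full" when no such suffix is present.
    """
    suffix = model_id.rsplit("-", 1)[-1]  # take everything after the last dash
    return suffix if suffix.endswith("bit") else "full"


quantization_of("mlx-community/Qwen2-VL-2B-Instruct-4bit")  # → "4bit"
```

This is only a convention, not a guarantee; a few community repos deviate from the pattern, so treat the suffix as a hint rather than ground truth.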

Where to go next

Installation

Install MLX-VLM with pip and verify your environment

Quickstart

Run your first VLM inference in minutes

Python API

Integrate VLM inference into your Python code

REST server

Serve models via an OpenAI-compatible HTTP API
