MLX-VLM supports sending multiple images in one request. This lets you ask questions that span several images — for example, comparing two photos, describing a sequence of frames, or referencing multiple charts in a single prompt.
## Supported models
Not all architectures support multi-image inputs. The following model families accept more than one image per request:
| Model family | Notes |
|---|---|
| Qwen2-VL | Full multi-image support |
| Qwen2.5-VL | Full multi-image support |
| LLaVA / LLaVA-Next | Multi-image support |
| Idefics2 / Idefics3 | Multi-image support |
| Phi-4 Multimodal | Multi-image support |
Models that do not natively support multi-image inputs will process only the first image in the list, or may produce degraded results. Check the model card on Hugging Face to confirm support.
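As a rough illustration of such a check, a script could screen model paths against the families listed above before sending a multi-image request. The helper and its lookup set below are hypothetical (not part of MLX-VLM) and match only on the model name, so the model card remains the authoritative source:

```python
# Hypothetical helper -- not part of MLX-VLM. The set mirrors the
# table of multi-image-capable families above.
MULTI_IMAGE_FAMILIES = {
    "qwen2-vl", "qwen2.5-vl", "llava", "llava-next",
    "idefics2", "idefics3", "phi-4-multimodal",
}

def supports_multi_image(model_path: str) -> bool:
    """Best-effort check based only on the model path/name."""
    name = model_path.lower()
    return any(family in name for family in MULTI_IMAGE_FAMILIES)

print(supports_multi_image("mlx-community/Qwen2-VL-2B-Instruct-4bit"))  # True
```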
## Python example

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output.text)
```
Pass the full list of images to both `apply_chat_template` (via `num_images`) and to `generate`. The `num_images` argument tells the template how many image tokens to insert into the prompt; the list passed to `generate` provides the actual pixel data.
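To make the role of `num_images` concrete, here is a toy stand-in for the template step. The token strings and layout are purely illustrative; real templates are model-specific and come from the processor:

```python
def toy_chat_template(prompt: str, num_images: int) -> str:
    """Toy stand-in for apply_chat_template: insert one image
    placeholder token per image ahead of the user prompt."""
    image_tokens = "<image>" * num_images
    return f"<|user|>\n{image_tokens}{prompt}\n<|assistant|>\n"

formatted = toy_chat_template("Compare these two images.", num_images=2)
print(formatted)  # prompt preceded by two <image> placeholders
```

If the placeholder count and the number of images passed at generation time disagree, the model's image tokens and pixel inputs go out of sync, which is why both call sites take the same list.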
## CLI example

Pass multiple paths or URLs to `--image` as space-separated values:

```bash
mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --max-tokens 100 \
  --prompt "Compare these images" \
  --image path/to/image1.jpg path/to/image2.jpg
```
You can mix local paths and URLs:

```bash
mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --max-tokens 200 \
  --prompt "What do these two scenes have in common?" \
  --image ./local_photo.jpg http://images.cocodataset.org/val2017/000000039769.jpg
```
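Mixing works because each `--image` value can be classified as remote or local before loading. A minimal sketch of that classification (the helper name is ours, not the CLI's):

```python
from urllib.parse import urlparse

def is_url(source: str) -> bool:
    """Treat anything with an http/https scheme as a remote image."""
    return urlparse(source).scheme in ("http", "https")

sources = [
    "./local_photo.jpg",
    "http://images.cocodataset.org/val2017/000000039769.jpg",
]
for src in sources:
    print(src, "->", "URL" if is_url(src) else "local path")
```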
## Using PIL images

You can pass `PIL.Image.Image` objects directly in the Python API instead of file paths:

```python
from PIL import Image

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = [
    Image.open("path/to/image1.jpg"),
    Image.open("path/to/image2.jpg"),
]
prompt = "What differences do you notice between these two images?"

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output.text)
```
Using `PIL.Image.Image` objects is convenient when images are already loaded in memory (for example, from a preprocessing pipeline) and avoids writing temporary files to disk.
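For instance, synthetic frames or raw bytes fetched over HTTP can be turned into PIL images entirely in memory. The snippet below is a small sketch of that pattern (it assumes Pillow is installed and builds placeholder images rather than real photos); the resulting `images` list can be passed to `generate` exactly as above:

```python
import io
from PIL import Image

# An image built entirely in memory, e.g. a frame from a decoder.
frame = Image.new("RGB", (64, 64), color=(255, 0, 0))

# Bytes -> PIL round trip, as you'd do with a downloaded response body.
buf = io.BytesIO()
frame.save(buf, format="JPEG")
decoded = Image.open(io.BytesIO(buf.getvalue()))

images = [frame, decoded]
print([im.size for im in images])  # [(64, 64), (64, 64)]
```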