Install MLX-VLM
Install the package from PyPI. This pulls in all required dependencies; see the installation guide for extras and source install instructions.
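A typical install looks like this, using the package name as published on PyPI:

```shell
pip install mlx-vlm
```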
Run text generation from the CLI
Use mlx_vlm.generate to send a text prompt to a model. The first run downloads the model from Hugging Face automatically; the model is cached locally, so subsequent runs start immediately.
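A sketch of a text-only run; the model name here is just an example of a quantized model from the mlx-community Hugging Face organization, so substitute any supported model:

```shell
# Downloads the model on first run, then serves from the local cache
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --prompt "What can you do?" \
  --max-tokens 100
```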
Generate text from an image
Pass an image URL or a local file path with --image; the model receives the image and generates a description. To analyze multiple images in one prompt, pass --image multiple times, for example --image image1.jpg --image image2.jpg.
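A sketch of a single-image run; the model name and image URL are placeholders:

```shell
# --image accepts a URL or a local file path (e.g. --image ./photo.jpg)
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --prompt "Describe this image." \
  --image https://example.com/photo.jpg
```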
Use the Python API
For scripting and application integration, use the Python API directly:
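A minimal sketch using the load, apply_chat_template, and generate helpers described below; the model name and image path are placeholders, and the first call downloads the model:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"  # example model

# load() downloads and caches the model on first call
model, processor = load(model_path)
config = load_config(model_path)

images = ["photo.jpg"]  # placeholder local image path
prompt = "Describe this image."

# Format the prompt for the model's expected chat structure
formatted = apply_chat_template(processor, config, prompt, num_images=len(images))

# generate() returns a GenerationResult; the text is on .text
result = generate(model, processor, formatted, images, verbose=False)
print(result.text)
```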
load downloads and caches the model on first call. apply_chat_template formats the prompt correctly for the model's expected input structure. generate returns a GenerationResult object; access the generated text via its .text attribute.

Try the chat UI
If you installed the [ui] extra, launch an interactive Gradio chat interface:
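Assuming the chat UI entry point follows the same module-invocation pattern as generation (the model name is a placeholder):

```shell
python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit
```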
Open the URL Gradio prints (by default http://127.0.0.1:7860) to chat with the model in your browser, including image uploads.
Next steps
CLI reference
All flags for mlx_vlm.generate, video generation, and thinking budget options
Python API
Full reference for load, generate, stream_generate, and batch_generate
REST server
Serve models via an OpenAI-compatible HTTP API with streaming
Fine-tuning
Adapt models to your data using LoRA and QLoRA on Apple Silicon