- images: a list of PIL images for the sample.
- messages: a list of message dicts in the chat format your target model expects during inference.
The message format in your dataset must exactly match what the model sees at inference time. Using the wrong format will result in poor training signal and degraded outputs.
Dataset structure
Message formats by model
The content structure inside each message varies by model family. Use the tab for your target model:
- Qwen3 / Qwen2 VL
- Mllama (Llama-3.2-Vision)
- Deepseek-VL-V2
- Pixtral / LLaVA
Content is a list of typed objects. Images are referenced inline within the user turn:
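As a rough illustration of the typed-content layout described above, a single-image sample might look like the following. The image entry follows the {"type": "image", "image": <img>} shape used elsewhere on this page; the text entry and the list-valued assistant content are assumptions, so check them against your model's chat template. The helper name make_sample is hypothetical.

```python
from typing import Any


def make_sample(image: Any, question: str, answer: str) -> dict:
    """Build one sample with typed content objects (hypothetical helper)."""
    return {
        "images": [image],
        "messages": [
            {
                "role": "user",
                "content": [
                    # The image is referenced inline within the user turn.
                    {"type": "image", "image": image},
                    # Assumption: text parts use a {"type": "text"} entry.
                    {"type": "text", "text": question},
                ],
            },
            {
                "role": "assistant",
                # Assumption: the assistant turn also uses typed objects.
                "content": [{"type": "text", "text": answer}],
            },
        ],
    }


# Any object works for illustration; in practice this would be a PIL image.
placeholder_image = object()
sample = make_sample(placeholder_image, "What is shown?", "A cat.")
```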
Automatic dataset transformation
If your dataset has question and answer columns (plus an image or images column) instead of a messages column, the script can automatically convert it for Qwen and Deepseek models.
For other model families, or to customise the transformation, provide a JSON template with --custom-prompt-format:
The template supports three placeholders: {image}, {question}, and {answer}. These are filled from the corresponding dataset columns at runtime.
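The exact template schema is not shown here, so the following is only a sketch of how placeholder substitution could work, assuming a template shaped like a list of message dicts. The function fill_template and the template layout are hypothetical and are not the script's actual implementation.

```python
import json
from typing import Any

# Hypothetical template: the real schema accepted by --custom-prompt-format
# may differ from this layout.
TEMPLATE_JSON = """
[
  {"role": "user", "content": [
    {"type": "image", "image": "{image}"},
    {"type": "text", "text": "{question}"}
  ]},
  {"role": "assistant", "content": [
    {"type": "text", "text": "{answer}"}
  ]}
]
"""


def fill_template(node: Any, row: dict) -> Any:
    """Recursively replace {image}, {question}, {answer} with row values."""
    if isinstance(node, dict):
        return {key: fill_template(value, row) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_template(item, row) for item in node]
    if isinstance(node, str):
        # A string that is exactly one placeholder is swapped for the raw
        # value, so a non-string image object survives unchanged.
        for name in ("image", "question", "answer"):
            if node == "{" + name + "}":
                return row[name]
        # Otherwise substitute text placeholders inside a longer string.
        for name in ("question", "answer"):
            node = node.replace("{" + name + "}", str(row[name]))
        return node
    return node


row = {"image": object(), "question": "How many birds?", "answer": "Three."}
messages = fill_template(json.loads(TEMPLATE_JSON), row)
```

Swapping a whole-string placeholder for the raw column value is one way to keep image objects intact while still allowing {question} to sit inside a longer prompt string.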
Creating a dataset programmatically
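One possible way to assemble the two columns in memory is sketched below. It assumes raw records with images, question, and answer keys; the names build_columns and plain_text_messages are hypothetical, and the plain-text formatter is only a placeholder that should be replaced with the message format for your model family. The resulting dict of columns would then be handed to whatever dataset library the training script loads from.

```python
from typing import Callable, Iterable


def build_columns(
    records: Iterable[dict],
    to_messages: Callable[[dict], list],
) -> dict:
    """Collect per-sample values into `images` and `messages` columns."""
    columns: dict = {"images": [], "messages": []}
    for record in records:
        # Each row holds a list of images, even for single-image samples.
        columns["images"].append(list(record["images"]))
        # The formatter decides the model-specific message layout.
        columns["messages"].append(to_messages(record))
    return columns


def plain_text_messages(record: dict) -> list:
    """Placeholder formatter; illustrative only, not a required format."""
    return [
        {"role": "user", "content": record["question"]},
        {"role": "assistant", "content": record["answer"]},
    ]


# Placeholder objects stand in for PIL images here.
records = [
    {"images": [object()], "question": "What colour is the car?", "answer": "Red."},
    {"images": [object(), object()], "question": "Which is larger?", "answer": "The first."},
]
columns = build_columns(records, plain_text_messages)
```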
Common issues
Dataset must have a 'messages' column error
This error means the dataset has neither a messages column nor both question and answer columns. Either restructure your dataset to include a messages column in the correct format, or add question and answer columns and use --custom-prompt-format to define how they map to messages.
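The column check described above can be approximated as follows. This is a sketch of the logic as stated on this page, not the script's actual code; the function name resolve_dataset_mode and its return values are hypothetical.

```python
from typing import Iterable


def resolve_dataset_mode(column_names: Iterable[str]) -> str:
    """Decide how a dataset would be consumed, based on its columns."""
    columns = set(column_names)
    if "messages" in columns:
        # Already in chat format; used as-is.
        return "messages"
    if {"question", "answer"} <= columns:
        # Eligible for conversion into messages.
        return "question_answer"
    raise ValueError("Dataset must have a 'messages' column")


mode_chat = resolve_dataset_mode(["images", "messages"])
mode_qa = resolve_dataset_mode(["image", "question", "answer"])
```

Running a check like this on your dataset's column names before launching training can surface the problem early.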
Poor training loss or degraded outputs
The most common cause is a mismatch between the message format in the dataset and what the model expects. Verify your format against the examples for your specific model family. Pay particular attention to role strings. For example, Deepseek uses "<|User|>" and "<|Assistant|>" rather than "user" and "assistant".
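A quick role-string check along these lines can catch such mismatches before training. The Deepseek roles come from the text above; treating every other family as using only "user" and "assistant" is an assumption (some formats may also allow other roles), and the names EXPECTED_ROLES and find_unexpected_roles are hypothetical.

```python
EXPECTED_ROLES = {
    "deepseek": {"<|User|>", "<|Assistant|>"},
    # Assumption: other families use plain role names; extend as needed.
    "default": {"user", "assistant"},
}


def find_unexpected_roles(messages: list, family: str) -> list:
    """Return (index, role) pairs for messages whose role is not expected."""
    allowed = EXPECTED_ROLES.get(family, EXPECTED_ROLES["default"])
    return [
        (index, message.get("role"))
        for index, message in enumerate(messages)
        if message.get("role") not in allowed
    ]


messages = [
    {"role": "user", "content": "Describe the image."},
    {"role": "assistant", "content": "A street at night."},
]
problems_for_deepseek = find_unexpected_roles(messages, "deepseek")
problems_for_qwen = find_unexpected_roles(messages, "qwen")
```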
Using multi-image samples
For models that support multiple images per turn (Qwen2/3-VL, Mllama), list all images in the top-level images column and include one {"type": "image", "image": <img>} entry per image in the user content list, in the order the model should process them.
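The multi-image layout above can be sketched as a user turn with one image entry per image, followed by a simple consistency check. The text entry format and the helper names multi_image_user_turn and count_image_entries are assumptions made for illustration.

```python
from typing import Any, List


def multi_image_user_turn(images: List[Any], prompt: str) -> dict:
    """Build a user turn with one image entry per image, in order."""
    content = [{"type": "image", "image": image} for image in images]
    # Assumption: the text prompt follows the images as a typed text entry.
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}


def count_image_entries(messages: List[dict]) -> int:
    """Count {"type": "image"} entries across all list-valued contents."""
    total = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            total += sum(
                1 for part in content
                if isinstance(part, dict) and part.get("type") == "image"
            )
    return total


# Placeholder objects stand in for PIL images here.
images = [object(), object()]
sample = {
    "images": images,
    "messages": [multi_image_user_turn(images, "Compare the two pictures.")],
}
entries_match = count_image_entries(sample["messages"]) == len(sample["images"])
```

Comparing the number of inline image entries against the length of the images column is a cheap way to spot samples where the two have drifted apart.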