The trl.data_utils module (also importable from the top-level trl package) provides helpers for dataset preprocessing: applying chat templates, detecting conversational formats, extracting prompts from preference data, packing sequences, and converting legacy from/value formats to ChatML.
is_conversational
Checks whether a dataset example uses the conversational message format (role/content dicts).
Signature
Parameters
A single dataset entry. Supported keys inspected: "prompt", "chosen", "rejected", "completion", "messages".
Returns
bool — True if the first value found under a supported key is a list of dicts containing a "role" key.
Example
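The check can be illustrated with a minimal re-implementation (a sketch of the behavior described above, not the library code; the name is_conversational_sketch is illustrative):

```python
def is_conversational_sketch(example: dict) -> bool:
    # Inspect the supported keys in order; the first one present decides.
    supported_keys = ("prompt", "chosen", "rejected", "completion", "messages")
    for key in supported_keys:
        if key in example:
            value = example[key]
            # Conversational: a non-empty list of dicts carrying a "role" key.
            return (
                isinstance(value, list)
                and len(value) > 0
                and isinstance(value[0], dict)
                and "role" in value[0]
            )
    return False

print(is_conversational_sketch({"messages": [{"role": "user", "content": "Hi"}]}))  # True
print(is_conversational_sketch({"prompt": "The sky is"}))  # False
```

In practice you would call trl.is_conversational directly; the sketch only shows what shape of data counts as conversational.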
is_conversational_from_value
Checks whether a dataset example uses the legacy from/value conversational format (e.g., ShareGPT-style). This format is not recommended; prefer the ChatML role/content format.
Signature
Parameters
A single dataset entry. Inspects the "conversations" key.
Returns
bool — True if example["conversations"] is a list of dicts with both "from" and "value" keys.
Example
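A minimal sketch of the check (illustrative re-implementation, not the library code):

```python
def is_conversational_from_value_sketch(example: dict) -> bool:
    # Legacy ShareGPT-style format: "conversations" holds dicts
    # with "from" and "value" keys instead of "role"/"content".
    conversations = example.get("conversations")
    return (
        isinstance(conversations, list)
        and len(conversations) > 0
        and isinstance(conversations[0], dict)
        and "from" in conversations[0]
        and "value" in conversations[0]
    )

print(is_conversational_from_value_sketch(
    {"conversations": [{"from": "human", "value": "Hello"}]}
))  # True
print(is_conversational_from_value_sketch(
    {"messages": [{"role": "user", "content": "Hello"}]}
))  # False
```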
apply_chat_template
Applies the tokenizer’s chat template to a conversational example. Handles all supported dataset types:

| Dataset type | Required keys |
|---|---|
| Language modeling | "messages" |
| Prompt-only | "prompt" |
| Prompt-completion | "prompt", "completion" |
| Preference | "prompt", "chosen", "rejected" |
| Preference (implicit prompt) | "chosen", "rejected" |
| Unpaired preference | "prompt", "completion", "label" |
If the role of the last message is "user" or "tool", a generation prompt is appended; if it is "assistant", the message is continued.
Signature
Parameters
Single dataset entry with conversational messages. May also contain a "chat_template_kwargs" key with additional kwargs forwarded to the template renderer.
Tokenizer or processor whose apply_chat_template method will be called.
List of tool definitions forwarded to the chat template. Has no effect if the template does not support function calling.
**template_kwargs
Additional keyword arguments passed directly to tokenizer.apply_chat_template.
Returns
dict[str, str] — Dictionary with the same keys as the input (except "messages" becomes "text"), all values converted to formatted strings.
Example
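A toy illustration of the key handling (using a stand-in template function instead of a real tokenizer, so the example stays self-contained; toy_render and apply_chat_template_sketch are hypothetical names, not trl API):

```python
def toy_render(messages):
    # Stand-in for tokenizer.apply_chat_template: one tagged line per message.
    return "".join(f"<|{m['role']}|>{m['content']}\n" for m in messages)

def apply_chat_template_sketch(example):
    # Language modeling: "messages" becomes "text".
    if "messages" in example:
        return {"text": toy_render(example["messages"])}
    # Prompt-completion: render both, then split the completion string
    # off as the part that follows the rendered prompt.
    if "prompt" in example and "completion" in example:
        prompt_text = toy_render(example["prompt"])
        full_text = toy_render(example["prompt"] + example["completion"])
        return {"prompt": prompt_text, "completion": full_text[len(prompt_text):]}
    raise ValueError("dataset type not covered by this sketch")

out = apply_chat_template_sketch(
    {"messages": [{"role": "user", "content": "What color is the sky?"}]}
)
print(out)  # {'text': '<|user|>What color is the sky?\n'}
```

With a real tokenizer, the formatting comes from its chat template rather than this toy renderer.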
maybe_apply_chat_template
Applies apply_chat_template only when the example is in a conversational format (detected via is_conversational). Non-conversational examples are returned unchanged.
Signature
Parameters
Single dataset entry, conversational or plain-text.
Tokenizer used to apply the chat template.
Tool definitions forwarded to the template.
**template_kwargs
Additional kwargs passed to tokenizer.apply_chat_template.
Returns
dict[str, str] — The formatted example if conversational; the original example otherwise.
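The conditional behavior can be sketched as follows (render stands in for the tokenizer's chat-template call; maybe_apply_sketch is an illustrative name, not trl API):

```python
def maybe_apply_sketch(example, render):
    # Apply the template only for conversational examples; plain text passes through.
    msgs = example.get("messages")
    if isinstance(msgs, list) and msgs and isinstance(msgs[0], dict) and "role" in msgs[0]:
        return {"text": render(msgs)}
    return example  # non-conversational: returned unchanged

plain = {"text": "The sky is blue."}
print(maybe_apply_sketch(plain, lambda m: "") == plain)  # True: unchanged

conv = {"messages": [{"role": "user", "content": "Hi"}]}
print(maybe_apply_sketch(conv, lambda m: m[0]["content"]))  # {'text': 'Hi'}
```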
extract_prompt
Extracts the shared prompt from a paired preference example where the prompt is implicit (i.e., both "chosen" and "rejected" include it as a prefix).
The function finds the longest common prefix of conversation turns between "chosen" and "rejected", removes it from both, and returns it as "prompt".
Signature
Parameters
Dictionary containing "chosen" and "rejected" keys, each holding a list of conversation turns (either message dicts or plain strings).
Returns
dict with keys:
- "prompt": the extracted common prefix.
- "chosen": remainder of the chosen completion.
- "rejected": remainder of the rejected completion.
Example
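The longest-common-prefix logic can be sketched in a few lines (illustrative re-implementation, not the library code):

```python
def extract_prompt_sketch(example):
    chosen, rejected = example["chosen"], example["rejected"]
    # Walk forward while the turns agree: that shared prefix is the prompt.
    i = 0
    while i < min(len(chosen), len(rejected)) and chosen[i] == rejected[i]:
        i += 1
    return {"prompt": chosen[:i], "chosen": chosen[i:], "rejected": rejected[i:]}

example = {
    "chosen": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is blue."},
    ],
    "rejected": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is green."},
    ],
}
result = extract_prompt_sketch(example)
print(result["prompt"])   # [{'role': 'user', 'content': 'What color is the sky?'}]
print(result["chosen"])   # [{'role': 'assistant', 'content': 'It is blue.'}]
```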
maybe_extract_prompt
Extracts the shared prompt from a preference example only when necessary. If a "prompt" key already exists and is in the same format (both conversational or both plain-text), the example is returned as-is.
Signature
Parameters
Single dataset entry. Must contain "chosen" and "rejected" to be treated as a preference example.
Returns
The original example (if prompt extraction is not needed) or the result of extract_prompt.
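A sketch of the conditional wrapper (the format check between prompt and completions is omitted for brevity; maybe_extract_prompt_sketch is an illustrative name):

```python
def maybe_extract_prompt_sketch(example):
    # An explicit "prompt" key (or a non-preference example) means nothing to do.
    if "prompt" in example or not ("chosen" in example and "rejected" in example):
        return example
    # Otherwise extract the longest common prefix of turns as the prompt.
    chosen, rejected = example["chosen"], example["rejected"]
    i = 0
    while i < min(len(chosen), len(rejected)) and chosen[i] == rejected[i]:
        i += 1
    return {"prompt": chosen[:i], "chosen": chosen[i:], "rejected": rejected[i:]}

explicit = {"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}
print(maybe_extract_prompt_sketch(explicit) is explicit)  # True: returned as-is

implicit = {"chosen": ["The sky is", " blue."], "rejected": ["The sky is", " green."]}
print(maybe_extract_prompt_sketch(implicit)["prompt"])  # ['The sky is']
```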
pack_dataset
Packs short sequences in a dataset into longer chunks of exactly seq_length tokens, reducing padding and increasing training efficiency.
Three strategies are available:
| Strategy | Sequence boundaries | Token loss | Best for |
|---|---|---|---|
| "bfd" | Preserved | Truncates overflow | SFT, chat datasets |
| "bfd_split" | Preserved | None (splits overflow) | Pre-training, long documents |
| "wrapped" | Ignored (cuts mid-sequence) | None | Pre-training (fastest) |
The "bfd" and "bfd_split" strategies add a "seq_lengths" column that records the original sequence boundaries within each packed example — useful for constructing position_ids.
Signature
Parameters
Dataset to pack. All columns must be lists of token-id lists.
Target packed sequence length in tokens.
Packing strategy. One of "bfd", "bfd_split", or "wrapped".
Extra keyword arguments forwarded to dataset.map during packing (e.g., num_proc, batch_size).
Returns
Dataset | DatasetDict — The packed dataset. The number of rows typically decreases as sequences are merged.
Example
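The "wrapped" strategy can be sketched in pure Python (concatenate everything, then cut fixed-size chunks); the real function operates on a datasets.Dataset and also implements the two "bfd" strategies:

```python
def pack_wrapped_sketch(examples, seq_length):
    # Concatenate all token-id lists, ignoring sequence boundaries,
    # then slice the flat stream into chunks of seq_length tokens.
    flat = [tok for ids in examples["input_ids"] for tok in ids]
    chunks = [flat[i:i + seq_length] for i in range(0, len(flat), seq_length)]
    return {"input_ids": chunks}

packed = pack_wrapped_sketch({"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}, seq_length=4)
print(packed["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```

Note how three input rows become three packed rows here only because of the final short remainder; on realistic data the row count drops as sequences are merged.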
unpair_preference_dataset
Converts a paired preference dataset ("chosen" / "rejected" columns) into an unpaired format where each pair becomes two separate rows, each with a "completion" and a boolean "label" column (True for chosen, False for rejected).
Signature
Parameters
Paired preference dataset with "chosen", "rejected", and optionally "prompt" columns.
Number of processes for parallel dataset processing.
Description shown alongside the progress bar during mapping.
Returns
Dataset | DatasetDict — Unpaired dataset with columns "prompt" (if present), "completion", and "label".
Example
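The row transformation can be sketched on plain dicts (the real function works on a datasets.Dataset; unpair_sketch is an illustrative name):

```python
def unpair_sketch(rows):
    # Each paired row becomes two rows: chosen (label=True), rejected (label=False).
    unpaired = []
    for row in rows:
        unpaired.append({"prompt": row["prompt"], "completion": row["chosen"], "label": True})
        unpaired.append({"prompt": row["prompt"], "completion": row["rejected"], "label": False})
    return unpaired

rows = [{"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}]
for r in unpair_sketch(rows):
    print(r)
# {'prompt': 'The sky is', 'completion': ' blue.', 'label': True}
# {'prompt': 'The sky is', 'completion': ' green.', 'label': False}
```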
maybe_unpair_preference_dataset
Calls unpair_preference_dataset only when the dataset contains both "chosen" and "rejected" columns. Otherwise returns the dataset unchanged.
Signature
Parameters
Preference dataset to conditionally unpair.
Number of processes for parallel processing.
Progress bar description.
Returns
Dataset | DatasetDict — Unpaired dataset if the input was paired, otherwise the original dataset.
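The column check that gates the conversion can be sketched on a list of dicts (illustrative only; the real function inspects Dataset.column_names):

```python
def maybe_unpair_sketch(rows):
    # Only unpair when both "chosen" and "rejected" columns are present.
    if not rows or not {"chosen", "rejected"} <= rows[0].keys():
        return rows  # already unpaired: returned unchanged
    unpaired = []
    for row in rows:
        unpaired.append({"prompt": row["prompt"], "completion": row["chosen"], "label": True})
        unpaired.append({"prompt": row["prompt"], "completion": row["rejected"], "label": False})
    return unpaired

already = [{"prompt": "Hi", "completion": " there", "label": True}]
print(maybe_unpair_sketch(already) == already)  # True: nothing to do
```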
maybe_convert_to_chatml
Converts a dataset example from the legacy from/value format (e.g., ShareGPT) to ChatML role/content format. If the example is already in ChatML format, the function is a no-op.
The transformation performed:
- Renames "from" → "role" in each message dict.
- Renames "value" → "content" in each message dict.
- Renames the top-level key "conversations" → "messages".
Signature
Parameters
Single dataset entry. Inspects keys "prompt", "completion", "chosen", "rejected", "messages", and "conversations".
Returns
dict[str, list] — Example in ChatML format.
Example
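The three renamings can be sketched for the common "conversations" case (illustrative re-implementation covering only that key):

```python
def maybe_convert_to_chatml_sketch(example):
    conversations = example.get("conversations")
    legacy = (
        isinstance(conversations, list)
        and len(conversations) > 0
        and isinstance(conversations[0], dict)
        and "from" in conversations[0]
    )
    if not legacy:
        return example  # already ChatML (or not conversational): no-op
    # Rename "from" -> "role", "value" -> "content", "conversations" -> "messages".
    messages = [{"role": m["from"], "content": m["value"]} for m in conversations]
    return {"messages": messages}

legacy = {"conversations": [{"from": "user", "value": "What color is the sky?"}]}
print(maybe_convert_to_chatml_sketch(legacy))
# {'messages': [{'role': 'user', 'content': 'What color is the sky?'}]}
```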
prepare_multimodal_messages
Converts a list of messages into a structured multimodal format and injects provided image objects into "image" placeholder blocks.
When the input messages have plain string "content" fields, the function:
- Wraps each string in {"type": "text", "text": ...}.
- Inserts {"type": "image"} placeholder blocks before the text in the first user message.
- Fills each {"type": "image"} placeholder with the corresponding image object from the images list.
When the "content" fields are already structured lists, only step 3 is applied.
Signature
Parameters
List of message dicts with "role" and "content" (or "tool_calls" for assistant turns). Roles "system", "user", "assistant", and "tool" are supported.
Image objects (e.g., PIL images) to inject. May be empty if no images are referenced in the messages.
Returns
list[dict[str, Any]] — Deep copy of messages with all "content" fields converted to structured block lists and image placeholders populated.
Example
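The three steps can be sketched as follows (a simplified re-implementation using a placeholder string instead of a real PIL image; prepare_multimodal_sketch is an illustrative name):

```python
import copy

def prepare_multimodal_sketch(messages, images):
    messages = copy.deepcopy(messages)  # the input is never mutated
    image_iter = iter(images)
    first_user_handled = False
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Step 1: wrap plain strings in a text block.
            blocks = [{"type": "text", "text": content}]
            # Step 2: image placeholders go before the text of the first user message.
            if message["role"] == "user" and not first_user_handled:
                blocks = [{"type": "image"} for _ in images] + blocks
                first_user_handled = True
            message["content"] = blocks
        # Step 3: fill each unfilled {"type": "image"} placeholder.
        for block in message.get("content") or []:
            if isinstance(block, dict) and block.get("type") == "image" and "image" not in block:
                block["image"] = next(image_iter)
    return messages

messages = [{"role": "user", "content": "Describe the image."}]
out = prepare_multimodal_sketch(messages, images=["<PIL image>"])
print(out[0]["content"])
# [{'type': 'image', 'image': '<PIL image>'}, {'type': 'text', 'text': 'Describe the image.'}]
```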
prepare_multimodal_messages_vllm
Converts structured multimodal messages from the standard TRL format into a format compatible with vLLM. Specifically, replaces "type": "image" content blocks (which use an "image" key) with "type": "image_pil" blocks (which use an "image_pil" key) as required by vLLM’s input format.
Use this function when passing pre-processed multimodal messages to a vLLM generation server.
Signature
Parameters
Messages with "role" and "content" keys. Content is expected to be a list of structured blocks (as produced by prepare_multimodal_messages).
Returns
list[dict[str, Any]] — Deep copy of messages with all {"type": "image", "image": ...} blocks replaced by {"type": "image_pil", "image_pil": ...} for vLLM compatibility.
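The block substitution can be sketched as follows (a simplified re-implementation with a placeholder string standing in for a PIL image; prepare_for_vllm_sketch is an illustrative name):

```python
import copy

def prepare_for_vllm_sketch(messages):
    messages = copy.deepcopy(messages)  # the input is never mutated
    for message in messages:
        for i, block in enumerate(message.get("content") or []):
            if isinstance(block, dict) and block.get("type") == "image":
                # vLLM expects "image_pil" blocks instead of "image" blocks.
                message["content"][i] = {"type": "image_pil", "image_pil": block.get("image")}
    return messages

msgs = [{"role": "user", "content": [
    {"type": "image", "image": "<PIL image>"},
    {"type": "text", "text": "Describe it."},
]}]
print(prepare_for_vllm_sketch(msgs)[0]["content"][0])
# {'type': 'image_pil', 'image_pil': '<PIL image>'}
```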