The `trl.chat_template_utils` module provides helpers for working with chat templates across models and tokenizers. The key utilities cover cloning a template from one tokenizer to another, attaching a response schema for structured output parsing, and obtaining a training-compatible (prefix-preserving) variant of a template.
## add_response_schema
Attaches the appropriate response schema to a tokenizer based on its chat template, enabling structured parsing of assistant outputs with `tokenizer.parse_response()`.

At the time of writing, most tokenizers do not ship with a built-in response schema. This utility manually sets the `response_schema` attribute for the known Qwen3 and Qwen3.5 chat templates.
### Signature

### Parameters
- **tokenizer** (`PreTrainedTokenizer`) — Tokenizer whose `response_schema` attribute will be set. Must use a recognized chat template (Qwen3 or Qwen3.5).

### Returns

`PreTrainedTokenizer` — The same tokenizer object with `response_schema` set in place.
### Example
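The real utility operates on a `transformers` tokenizer loaded from the Hub. The following self-contained sketch instead mimics the behavior with a stub class, so the mechanics are visible without downloading anything; `StubTokenizer` and the schema contents are illustrative, not the real API.

```python
import re

# Illustrative stub, not the real transformers class: it carries only the
# two attributes this example needs.
class StubTokenizer:
    def __init__(self, chat_template):
        self.chat_template = chat_template
        self.response_schema = None

    def parse_response(self, text):
        # Mimics structured parsing: split an optional <think> block
        # from the visible assistant content.
        match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
        thinking = match.group(1).strip() if match else None
        content = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        return {"thinking": thinking, "content": content}

def add_response_schema(tokenizer):
    # The real utility inspects tokenizer.chat_template and attaches a known
    # schema for the Qwen3 / Qwen3.5 templates; here we just set a marker.
    if "<think>" in tokenizer.chat_template:
        tokenizer.response_schema = {"thinking_tag": "think"}
    return tokenizer

tokenizer = add_response_schema(StubTokenizer("...<think>..."))
parsed = tokenizer.parse_response("<think>reasoning here</think>The answer is 4.")
print(parsed)
```

With the schema attached, `parse_response` separates the hidden reasoning from the user-visible answer instead of returning one raw string.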
## clone_chat_template
Copies the chat template and special tokens from a source tokenizer onto a target tokenizer, then resizes the model’s token embeddings to match the new vocabulary. The function:

- Copies `chat_template` from the source tokenizer.
- Adds any tokens present in the source but absent in the target.
- Sets and synchronizes the EOS token across tokenizer and model (including `generation_config.eos_token_id`).
- Resizes model embeddings to the new vocabulary size, optionally rounding up to a multiple of `resize_to_multiple_of`.
- Adds dummy `<extra_id_N>` tokens when the embedding matrix is larger than the vocabulary after rounding.
### Signature

### Parameters
- Model whose token embeddings will be resized to accommodate the new tokens.
- Target tokenizer that will receive the chat template and special tokens.
- Hugging Face Hub identifier or local path of the tokenizer to clone the chat template from.
- `resize_to_multiple_of` — Round the new embedding vocabulary size up to the nearest multiple of this value. Set to `None` to disable rounding.

### Returns
A tuple of three values:

- Updated model with resized token embeddings and the EOS token configured.
- Updated tokenizer with the cloned chat template and special tokens.
- Token IDs of all tokens added to the tokenizer (from the source tokenizer, plus any `<extra_id_N>` padding tokens).

### Example
## get_training_chat_template
Returns a prefix-preserving variant of the tokenizer’s chat template suitable for training, or `None` if the existing template is already prefix-preserving.
A template is prefix-preserving if applying it to a conversation truncated at any turn always yields a string that is a prefix of the result for the full conversation. This property is required for correct loss masking during supervised fine-tuning.
Currently, Qwen3 and Qwen3.5 tokenizers are known to have chat templates that are not prefix-preserving in their default form (the `<think>` block is omitted for non-final assistant turns). This function returns a patched version that forces the thinking block to always appear, making the template prefix-preserving.
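A toy renderer makes the failure mode concrete. This is an illustrative stand-in, not the real Jinja template: it drops the `<think>` block for every assistant turn except the last, exactly the behavior that breaks the prefix property, and a `force_thinking` flag plays the role of the patch:

```python
def render(messages, force_thinking=False):
    # Toy stand-in for a chat template: assistant turns carry a <think>
    # block, but by default it is kept only for the final assistant turn.
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=-1,
    )
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant":
            keep = force_thinking or i == last_assistant
            body = (f"<think>{m['thinking']}</think>" if keep else "") + m["content"]
            out.append(f"<assistant>{body}")
        else:
            out.append(f"<{m['role']}>{m['content']}")
    return "".join(out)

conv = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello", "thinking": "greet"},
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "4", "thinking": "math"},
]

# Default behavior: rendering the first two turns is NOT a prefix of the
# full rendering, because the first assistant turn keeps its <think>
# block only while it is the last turn.
broken = render(conv).startswith(render(conv[:2]))              # False
# Patched behavior: thinking always appears, so the prefix property holds.
fixed = render(conv, force_thinking=True).startswith(
    render(conv[:2], force_thinking=True))                      # True
print(broken, fixed)
```

Because token offsets computed on the truncated rendering only line up with the full rendering when the prefix property holds, the patched template is what makes per-turn loss masks land on the right tokens.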
### Signature

### Parameters
Tokenizer instance to check and potentially patch.
### Returns
`str | None` — A training-compatible chat template string if patching is needed, or `None` if the existing template is already prefix-preserving.