The trl.chat_template_utils module provides helpers for working with chat templates across models and tokenizers. The key utilities cover cloning a template from one tokenizer to another, attaching a response schema for structured output parsing, and obtaining a training-compatible (prefix-preserving) variant of a template.
from trl import clone_chat_template
from trl.chat_template_utils import (
    add_response_schema,
    get_training_chat_template,
    is_chat_template_prefix_preserving,
)

add_response_schema

Attaches the appropriate response schema to a tokenizer based on its chat template, enabling structured parsing of assistant outputs with tokenizer.parse_response(). At the time of writing, most tokenizers do not ship with a built-in response schema. This utility manually sets the response_schema attribute for the known Qwen3 and Qwen3.5 chat templates.
If the tokenizer’s chat template is not recognized (currently only Qwen3 and Qwen3.5 are supported), a ValueError is raised. For other templates, set tokenizer.response_schema manually following the Transformers response parsing docs.
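Conceptually, the structured parsing that a response schema enables can be illustrated with a small pure-Python sketch. The `parse_tool_calls` helper below is hypothetical (it is not part of TRL or Transformers); it only mimics how a Qwen3-style schema turns the raw assistant text into the message dict that tokenizer.parse_response() returns:

```python
import json
import re

def parse_tool_calls(assistant_text: str) -> dict:
    """Illustrative only: extract <tool_call> blocks the way a
    Qwen3-style response schema would, returning a chat-message dict."""
    # Strip the end-of-turn marker if present.
    text = assistant_text.replace("<|im_end|>", "")
    tool_calls = []
    for block in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
        call = json.loads(block)
        tool_calls.append({"type": "function", "function": call})
    # Content is whatever remains outside the tool-call blocks.
    content = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL).strip()
    return {"role": "assistant", "content": content, "tool_calls": tool_calls}

msg = parse_tool_calls(
    '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call><|im_end|>'
)
print(msg["tool_calls"][0]["function"]["name"])  # multiply
```

The real parser is driven by the schema attached to the tokenizer, so it also handles thinking blocks and other template-specific markup; this sketch only covers the tool-call case.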

Signature

def add_response_schema(tokenizer: PreTrainedTokenizer) -> PreTrainedTokenizer

Parameters

tokenizer
PreTrainedTokenizer
Tokenizer whose response_schema attribute will be set. Must use a recognized chat template (Qwen3 or Qwen3.5).

Returns

PreTrainedTokenizer — The same tokenizer object with response_schema set in-place.

Example

from transformers import AutoTokenizer
from trl.chat_template_utils import add_response_schema

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = add_response_schema(tokenizer)

assistant_text = (
    '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n'
    '</tool_call><|im_end|>'
)
print(tokenizer.parse_response(assistant_text))
# {'role': 'assistant', 'content': '',
#  'tool_calls': [{'type': 'function',
#                  'function': {'name': 'multiply', 'arguments': {'a': 3, 'b': 4}}}]}

clone_chat_template

Copies the chat template and special tokens from a source tokenizer onto a target tokenizer, then resizes the model’s token embeddings to match the new vocabulary. The function:
  1. Copies chat_template from the source tokenizer.
  2. Adds any tokens present in the source but absent in the target.
  3. Sets and synchronizes the EOS token across tokenizer and model (including generation_config.eos_token_id).
  4. Resizes model embeddings to the new vocabulary size, optionally rounding up to a multiple of resize_to_multiple_of.
  5. Adds dummy <extra_id_N> tokens to the tokenizer when rounding makes the embedding matrix larger than the vocabulary, so the two sizes match.
This is useful when you want to train a base model (e.g., Llama-3.2-1B) using the chat template and special tokens of another model family (e.g., Qwen3-0.6B) without performing a full tokenizer replacement.

Signature

def clone_chat_template(
    model: PreTrainedModel,
    tokenizer: PreTrainedTokenizer,
    source_tokenizer_path: str,
    resize_to_multiple_of: int | None = 64,
) -> tuple[PreTrainedModel, PreTrainedTokenizer, list[int]]

Parameters

model
PreTrainedModel
Model whose token embeddings will be resized to accommodate the new tokens.
tokenizer
PreTrainedTokenizer
Target tokenizer that will receive the chat template and special tokens.
source_tokenizer_path
str
Hugging Face Hub identifier or local path of the tokenizer to clone the chat template from.
resize_to_multiple_of
int | None
default: 64
Round up the new embedding vocabulary size to the nearest multiple of this value. Set to None to disable rounding.

Returns

A tuple of three values:
model
PreTrainedModel
Updated model with resized token embeddings and EOS token configured.
tokenizer
PreTrainedTokenizer
Updated tokenizer with the cloned chat template and special tokens.
added_tokens
list[int]
Token IDs of all tokens that were added to the tokenizer (from the source tokenizer, plus any <extra_id_N> padding tokens).

Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import clone_chat_template

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

model, tokenizer, added_tokens = clone_chat_template(
    model,
    tokenizer,
    source_tokenizer_path="Qwen/Qwen3-0.6B",
    resize_to_multiple_of=64,
)
print(f"Added {len(added_tokens)} new tokens")

get_training_chat_template

Returns a prefix-preserving variant of the tokenizer’s chat template suitable for training, or None if the existing template is already prefix-preserving. A template is prefix-preserving if applying it to a conversation truncated at turn N always yields a string that is a prefix of the result for the full conversation. This property is required for correct loss masking during supervised fine-tuning. Currently, the Qwen3 and Qwen3.5 tokenizers are known to ship chat templates that are not prefix-preserving in their default form (the <think> block is omitted for non-final assistant turns). This function returns a patched version that forces the thinking block to always appear, making the template prefix-preserving.
If the tokenizer’s template is not prefix-preserving and is not one of the supported families (Qwen3, Qwen3.5), a ValueError is raised. You must manually patch the template in that case.
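The prefix-preserving property itself can be checked without any tokenizer. The sketch below uses two toy templates and a hypothetical `is_prefix_preserving` checker (none of these are TRL APIs) to show both a template that satisfies the property and one that, like the Qwen3 default, breaks it by stripping <think> blocks from non-final assistant turns:

```python
def simple_template(messages: list[dict]) -> str:
    """A toy, prefix-preserving template: every turn is rendered the
    same way regardless of its position in the conversation."""
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

def stripping_template(messages: list[dict]) -> str:
    """Mimics the Qwen3 default: <think> blocks are dropped from all
    but the final assistant turn, which breaks prefix preservation."""
    out = ""
    for i, m in enumerate(messages):
        content = m["content"]
        if m["role"] == "assistant" and i < len(messages) - 1:
            content = content.split("</think>")[-1].lstrip()
        out += f"<|{m['role']}|>{content}<|end|>\n"
    return out

def is_prefix_preserving(render, messages: list[dict]) -> bool:
    """Check that rendering every truncation of the conversation
    yields a prefix of the full rendering."""
    full = render(messages)
    return all(full.startswith(render(messages[:n])) for n in range(1, len(messages)))

conversation = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "<think>Blue.</think>It is blue."},
    {"role": "user", "content": "And at night?"},
]
print(is_prefix_preserving(simple_template, conversation))     # True
print(is_prefix_preserving(stripping_template, conversation))  # False
```

With stripping_template, rendering the first two turns keeps the <think> block (the assistant turn is final in that truncation), but the full rendering drops it, so the truncated string is no longer a prefix. Forcing the block to always appear, as get_training_chat_template does, restores the property.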

Signature

def get_training_chat_template(tokenizer: PreTrainedTokenizer) -> str | None

Parameters

tokenizer
PreTrainedTokenizer
Tokenizer instance to check and potentially patch.

Returns

str | None — A training-compatible chat template string if patching is needed, or None if the existing template is already prefix-preserving.

Example

from transformers import AutoTokenizer
from trl.chat_template_utils import get_training_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

messages1 = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."},
]
messages2 = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."},
    {"role": "user", "content": "And at night?"},
]

# Default template is NOT prefix-preserving for Qwen3
chat_template = get_training_chat_template(tokenizer)
if chat_template is not None:
    # Apply the patched template for training
    text1 = tokenizer.apply_chat_template(messages1, tokenize=False, chat_template=chat_template)
    text2 = tokenizer.apply_chat_template(messages2, tokenize=False, chat_template=chat_template)
    # Now text2.startswith(text1) is True
    print(text2.startswith(text1))  # True
