The trl.chat_template_utils module provides helpers for working with chat templates across models and tokenizers. The key utilities cover cloning a template from one tokenizer to another, attaching a response schema for structured output parsing, and obtaining a training-compatible (prefix-preserving) variant of a template.
from trl import clone_chat_template
from trl.chat_template_utils import (
    add_response_schema,
    get_training_chat_template,
    is_chat_template_prefix_preserving,
)

add_response_schema

Attaches the appropriate response schema to a tokenizer based on its chat template, enabling structured parsing of assistant outputs with tokenizer.parse_response(). At the time of writing, most tokenizers do not ship with a built-in response schema. This utility manually sets the response_schema attribute for the known Qwen3 and Qwen3.5 chat templates.
If the tokenizer’s chat template is not recognized (currently only Qwen3 and Qwen3.5 are supported), a ValueError is raised. For other templates, set tokenizer.response_schema manually following the Transformers response parsing docs.
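Conceptually, the structured parsing that a response schema enables can be illustrated with a small pure-Python sketch. The `parse_tool_calls` helper below is hypothetical (it is not part of TRL or Transformers); it only mimics how a Qwen3-style schema turns the raw assistant text into the message dict that tokenizer.parse_response() returns:

```python
import json
import re

def parse_tool_calls(assistant_text: str) -> dict:
    """Illustrative only: extract <tool_call> blocks the way a
    Qwen3-style response schema would, returning a chat-message dict."""
    # Strip the end-of-turn marker if present.
    text = assistant_text.replace("<|im_end|>", "")
    tool_calls = []
    for block in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
        call = json.loads(block)
        tool_calls.append({"type": "function", "function": call})
    # Content is whatever remains outside the tool-call blocks.
    content = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL).strip()
    return {"role": "assistant", "content": content, "tool_calls": tool_calls}

msg = parse_tool_calls(
    '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call><|im_end|>'
)
print(msg["tool_calls"][0]["function"]["name"])  # multiply
```

The real parser is driven by the schema attached to the tokenizer, so it also handles thinking blocks and other template-specific markup; this sketch only covers the tool-call case.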

Signature

def add_response_schema(tokenizer: PreTrainedTokenizer) -> PreTrainedTokenizer

Parameters

tokenizer
PreTrainedTokenizer
Tokenizer whose response_schema attribute will be set. Must use a recognized chat template (Qwen3 or Qwen3.5).

Returns

PreTrainedTokenizer — The same tokenizer object with response_schema set in-place.

Example

from transformers import AutoTokenizer
from trl.chat_template_utils import add_response_schema

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = add_response_schema(tokenizer)

assistant_text = (
    '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n'
    '</tool_call><|im_end|>'
)
print(tokenizer.parse_response(assistant_text))
# {'role': 'assistant', 'content': '',
#  'tool_calls': [{'type': 'function',
#                  'function': {'name': 'multiply', 'arguments': {'a': 3, 'b': 4}}}]}

clone_chat_template

Copies the chat template and special tokens from a source tokenizer onto a target tokenizer, then resizes the model’s token embeddings to match the new vocabulary. The function:
  1. Copies chat_template from the source tokenizer.
  2. Adds any tokens present in the source but absent in the target.
  3. Sets and synchronizes the EOS token across tokenizer and model (including generation_config.eos_token_id).
  4. Resizes model embeddings to the new vocabulary size, optionally rounding up to a multiple of resize_to_multiple_of.
  5. Adds dummy <extra_id_N> tokens to the tokenizer when rounding makes the embedding matrix larger than the vocabulary, so the two sizes match.
This is useful when you want to train a base model (e.g., Llama-3.2-1B) using the chat template and special tokens of another model family (e.g., Qwen3-0.6B) without performing a full tokenizer replacement.

Signature

def clone_chat_template(
    model: PreTrainedModel,
    tokenizer: PreTrainedTokenizer,
    source_tokenizer_path: str,
    resize_to_multiple_of: int | None = 64,
) -> tuple[PreTrainedModel, PreTrainedTokenizer, list[int]]

Parameters

model
PreTrainedModel
Model whose token embeddings will be resized to accommodate the new tokens.
tokenizer
PreTrainedTokenizer
Target tokenizer that will receive the chat template and special tokens.
source_tokenizer_path
str
Hugging Face Hub identifier or local path of the tokenizer to clone the chat template from.
resize_to_multiple_of
int | None
default: 64
Round up the new embedding vocabulary size to the nearest multiple of this value. Set to None to disable rounding.

Returns

A tuple of three values:
model
PreTrainedModel
Updated model with resized token embeddings and EOS token configured.
tokenizer
PreTrainedTokenizer
Updated tokenizer with the cloned chat template and special tokens.
added_tokens
list[int]
Token IDs of all tokens that were added to the tokenizer (from the source tokenizer, plus any <extra_id_N> padding tokens).

Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import clone_chat_template

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

model, tokenizer, added_tokens = clone_chat_template(
    model,
    tokenizer,
    source_tokenizer_path="Qwen/Qwen3-0.6B",
    resize_to_multiple_of=64,
)
print(f"Added {len(added_tokens)} new tokens")

get_training_chat_template

Returns a prefix-preserving variant of the tokenizer’s chat template suitable for training, or None if the existing template is already prefix-preserving. A template is prefix-preserving if applying it to a conversation truncated at turn N always yields a string that is a prefix of the result for the full conversation. This property is required for correct loss masking during supervised fine-tuning. Currently, the Qwen3 and Qwen3.5 tokenizers are known to ship chat templates that are not prefix-preserving in their default form (the <think> block is omitted for non-final assistant turns). This function returns a patched version that forces the thinking block to always appear, making the template prefix-preserving.
If the tokenizer’s template is not prefix-preserving and is not one of the supported families (Qwen3, Qwen3.5), a ValueError is raised. You must manually patch the template in that case.
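The prefix-preserving property itself can be checked without any tokenizer. The sketch below uses two toy templates and a hypothetical `is_prefix_preserving` checker (none of these are TRL APIs) to show both a template that satisfies the property and one that, like the Qwen3 default, breaks it by stripping <think> blocks from non-final assistant turns:

```python
def simple_template(messages: list[dict]) -> str:
    """A toy, prefix-preserving template: every turn is rendered the
    same way regardless of its position in the conversation."""
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

def stripping_template(messages: list[dict]) -> str:
    """Mimics the Qwen3 default: <think> blocks are dropped from all
    but the final assistant turn, which breaks prefix preservation."""
    out = ""
    for i, m in enumerate(messages):
        content = m["content"]
        if m["role"] == "assistant" and i < len(messages) - 1:
            content = content.split("</think>")[-1].lstrip()
        out += f"<|{m['role']}|>{content}<|end|>\n"
    return out

def is_prefix_preserving(render, messages: list[dict]) -> bool:
    """Check that rendering every truncation of the conversation
    yields a prefix of the full rendering."""
    full = render(messages)
    return all(full.startswith(render(messages[:n])) for n in range(1, len(messages)))

conversation = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "<think>Blue.</think>It is blue."},
    {"role": "user", "content": "And at night?"},
]
print(is_prefix_preserving(simple_template, conversation))     # True
print(is_prefix_preserving(stripping_template, conversation))  # False
```

With stripping_template, rendering the first two turns keeps the <think> block (the assistant turn is final in that truncation), but the full rendering drops it, so the truncated string is no longer a prefix. Forcing the block to always appear, as get_training_chat_template does, restores the property.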

Signature

def get_training_chat_template(tokenizer: PreTrainedTokenizer) -> str | None

Parameters

tokenizer
PreTrainedTokenizer
Tokenizer instance to check and potentially patch.

Returns

str | None — A training-compatible chat template string if patching is needed, or None if the existing template is already prefix-preserving.

Example

from transformers import AutoTokenizer
from trl.chat_template_utils import get_training_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

messages1 = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."},
]
messages2 = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."},
    {"role": "user", "content": "And at night?"},
]

# Default template is NOT prefix-preserving for Qwen3
chat_template = get_training_chat_template(tokenizer)
if chat_template is not None:
    # Apply the patched template for training
    text1 = tokenizer.apply_chat_template(messages1, tokenize=False, chat_template=chat_template)
    text2 = tokenizer.apply_chat_template(messages2, tokenize=False, chat_template=chat_template)
    # Now text2.startswith(text1) is True
    print(text2.startswith(text1))  # True
