The tinker backend is rLLM's async-first training backend, providing a unified architecture for both agent and workflow training. It is designed for flexibility and ease of use, with built-in LoRA support and seamless integration with the tinker service.

Overview

Key features of the tinker backend:
  • Async-First Design: Native async/await support throughout the training pipeline
  • Unified Architecture: Single codebase for agent and workflow training
  • Service-Based: Uses tinker service for model serving and training
  • Simplified API: Cleaner configuration and easier setup
Python Version: The tinker backend requires Python >= 3.11

Installation

Install rLLM with the tinker backend:
uv pip install "rllm[tinker] @ git+https://github.com/rllm-org/rllm.git"

Dependencies

The tinker backend includes (from pyproject.toml):
tinker = [
    "tinker ; python_version >= '3.11'",
    "tinker-cookbook @ git+https://github.com/thinking-machines-lab/tinker-cookbook.git#egg=tinker-cookbook ; python_version >= '3.11'",
]

Basic Usage

Agent Training

Train a math agent with the tinker backend:
train_math_tinker.py
import hydra
from omegaconf import DictConfig

from examples.math_tinker.math_agent_with_fewshot import MathAgentWithFewshot
from examples.math_tinker.math_reward import math_reward_fn
from rllm.data.dataset import DatasetRegistry
from rllm.environments.base.single_turn_env import SingleTurnEnvironment
from rllm.trainer import AgentTrainer

@hydra.main(
    version_base=None,
    config_path="../../rllm/trainer/config",
    config_name="tinker_rl_trainer"
)
def main(config: DictConfig):
    # Load datasets
    train_dataset = DatasetRegistry.load_dataset("gsm8k", "train")
    test_dataset = DatasetRegistry.load_dataset("math500", "test")

    # Create trainer with tinker backend
    trainer = AgentTrainer(
        config=config,
        agent_class=MathAgentWithFewshot,
        env_class=SingleTurnEnvironment,
        agent_args={"use_fewshot": True},
        env_args={"reward_fn": math_reward_fn},
        train_dataset=train_dataset,
        val_dataset=test_dataset,
        backend="tinker",  # Specify tinker backend
    )

    # Train
    trainer.train()

if __name__ == "__main__":
    main()
Run with:
python train_math_tinker.py \
  model.name=Qwen/Qwen2.5-Math-7B-Instruct \
  data.train_batch_size=16 \
  training.group_size=16

Workflow Training

The tinker backend also supports workflow-based training:
train_workflow_tinker.py
import hydra
from omegaconf import DictConfig

from examples.solver_judge_tinker.solver_judge_flow import SolverJudgeFlow
from rllm.data.dataset import DatasetRegistry
from rllm.trainer import WorkflowTrainer

@hydra.main(
    version_base=None,
    config_path="../../rllm/trainer/config",
    config_name="tinker_rl_trainer"
)
def main(config: DictConfig):
    train_dataset = DatasetRegistry.load_dataset("countdown", "train")
    test_dataset = DatasetRegistry.load_dataset("countdown", "test")

    trainer = WorkflowTrainer(
        config=config,
        workflow_class=SolverJudgeFlow,
        workflow_args={},
        train_dataset=train_dataset,
        val_dataset=test_dataset,
        backend="tinker",
    )

    trainer.train()

if __name__ == "__main__":
    main()

Configuration

The tinker backend uses the tinker_rl_trainer.yaml configuration file:

Model Configuration

model.name (string, default "Qwen/Qwen3-8B"): Model path (HuggingFace or local)
model.lora_rank (integer, default 32): LoRA rank for parameter-efficient fine-tuning
model.train_unembed (boolean, default true): Train LoRA on the output embedding layer
model.train_attn (boolean, default true): Train LoRA on attention layers
model.train_mlp (boolean, default true): Train LoRA on MLP layers

Training Configuration

training.group_size (integer, default 16): Number of rollouts per prompt (for GRPO)
training.val_group_size (integer, default 1): Number of rollouts per validation prompt
training.learning_rate (float, default 2e-5): Learning rate for the optimizer
training.max_length (integer, default 32768): Maximum sequence length (prompt + response)
training.num_minibatches (integer, default 1): Number of minibatches per update (currently only 1 is fully tested)

Algorithm Configuration

algorithm.adv_estimator (string, default "grpo"): Advantage estimator: "grpo", "reinforce", or "distill"
algorithm.gamma (float, default 1.0): Discount factor for rewards
algorithm.grouping_level (string, default "trajectory"): Grouping level: "trajectory" or "step"
algorithm.norm_adv_by_std_in_grpo (boolean, default false): Normalize advantages by standard deviation in GRPO
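
To build intuition for group_size, grouping_level, and norm_adv_by_std_in_grpo, here is a minimal sketch of group-relative advantage computation in the style of GRPO. The function name and details are illustrative, not rLLM's API:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, norm_by_std=False, eps=1e-6):
    """Group-relative advantages for one prompt's group of rollouts:
    subtract the group mean reward, optionally dividing by the group's
    standard deviation (norm_adv_by_std_in_grpo)."""
    mu = mean(rewards)
    adv = [r - mu for r in rewards]
    if norm_by_std:
        sigma = pstdev(rewards)
        adv = [a / (sigma + eps) for a in adv]
    return adv

# One prompt sampled with group_size=4 rollouts, binary rewards:
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))                    # mean-centered
print(grpo_advantages([1.0, 0.0, 1.0, 0.0], norm_by_std=True))  # also std-normalized
```

With grouping_level set to "trajectory", the group is the set of whole trajectories sampled for one prompt; with "step", grouping happens at the step level instead.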

Data Configuration

data.train_batch_size (integer, default 64): Training batch size
data.val_batch_size (integer, default 32): Validation batch size
data.max_prompt_length (integer, default 2048): Maximum prompt length in tokens
data.max_response_length (integer, default 2048): Maximum response length in tokens

Trainer Configuration

trainer.total_epochs (integer, default 10): Number of training epochs
trainer.test_freq (integer, default 5): Validation frequency (in steps)
trainer.save_freq (integer, default 20): Checkpoint save frequency (in steps)
trainer.default_local_dir (string, default "/tmp/rllm-tinker-checkpoints"): Checkpoint directory

LoRA Training

The tinker backend includes native LoRA support:
# LoRA is enabled by default with rank=32
trainer = AgentTrainer(
    config=config,
    agent_class=MathAgent,
    env_class=SingleTurnEnvironment,
    backend="tinker",
    # ... other args
)
Configure LoRA parameters:
python train_agent.py \
  model.lora_rank=64 \
  model.train_attn=true \
  model.train_mlp=true \
  model.train_unembed=true
Set model.train_unembed=false for Fireworks AI compatibility when deploying LoRA adapters.
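
For a back-of-the-envelope sense of what model.lora_rank controls, the sketch below counts the trainable parameters a LoRA adapter adds per weight matrix. The hidden size is a hypothetical value, not tied to any specific model:

```python
def lora_params(rank, d_in, d_out):
    # A LoRA adapter for a d_out x d_in weight matrix adds two
    # low-rank factors: A (rank x d_in) and B (d_out x rank),
    # so it contributes rank * (d_in + d_out) trainable parameters.
    return rank * (d_in + d_out)

hidden = 4096  # hypothetical hidden size
print(lora_params(32, hidden, hidden))  # one square projection at rank 32
print(lora_params(64, hidden, hidden))  # doubling the rank doubles the count
```

Flags like model.train_attn, model.train_mlp, and model.train_unembed decide which weight matrices receive such adapters, so the total trainable parameter count scales with both the rank and the set of enabled modules.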

Tinker Service

Local Service

By default, the tinker backend uses a local service:
tinker_base_url: null  # null means local

Remote Service

Connect to a remote tinker service:
python train_agent.py \
  tinker_base_url=http://remote-server:8080

Sampling Configuration

Configure sampling parameters:
sampling.temperature (float, default 1.0): Sampling temperature
sampling.top_p (float, default 1.0): Top-p (nucleus) sampling parameter
Important: tinker recommends leaving temperature and top_p at 1.0; other values can cause unexpected logprob issues. See tinker-cookbook#86 for discussion.

Rollout Engine Configuration

rollout_engine.reasoning_effort (string, default "medium"): Reasoning effort level: "low", "medium", or "high"
rollout_engine.accumulate_reasoning (boolean, default false): Accumulate reasoning tokens across steps
rollout_engine.disable_thinking (boolean, default false): Disable thinking tokens in responses
rollout_engine.bypass_render_with_parser (boolean, default false): Bypass the renderer and use the parser directly

Checkpointing

The tinker backend provides flexible checkpointing:

Automatic Checkpointing

trainer:
  save_freq: 20  # Save every 20 steps
  default_local_dir: /tmp/rllm-tinker-checkpoints

Resume from Checkpoint

Resume from a tinker checkpoint:
python train_agent.py \
  trainer.resume_from_tinker_id=tinker://uuid/weights/000060

Manual Checkpoint Loading

python train_agent.py \
  trainer.default_local_dir=/path/to/checkpoint/dir

Distillation Support

The tinker backend supports knowledge distillation from teacher models:
algorithm:
  adv_estimator: distill
  shared_tokenizer: false
  teacher_rollout_args:
    backend: tinker  # or openai
    model: "Qwen/Qwen3-32B"
    base_url: "http://localhost:8000/v1"
    api_key: "EMPTY"
    max_prompt_length: 32768
Run distillation training:
python train_agent.py \
  algorithm.adv_estimator=distill \
  algorithm.teacher_rollout_args.model=Qwen/Qwen3-32B
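
For intuition, one common formulation of on-policy distillation scores each sampled token by how much more likely the teacher found it than the student. The sketch below illustrates that idea only; the function and numbers are illustrative, not rLLM's "distill" estimator implementation:

```python
def distill_advantages(teacher_logprobs, student_logprobs):
    # Per-token advantage: teacher logprob minus student logprob for
    # each sampled token. Positive values push the student toward
    # tokens the teacher prefers; negative values push it away.
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]

teacher = [-0.1, -0.5, -2.0]  # illustrative per-token logprobs
student = [-0.3, -0.4, -3.0]
print(distill_advantages(teacher, student))
```

The shared_tokenizer flag matters here because per-token comparisons like this only line up directly when teacher and student tokenize the sequence the same way.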

Advanced Features

Fused Forward-Backward and Optimizer Step

For better performance, tinker can fuse the forward-backward pass with the optimizer step:
fuse_forward_backward_and_optim_step: true
This optimization reduces overhead by combining gradient computation and parameter updates into a single operation.

Multi-Step Agents

For multi-turn agent interactions:
agent:
  max_steps: 20  # Allow up to 20 turns

Workflow Parallel Tasks

Control parallelism in workflow execution:
workflow:
  n_parallel_tasks: 256  # Run up to 256 tasks in parallel
  retry_limit: 3  # Retry failed tasks up to 3 times
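
The semantics of n_parallel_tasks and retry_limit can be sketched with a semaphore-bounded asyncio runner. This is an illustrative sketch of the concurrency pattern, not rLLM's implementation:

```python
import asyncio

async def run_with_limit(task_factories, n_parallel_tasks=4, retry_limit=3):
    """Run coroutine factories with bounded concurrency and per-task
    retries, mirroring workflow.n_parallel_tasks / workflow.retry_limit."""
    sem = asyncio.Semaphore(n_parallel_tasks)

    async def run_one(factory):
        async with sem:  # at most n_parallel_tasks run concurrently
            for attempt in range(retry_limit):
                try:
                    return await factory()  # fresh coroutine per attempt
                except Exception:
                    if attempt == retry_limit - 1:
                        raise  # exhausted retries

    return await asyncio.gather(*(run_one(f) for f in task_factories))

async def demo():
    async def task(i):
        await asyncio.sleep(0)
        return i * i
    # Factories (not coroutines) so each retry gets a fresh awaitable.
    return await run_with_limit([lambda i=i: task(i) for i in range(5)])

print(asyncio.run(demo()))  # squares of 0..4, in submission order
```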

Monitoring

Configure logging backends:
trainer:
  logger: ['console', 'wandb', 'tensorboard']
  project_name: 'rllm-tinker'
  experiment_name: 'math-agent-v1'

Example Configuration

Complete configuration for MATH dataset training:
config.yaml
# Model
model:
  name: "Qwen/Qwen3-8B"
  lora_rank: 32
  train_unembed: true
  train_attn: true
  train_mlp: true

# Training
training:
  group_size: 16
  val_group_size: 1
  learning_rate: 2e-5
  max_length: 32768

# Sampling
sampling:
  temperature: 1.0
  top_p: 1.0

# Algorithm
algorithm:
  adv_estimator: grpo
  gamma: 1.0
  lam: 0.95
  norm_adv_by_std_in_grpo: false
  grouping_level: 'trajectory'

# Data
data:
  train_batch_size: 64
  val_batch_size: 32
  max_prompt_length: 2048
  max_response_length: 2048

# Trainer
trainer:
  total_epochs: 10
  test_freq: 5
  save_freq: 20
  logger: ['console', 'wandb']
  project_name: 'math-rl'
  experiment_name: 'qwen3-8b-gsm8k'
  default_local_dir: '/tmp/rllm-tinker-checkpoints'

# Agent
agent:
  max_steps: 1  # Single-turn
  agent_args: {}

# Environment
env:
  env_args: {}

# Rollout Engine
rollout_engine:
  reasoning_effort: "medium"
  accumulate_reasoning: false
  disable_thinking: false

Performance Optimization

Increase Batch Size

Tune data.train_batch_size and training.group_size for better GPU utilization.

Use LoRA

Enable LoRA for faster training and lower memory usage.

Fuse Operations

Set fuse_forward_backward_and_optim_step=true for reduced overhead.

Parallel Workflows

Increase workflow.n_parallel_tasks for workflow-based training.

Troubleshooting

Python version error

tinker requires Python >= 3.11. Upgrade your Python version:
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .[tinker]

Sampling warnings

If you see warnings about temperature or top_p, keep both at their defaults:
sampling:
  temperature: 1.0  # Keep at 1.0
  top_p: 1.0        # Keep at 1.0
Setting these away from 1.0 can cause logprob issues.

Minibatch configuration

Currently only num_minibatches=1 is fully tested:
training:
  num_minibatches: 1  # Don't change this

Checkpoint directory errors

Ensure the checkpoint directory exists:
mkdir -p /tmp/rllm-tinker-checkpoints
python train_agent.py trainer.default_local_dir=/tmp/rllm-tinker-checkpoints

Remote service connection

If using a remote service, verify the URL is reachable:
curl http://remote-server:8080/health
python train_agent.py tinker_base_url=http://remote-server:8080

Comparison with verl

Key differences from verl backend:
Feature              | tinker         | verl
-------------------- | -------------- | -------------------------
Python Version       | >= 3.11        | >= 3.10
Architecture         | Async-first    | Ray-based
LoRA Support         | Native         | Via config
VLM Support          | Limited        | Full (Qwen2-VL, Qwen3-VL)
Distributed Training | Limited        | Multi-node Ray
Configuration        | Simpler        | More complex
Service Model        | tinker service | vLLM/SGLang
See Backend Comparison for detailed feature comparison.

See Also

verl Backend

Distributed training with verl

Backend Comparison

Compare tinker vs verl features

tinker Cookbook

Official tinker cookbook repository

Agent Trainer

Learn about AgentTrainer API
