The tinker backend is rLLM's async-first training backend, providing a unified architecture for both agent and workflow training. It is designed for flexibility and ease of use, with built-in LoRA support and seamless integration with the tinker service.

Overview

Key features of the tinker backend:
  • Async-First Design: Native async/await support throughout the training pipeline
  • Unified Architecture: Single codebase for agent and workflow training
  • Service-Based: Uses tinker service for model serving and training
  • Simplified API: Cleaner configuration and easier setup
Python Version: The tinker backend requires Python >= 3.11

Installation

Install rLLM with the tinker backend:
uv pip install "rllm[tinker] @ git+https://github.com/rllm-org/rllm.git"

Dependencies

The tinker backend includes (from pyproject.toml):
tinker = [
    "tinker ; python_version >= '3.11'",
    "tinker-cookbook @ git+https://github.com/thinking-machines-lab/tinker-cookbook.git#egg=tinker-cookbook ; python_version >= '3.11'",
]

Basic Usage

Agent Training

Train a math agent with the tinker backend:
train_math_tinker.py
import hydra
from omegaconf import DictConfig

from examples.math_tinker.math_agent_with_fewshot import MathAgentWithFewshot
from examples.math_tinker.math_reward import math_reward_fn
from rllm.data.dataset import DatasetRegistry
from rllm.environments.base.single_turn_env import SingleTurnEnvironment
from rllm.trainer import AgentTrainer

@hydra.main(
    version_base=None,
    config_path="../../rllm/trainer/config",
    config_name="tinker_rl_trainer"
)
def main(config: DictConfig):
    # Load datasets
    train_dataset = DatasetRegistry.load_dataset("gsm8k", "train")
    test_dataset = DatasetRegistry.load_dataset("math500", "test")

    # Create trainer with tinker backend
    trainer = AgentTrainer(
        config=config,
        agent_class=MathAgentWithFewshot,
        env_class=SingleTurnEnvironment,
        agent_args={"use_fewshot": True},
        env_args={"reward_fn": math_reward_fn},
        train_dataset=train_dataset,
        val_dataset=test_dataset,
        backend="tinker",  # Specify tinker backend
    )

    # Train
    trainer.train()

if __name__ == "__main__":
    main()
Run with:
python train_math_tinker.py \
  model.name=Qwen/Qwen2.5-Math-7B-Instruct \
  data.train_batch_size=16 \
  training.group_size=16

Workflow Training

The tinker backend also supports workflow-based training:
train_workflow_tinker.py
import hydra
from omegaconf import DictConfig

from examples.solver_judge_tinker.solver_judge_flow import SolverJudgeFlow
from rllm.data.dataset import DatasetRegistry
from rllm.trainer import WorkflowTrainer

@hydra.main(
    version_base=None,
    config_path="../../rllm/trainer/config",
    config_name="tinker_rl_trainer"
)
def main(config: DictConfig):
    train_dataset = DatasetRegistry.load_dataset("countdown", "train")
    test_dataset = DatasetRegistry.load_dataset("countdown", "test")

    trainer = WorkflowTrainer(
        config=config,
        workflow_class=SolverJudgeFlow,
        workflow_args={},
        train_dataset=train_dataset,
        val_dataset=test_dataset,
        backend="tinker",
    )

    trainer.train()

if __name__ == "__main__":
    main()

Configuration

The tinker backend uses the tinker_rl_trainer.yaml configuration file:

Model Configuration

model.name (string, default "Qwen/Qwen3-8B"): Model path (HuggingFace or local)
model.lora_rank (integer, default 32): LoRA rank for parameter-efficient fine-tuning
model.train_unembed (boolean, default true): Train LoRA on the output embedding layer
model.train_attn (boolean, default true): Train LoRA on attention layers
model.train_mlp (boolean, default true): Train LoRA on MLP layers

Training Configuration

training.group_size (integer, default 16): Number of rollouts per prompt (for GRPO)
training.val_group_size (integer, default 1): Number of rollouts per validation prompt
training.learning_rate (float, default 2e-5): Learning rate for the optimizer
training.max_length (integer, default 32768): Maximum sequence length (prompt + response)
training.num_minibatches (integer, default 1): Number of minibatches per update (currently only 1 is fully tested)

Algorithm Configuration

algorithm.adv_estimator (string, default "grpo"): Advantage estimator: "grpo", "reinforce", or "distill"
algorithm.gamma (float, default 1.0): Discount factor for rewards
algorithm.grouping_level (string, default "trajectory"): Grouping level: "trajectory" or "step"
algorithm.norm_adv_by_std_in_grpo (boolean, default false): Normalize advantages by standard deviation in GRPO
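
To build intuition for group_size, grouping_level, and norm_adv_by_std_in_grpo, here is a minimal sketch of group-relative advantage computation in the style of GRPO. The function name and details are illustrative, not rLLM's API:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, norm_by_std=False, eps=1e-6):
    """Group-relative advantages for one prompt's group of rollouts:
    subtract the group mean reward, optionally dividing by the group's
    standard deviation (norm_adv_by_std_in_grpo)."""
    mu = mean(rewards)
    adv = [r - mu for r in rewards]
    if norm_by_std:
        sigma = pstdev(rewards)
        adv = [a / (sigma + eps) for a in adv]
    return adv

# One prompt sampled with group_size=4 rollouts, binary rewards:
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))                    # mean-centered
print(grpo_advantages([1.0, 0.0, 1.0, 0.0], norm_by_std=True))  # also std-normalized
```

With grouping_level set to "trajectory", the group is the set of whole trajectories sampled for one prompt; with "step", grouping happens at the step level instead.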

Data Configuration

data.train_batch_size (integer, default 64): Training batch size
data.val_batch_size (integer, default 32): Validation batch size
data.max_prompt_length (integer, default 2048): Maximum prompt length in tokens
data.max_response_length (integer, default 2048): Maximum response length in tokens

Trainer Configuration

trainer.total_epochs (integer, default 10): Number of training epochs
trainer.test_freq (integer, default 5): Validation frequency (in steps)
trainer.save_freq (integer, default 20): Checkpoint save frequency (in steps)
trainer.default_local_dir (string, default "/tmp/rllm-tinker-checkpoints"): Checkpoint directory

LoRA Training

The tinker backend includes native LoRA support:
# LoRA is enabled by default with rank=32
trainer = AgentTrainer(
    config=config,
    agent_class=MathAgent,
    env_class=SingleTurnEnvironment,
    backend="tinker",
    # ... other args
)
Configure LoRA parameters:
python train_agent.py \
  model.lora_rank=64 \
  model.train_attn=true \
  model.train_mlp=true \
  model.train_unembed=true
Set model.train_unembed=false for Fireworks AI compatibility when deploying LoRA adapters.
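
For a back-of-the-envelope sense of what model.lora_rank controls, the sketch below counts the trainable parameters a LoRA adapter adds per weight matrix. The hidden size is a hypothetical value, not tied to any specific model:

```python
def lora_params(rank, d_in, d_out):
    # A LoRA adapter for a d_out x d_in weight matrix adds two
    # low-rank factors: A (rank x d_in) and B (d_out x rank),
    # so it contributes rank * (d_in + d_out) trainable parameters.
    return rank * (d_in + d_out)

hidden = 4096  # hypothetical hidden size
print(lora_params(32, hidden, hidden))  # one square projection at rank 32
print(lora_params(64, hidden, hidden))  # doubling the rank doubles the count
```

Flags like model.train_attn, model.train_mlp, and model.train_unembed decide which weight matrices receive such adapters, so the total trainable parameter count scales with both the rank and the set of enabled modules.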

Tinker Service

Local Service

By default, the tinker backend uses a local service:
tinker_base_url: null  # null means local

Remote Service

Connect to a remote tinker service:
python train_agent.py \
  tinker_base_url=http://remote-server:8080

Sampling Configuration

Configure sampling parameters:
sampling.temperature (float, default 1.0): Sampling temperature
sampling.top_p (float, default 1.0): Top-p (nucleus) sampling parameter
Important: tinker recommends leaving temperature and top_p at 1.0; other values can cause unexpected logprob issues. See tinker-cookbook#86 for discussion.

Rollout Engine Configuration

rollout_engine.reasoning_effort (string, default "medium"): Reasoning effort level: "low", "medium", or "high"
rollout_engine.accumulate_reasoning (boolean, default false): Accumulate reasoning tokens across steps
rollout_engine.disable_thinking (boolean, default false): Disable thinking tokens in responses
rollout_engine.bypass_render_with_parser (boolean, default false): Bypass the renderer and use the parser directly

Checkpointing

The tinker backend provides flexible checkpointing:

Automatic Checkpointing

trainer:
  save_freq: 20  # Save every 20 steps
  default_local_dir: /tmp/rllm-tinker-checkpoints

Resume from Checkpoint

Resume from a tinker checkpoint:
python train_agent.py \
  trainer.resume_from_tinker_id=tinker://uuid/weights/000060

Manual Checkpoint Loading

python train_agent.py \
  trainer.default_local_dir=/path/to/checkpoint/dir

Distillation Support

The tinker backend supports knowledge distillation from teacher models:
algorithm:
  adv_estimator: distill
  shared_tokenizer: false
  teacher_rollout_args:
    backend: tinker  # or openai
    model: "Qwen/Qwen3-32B"
    base_url: "http://localhost:8000/v1"
    api_key: "EMPTY"
    max_prompt_length: 32768
Run distillation training:
python train_agent.py \
  algorithm.adv_estimator=distill \
  algorithm.teacher_rollout_args.model=Qwen/Qwen3-32B
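
For intuition, one common formulation of on-policy distillation scores each sampled token by how much more likely the teacher found it than the student. The sketch below illustrates that idea only; the function and numbers are illustrative, not rLLM's "distill" estimator implementation:

```python
def distill_advantages(teacher_logprobs, student_logprobs):
    # Per-token advantage: teacher logprob minus student logprob for
    # each sampled token. Positive values push the student toward
    # tokens the teacher prefers; negative values push it away.
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]

teacher = [-0.1, -0.5, -2.0]  # illustrative per-token logprobs
student = [-0.3, -0.4, -3.0]
print(distill_advantages(teacher, student))
```

The shared_tokenizer flag matters here because per-token comparisons like this only line up directly when teacher and student tokenize the sequence the same way.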

Advanced Features

Fused Forward-Backward and Optimizer Step

For better performance, tinker can fuse the forward-backward pass with the optimizer step:
fuse_forward_backward_and_optim_step: true
This optimization reduces overhead by combining gradient computation and parameter updates into a single operation.

Multi-Step Agents

For multi-turn agent interactions:
agent:
  max_steps: 20  # Allow up to 20 turns

Workflow Parallel Tasks

Control parallelism in workflow execution:
workflow:
  n_parallel_tasks: 256  # Run up to 256 tasks in parallel
  retry_limit: 3  # Retry failed tasks up to 3 times
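
The semantics of n_parallel_tasks and retry_limit can be sketched with a semaphore-bounded asyncio runner. This is an illustrative sketch of the concurrency pattern, not rLLM's implementation:

```python
import asyncio

async def run_with_limit(task_factories, n_parallel_tasks=4, retry_limit=3):
    """Run coroutine factories with bounded concurrency and per-task
    retries, mirroring workflow.n_parallel_tasks / workflow.retry_limit."""
    sem = asyncio.Semaphore(n_parallel_tasks)

    async def run_one(factory):
        async with sem:  # at most n_parallel_tasks run concurrently
            for attempt in range(retry_limit):
                try:
                    return await factory()  # fresh coroutine per attempt
                except Exception:
                    if attempt == retry_limit - 1:
                        raise  # exhausted retries

    return await asyncio.gather(*(run_one(f) for f in task_factories))

async def demo():
    async def task(i):
        await asyncio.sleep(0)
        return i * i
    # Factories (not coroutines) so each retry gets a fresh awaitable.
    return await run_with_limit([lambda i=i: task(i) for i in range(5)])

print(asyncio.run(demo()))  # squares of 0..4, in submission order
```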

Monitoring

Configure logging backends:
trainer:
  logger: ['console', 'wandb', 'tensorboard']
  project_name: 'rllm-tinker'
  experiment_name: 'math-agent-v1'

Example Configuration

Complete configuration for MATH dataset training:
config.yaml
# Model
model:
  name: "Qwen/Qwen3-8B"
  lora_rank: 32
  train_unembed: true
  train_attn: true
  train_mlp: true

# Training
training:
  group_size: 16
  val_group_size: 1
  learning_rate: 2e-5
  max_length: 32768

# Sampling
sampling:
  temperature: 1.0
  top_p: 1.0

# Algorithm
algorithm:
  adv_estimator: grpo
  gamma: 1.0
  lam: 0.95
  norm_adv_by_std_in_grpo: false
  grouping_level: 'trajectory'

# Data
data:
  train_batch_size: 64
  val_batch_size: 32
  max_prompt_length: 2048
  max_response_length: 2048

# Trainer
trainer:
  total_epochs: 10
  test_freq: 5
  save_freq: 20
  logger: ['console', 'wandb']
  project_name: 'math-rl'
  experiment_name: 'qwen3-8b-gsm8k'
  default_local_dir: '/tmp/rllm-tinker-checkpoints'

# Agent
agent:
  max_steps: 1  # Single-turn
  agent_args: {}

# Environment
env:
  env_args: {}

# Rollout Engine
rollout_engine:
  reasoning_effort: "medium"
  accumulate_reasoning: false
  disable_thinking: false

Performance Optimization

Increase Batch Size

Tune data.train_batch_size and training.group_size for better GPU utilization.

Use LoRA

Enable LoRA for faster training and lower memory usage.

Fuse Operations

Set fuse_forward_backward_and_optim_step=true for reduced overhead.

Parallel Workflows

Increase workflow.n_parallel_tasks for workflow-based training.

Troubleshooting

Python version error

tinker requires Python >= 3.11. Upgrade your Python version:
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .[tinker]

Sampling warnings

If you see warnings about temperature or top_p, keep both at their defaults:
sampling:
  temperature: 1.0  # Keep at 1.0
  top_p: 1.0        # Keep at 1.0
Setting these away from 1.0 can cause logprob issues.

Minibatch configuration

Currently only num_minibatches=1 is fully tested:
training:
  num_minibatches: 1  # Don't change this

Checkpoint directory errors

Ensure the checkpoint directory exists:
mkdir -p /tmp/rllm-tinker-checkpoints
python train_agent.py trainer.default_local_dir=/tmp/rllm-tinker-checkpoints

Remote service connection

If using a remote service, verify the URL is reachable:
curl http://remote-server:8080/health
python train_agent.py tinker_base_url=http://remote-server:8080

Comparison with verl

Key differences from verl backend:
Feature              | tinker         | verl
-------------------- | -------------- | -------------------------
Python Version       | >= 3.11        | >= 3.10
Architecture         | Async-first    | Ray-based
LoRA Support         | Native         | Via config
VLM Support          | Limited        | Full (Qwen2-VL, Qwen3-VL)
Distributed Training | Limited        | Multi-node Ray
Configuration        | Simpler        | More complex
Service Model        | tinker service | vLLM/SGLang
See Backend Comparison for detailed feature comparison.

See Also

verl Backend

Distributed training with verl

Backend Comparison

Compare tinker vs verl features

tinker Cookbook

Official tinker cookbook repository

Agent Trainer

Learn about AgentTrainer API
