rLLM supports two training backends: verl and tinker. This guide helps you choose the right backend for your project.

Quick Comparison

verl

Distributed, production-ready backend for large-scale training

tinker

Async-first backend for flexible and rapid development

Feature Comparison

| Feature | verl | tinker |
| --- | --- | --- |
| Python Version | >= 3.10 | >= 3.11 |
| Architecture | Ray-based distributed | Async-first service-based |
| Multi-GPU | ✅ Full support | ⚠️ Limited |
| Multi-Node | ✅ Full support | ❌ Not supported |
| LoRA | ✅ Via configuration | ✅ Native support |
| VLM Support | ✅ Qwen2-VL, Qwen3-VL | ⚠️ Limited |
| Distributed Training | ✅ FSDP, tensor parallel | ⚠️ Single node |
| Inference Engine | vLLM, SGLang | tinker service |
| Configuration | Complex (Hydra + verl) | Simple (Hydra) |
| Learning Curve | Steeper | Gentler |
| Async Support | Built-in | Native |
| Checkpointing | Advanced (Ray) | Standard |
| Resource Management | Ray resource pools | Service-based |
| Production Ready | ✅ Yes | ⚠️ Development |

Detailed Comparison

Architecture

Ray-Based Distributed System

verl uses Ray to orchestrate distributed worker groups:
┌─────────────────────────────────────────┐
│          Ray Cluster                    │
│                                         │
│  ┌──────────────┐  ┌──────────────┐   │
│  │ Actor-Rollout│  │    Critic    │   │
│  │   Workers    │  │   Workers    │   │
│  └──────────────┘  └──────────────┘   │
│                                         │
│  ┌──────────────┐  ┌──────────────┐   │
│  │  Reference   │  │    vLLM/     │   │
│  │   Policy     │  │   SGLang     │   │
│  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────┘
Key Components:
  • Actor-Rollout Workers: Combined training and generation
  • Critic Workers: Value function estimation
  • Reference Policy: Frozen policy for KL divergence
  • Hybrid Engine: Efficient async trajectory generation
Use Cases:
  • Large-scale distributed training
  • Multi-node GPU clusters
  • Production deployments
  • Vision-language models
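The worker-group decomposition above can be sketched in plain Python. This is an illustrative skeleton only: the class and method names are hypothetical, and in verl each role runs as a Ray actor group with generation delegated to vLLM/SGLang.

```python
class ActorRolloutWorker:
    """Combined policy training and rollout generation (one worker group in verl)."""
    def generate(self, prompts):
        # Real verl delegates generation to vLLM/SGLang.
        return [{"prompt": p, "response": "...", "reward": 1.0} for p in prompts]

    def update_policy(self, trajectories, advantages):
        # Placeholder for a PPO gradient step.
        return {"policy_loss": 0.0, "num_trajectories": len(trajectories)}

class CriticWorker:
    """Value-function estimation used to compute advantages."""
    def estimate_values(self, trajectories):
        return [0.5 for _ in trajectories]

class ReferencePolicy:
    """Frozen initial policy providing log-probs for the KL penalty."""
    def log_probs(self, trajectories):
        return [0.0 for _ in trajectories]

def ppo_step(prompts):
    actor, critic = ActorRolloutWorker(), CriticWorker()
    trajectories = actor.generate(prompts)                # 1. rollout
    values = critic.estimate_values(trajectories)         # 2. baseline values
    _ = ReferencePolicy().log_probs(trajectories)         # 3. KL reference term
    advantages = [t["reward"] - v for t, v in zip(trajectories, values)]
    return actor.update_policy(trajectories, advantages)  # 4. policy update
```

Separating the roles this way is what lets verl place each group on its own GPU pool and overlap generation with training.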

Installation & Dependencies

# Python >= 3.10
uv pip install "rllm[verl] @ git+https://github.com/rllm-org/rllm.git"

# Dependencies:
# - verl==0.6.1
# - vllm>=0.10.2,<=0.11.0
# - torch>=2.8.0
# - flash-attn>=2.8.1
# - qwen-vl-utils (for VLM)

Configuration Complexity

More Complex Configuration

verl requires configuring Ray resources, worker groups, and FSDP:
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-Math-7B-Instruct
    lora:
      rank: 64
      alpha: 128
  actor:
    fsdp_config:
      param_offload: false
      grad_offload: false
  rollout:
    mode: async  # Required
    n: 16
  
resource_pool_config:
  actor_rollout_gpu: 4
  critic_gpu: 2
  ref_policy_gpu: 2

data:
  train_batch_size: 32
  max_prompt_length: 2048
  max_response_length: 2048

algorithm:
  adv_estimator: grpo
  gamma: 1.0

trainer:
  total_epochs: 3
  save_freq: 100
Pros:
  • Fine-grained control over resources
  • Advanced features (FSDP, tensor parallel)
  • Production-tested configurations
Cons:
  • Steeper learning curve
  • More configuration options
  • Requires Ray knowledge
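The dotted overrides used throughout this guide (e.g. `actor_rollout_ref.rollout.mode=async`) map onto the nested config above. A simplified sketch of that mapping, for intuition only; real Hydra additionally handles type parsing, interpolation, and validation:

```python
# Simplified illustration of Hydra-style dotted overrides on a nested config.
def apply_override(config: dict, override: str) -> dict:
    key_path, value = override.split("=", 1)
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # descend, creating levels as needed
    # Coerce numeric strings, loosely mimicking Hydra's type handling.
    node[keys[-1]] = int(value) if value.isdigit() else value
    return config

config = {}
apply_override(config, "actor_rollout_ref.model.lora.rank=64")
apply_override(config, "actor_rollout_ref.rollout.mode=async")
# config["actor_rollout_ref"]["model"]["lora"]["rank"] is now 64
```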

LoRA Support

Configuration-Based LoRA
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-Math-7B-Instruct
    lora:
      rank: 64
      alpha: 128
      target_modules:
        - q_proj
        - k_proj
        - v_proj
        - o_proj
        - gate_proj
        - up_proj
        - down_proj
Or set the same values from the command line:
python train_agent.py \
  actor_rollout_ref.model.lora.rank=64 \
  actor_rollout_ref.model.lora.alpha=128
Features:
  • Full control over target modules
  • Integrated with FSDP
  • Reference policy without LoRA
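To see what `rank: 64` over these target modules costs, here is a back-of-envelope trainable-parameter count. The layer dimensions are an assumption based on Qwen2.5-7B-style architecture (hidden 3584, 4 KV heads of dim 128, MLP intermediate 18944, 28 layers); treat the figures as approximate.

```python
# Approximate LoRA parameter count for the target modules listed above,
# assuming Qwen2.5-7B-style dimensions.
hidden, kv_dim, intermediate, layers, rank = 3584, 512, 18944, 28, 64

# (in_features, out_features) per target module
modules = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (in x r) and B (r x out) per module: r * (in + out) params.
per_layer = sum(rank * (i + o) for i, o in modules.values())
total = per_layer * layers
print(f"{total / 1e6:.1f}M trainable params")  # ~161.5M, roughly 2% of a 7B base
```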

Vision-Language Models (VLM)

Full VLM Support

verl supports Qwen2-VL and Qwen3-VL with multimodal processing:
import hydra

from rllm.trainer.agent_trainer import AgentTrainer

# Geo3KWorkflow, f1_reward_fn, train_dataset, and test_dataset are assumed
# to be defined earlier in the training script.

@hydra.main(
    config_path="pkg://rllm.trainer.config",
    config_name="agent_ppo_trainer",
    version_base=None,
)
def main(config):
    trainer = AgentTrainer(
        workflow_class=Geo3KWorkflow,
        workflow_args={"reward_function": f1_reward_fn},
        config=config,
        train_dataset=train_dataset,
        val_dataset=test_dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
Launch with:
python train_vlm.py \
  actor_rollout_ref.model.path=Qwen/Qwen2-VL-7B-Instruct \
  data.return_multi_modal_inputs=true
Supported Models:
  • Qwen2-VL-7B-Instruct
  • Qwen2-VL-72B-Instruct
  • Qwen3-VL models
Features:
  • Image grid position IDs
  • Multimodal processors
  • Vision-aware tokenization

Distributed Training

Full Distributed Support

verl supports multi-GPU and multi-node training:
# Multi-GPU on single node
python train_agent.py \
  resource_pool_config.actor_rollout_gpu=8 \
  actor_rollout_ref.actor.fsdp_config.param_offload=false

# Multi-node cluster
ray start --head --port=6379
# On other nodes:
ray start --address=head-node-ip:6379

python train_agent.py \
  resource_pool_config.actor_rollout_gpu=32
Features:
  • FSDP (Fully Sharded Data Parallel)
  • Tensor parallelism via vLLM
  • Resource pool management
  • Ray cluster orchestration
Resource Configuration:
resource_pool_config:
  actor_rollout_gpu: 8  # Actor-rollout workers
  critic_gpu: 2         # Critic workers
  ref_policy_gpu: 2     # Reference policy workers
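A resource pool config like the one above has to fit within the GPUs Ray can actually see. A minimal standalone sanity check, with key names taken from this guide's config (the helper itself is hypothetical, not part of verl's API):

```python
# Sanity-check a resource_pool_config against the GPUs available to Ray.
def validate_pools(pools: dict, available_gpus: int) -> int:
    requested = sum(pools.values())
    if requested > available_gpus:
        raise ValueError(
            f"pools request {requested} GPUs, only {available_gpus} available"
        )
    return available_gpus - requested  # GPUs left unallocated

pools = {"actor_rollout_gpu": 8, "critic_gpu": 2, "ref_policy_gpu": 2}
leftover = validate_pools(pools, available_gpus=16)  # 4 spare GPUs
```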

When to Use Each Backend

Use verl When:

  • Training on multiple GPUs or nodes
  • Production deployments requiring reliability
  • Large models (> 7B parameters) needing FSDP
  • High-throughput training pipelines
  • Training Qwen2-VL or Qwen3-VL models
  • Multimodal agent training
  • Image-based reasoning tasks
  • OCR and visual question answering
  • Custom advantage estimators
  • Critic network training
  • Reference policy with KL divergence
  • Complex reward shaping
  • Multi-node GPU clusters
  • Tensor parallel inference
  • Memory-constrained large models
  • High-throughput rollout generation

Use tinker When:

  • Quick experiments and iteration
  • Testing new agent architectures
  • Developing custom workflows
  • Learning rLLM framework
  • Parameter-efficient fine-tuning
  • Limited GPU memory (single GPU)
  • Fast adaptation of pretrained models
  • Deployment to Fireworks AI
  • Training on a single machine
  • Small to medium models (< 7B)
  • Development environments
  • Limited computational resources
  • Building custom agent workflows
  • Multi-step reasoning tasks
  • Tool-using agents
  • Async-first architectures

Performance Characteristics

Training Speed

| Metric | verl | tinker |
| --- | --- | --- |
| Single GPU | Fast | Fast |
| Multi-GPU | Very Fast (scaling) | Not supported |
| Startup Time | Slower (Ray init) | Faster |
| Throughput | High (distributed) | Medium (single node) |
| Memory Efficiency | High (FSDP) | Medium |

Resource Requirements

Minimum Requirements:
  • 1 GPU with 24GB+ VRAM (for 7B models)
  • 32GB+ system RAM
  • Python >= 3.10
  • CUDA 11.8+ or 12.1+
Recommended for Production:
  • 4-8 GPUs (A100 or H100)
  • 128GB+ system RAM
  • NVMe storage for checkpoints
  • Multi-node Ray cluster
Memory Usage (7B model, batch_size=32):
  • Full fine-tuning: ~40GB VRAM
  • LoRA (rank=64): ~28GB VRAM
  • With FSDP: ~20GB per GPU (4 GPUs)
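The FSDP figure follows from sharding: parameters, gradients, and optimizer state are split across GPUs while activations stay local. A coarse back-of-envelope model of that effect; the 6 bytes/param and activation numbers are rough assumptions, and real usage varies with sequence length, activation checkpointing, and offload settings.

```python
# Rough per-GPU memory model showing why FSDP reduces per-GPU footprint.
def per_gpu_gb(params_b: float, bytes_per_param: float, num_gpus: int,
               activation_gb: float) -> float:
    # Sharded states divide across GPUs; activations do not.
    state_gb = params_b * bytes_per_param
    return state_gb / num_gpus + activation_gb

# 7B params; ~6 bytes/param is a coarse stand-in for bf16 weights + grads
# + mixed-precision optimizer state.
single = per_gpu_gb(7, 6, num_gpus=1, activation_gb=6)  # 48.0 GB
fsdp4 = per_gpu_gb(7, 6, num_gpus=4, activation_gb=6)   # 16.5 GB
```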

Migration Between Backends

From tinker to verl

Step 1: Update Configuration

Convert tinker config to verl format:
- model:
-   name: "Qwen/Qwen2.5-Math-7B-Instruct"
-   lora_rank: 32
+ actor_rollout_ref:
+   model:
+     path: "Qwen/Qwen2.5-Math-7B-Instruct"
+     lora:
+       rank: 32
Step 2: Update Training Script

Change backend parameter:
trainer = AgentTrainer(
    config=config,
    agent_class=MathAgent,
    env_class=SingleTurnEnvironment,
    backend="verl",  # Changed from "tinker"
    # ...
)
Step 3: Install verl Backend

uv pip install -e .[verl]
Step 4: Update Hydra Config

Use verl config file:
@hydra.main(
    config_path="pkg://rllm.trainer.config",
    config_name="agent_ppo_trainer",  # verl config
    version_base=None
)
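The configuration mapping above can also be done mechanically. A hypothetical helper (not part of rLLM) covering just the fields shown in the diff; a real migration would handle many more keys:

```python
# Sketch: map a tinker-style config dict onto the verl layout.
def tinker_to_verl(tinker_cfg: dict) -> dict:
    model = tinker_cfg["model"]
    return {
        "actor_rollout_ref": {
            "model": {
                "path": model["name"],                 # name -> path
                "lora": {"rank": model["lora_rank"]},  # flat key -> nested block
            }
        }
    }

verl_cfg = tinker_to_verl(
    {"model": {"name": "Qwen/Qwen2.5-Math-7B-Instruct", "lora_rank": 32}}
)
# verl_cfg["actor_rollout_ref"]["model"]["lora"]["rank"] is now 32
```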

From verl to tinker

Step 1: Check Python Version

Ensure Python >= 3.11:
python --version  # Must be 3.11+
uv venv --python 3.11
Step 2: Simplify Configuration

Convert verl config to tinker format:
- actor_rollout_ref:
-   model:
-     path: "Qwen/Qwen2.5-Math-7B-Instruct"
-     lora:
-       rank: 32
+ model:
+   name: "Qwen/Qwen2.5-Math-7B-Instruct"
+   lora_rank: 32
Step 3: Update Training Script

trainer = AgentTrainer(
    config=config,
    agent_class=MathAgent,
    env_class=SingleTurnEnvironment,
    backend="tinker",  # Changed from default/verl
    # ...
)
Step 4: Install tinker Backend

uv pip install -e .[tinker]

Recommendations by Use Case

Research & Experimentation

Recommendation: Start with tinker, scale to verl if needed
  • Begin with tinker for rapid iteration
  • Switch to verl when:
    • Need multi-GPU training
    • Training VLM models
    • Scaling to larger datasets

Production Deployment

Recommendation: Use verl
  • Production-tested infrastructure
  • Scalable to multi-node clusters
  • Better resource management
  • Advanced checkpointing

LoRA Fine-Tuning

Recommendation: tinker or verl (equal)
  • tinker: Simpler configuration
  • verl: Better for distributed LoRA

Vision-Language Tasks

Recommendation: Use verl
  • Full Qwen-VL support
  • Multimodal processors
  • Tested on vision datasets

Summary

Choose verl for:

  • Production deployments
  • Multi-GPU/multi-node training
  • Vision-language models
  • Large-scale experiments

Choose tinker for:

  • Rapid prototyping
  • Single-node training
  • LoRA fine-tuning
  • Workflow development
Both backends are actively maintained and share the same core rLLM framework. Your choice depends on scale and requirements, not quality.

See Also

verl Backend

Detailed verl documentation

tinker Backend

Detailed tinker documentation

Agent Trainer

AgentTrainer API guide
