rLLM
Reinforcement Learning for Language Agents
Why rLLM?
rLLM makes it simple to create and train intelligent agents that can use tools, reason through complex problems, and improve their performance through reinforcement learning. The framework has powered state-of-the-art agents, including:
- rLLM-FinQA-4B: A 4B financial analysis agent that outperforms Qwen3-235B (59.7% vs. 51.4%) and rivals Gemini 2.5 Pro on the Snorkel Finance Benchmark
- DeepSWE: A 32B software engineering agent achieving 59% on SWE-Bench Verified
- DeepCoder-14B: A 14B coding model achieving 60.6% on LiveCodeBench, matching o3-mini performance
- DeepScaleR-1.5B: A 1.5B model surpassing o1-preview with 43.1% on AIME
Key features
Define custom agents and environments
Build agents with tool usage capabilities and custom environments with reward functions tailored to your domain
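As a rough illustration of what a domain-specific environment with a reward function can look like, here is a minimal single-turn sketch. The class and method names are placeholders for this example, not the actual rLLM API:

```python
# Hypothetical sketch of a custom environment; names are illustrative,
# not the real rLLM interface.
class MathEnv:
    def __init__(self, problem: str, answer: str):
        self.problem = problem
        self.answer = answer

    def reset(self) -> str:
        # The initial observation is the problem statement.
        return self.problem

    def step(self, action: str):
        # Reward 1.0 for a correct final answer, 0.0 otherwise.
        reward = 1.0 if action.strip() == self.answer else 0.0
        done = True  # single-turn task: the episode ends after one answer
        return "", reward, done

env = MathEnv("What is 2 + 2?", "4")
obs = env.reset()
_, reward, done = env.step("4")
```

The same shape extends to multi-turn settings, where `step` would return a new observation and `done` would stay `False` until the episode ends.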
Unified inference and training interface
Seamless workflow from agent development to RL training with consistent APIs
Async parallelized trajectory generation
Efficient batch inference with parallel agent-environment execution for faster rollout generation
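The core idea behind async rollout generation can be sketched with plain `asyncio`: launch all agent-environment episodes concurrently rather than one at a time. The `rollout` function below is a stand-in for a real model/environment interaction, not the engine's actual interface:

```python
import asyncio

# Illustrative sketch of async parallel rollouts; the function names are
# placeholders, not the actual rLLM engine interface.
async def rollout(task_id: int) -> dict:
    # In practice this would await a model server and an environment step;
    # here a no-op await stands in for that I/O.
    await asyncio.sleep(0)
    return {"task": task_id, "reward": 1.0}

async def generate_batch(n: int) -> list:
    # Launch all rollouts concurrently instead of sequentially.
    return await asyncio.gather(*(rollout(i) for i in range(n)))

trajectories = asyncio.run(generate_batch(8))
```

Because each rollout spends most of its time waiting on inference and tool calls, running them concurrently keeps the batch wall-clock time close to that of the slowest single episode.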
Scalable RL training
Production-ready training with PPO and GRPO algorithms, supporting FSDP and Megatron backends
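To make the GRPO side concrete: GRPO replaces a learned value baseline with a group-relative one, normalizing each trajectory's reward against the other rollouts of the same prompt. A minimal sketch of that advantage computation (illustrative only, not rLLM's trainer code):

```python
# Sketch of GRPO-style group-relative advantages: rewards from rollouts of
# the same prompt are normalized by the group's mean and std.
def group_advantages(rewards: list, eps: float = 1e-6) -> list:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # eps avoids division by zero when all rewards in the group are equal.
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of one prompt: two correct, two incorrect.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct rollouts receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the group's better samples.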
Multiple training backends
Choose between verl (GPU-optimized) and tinker (flexible) backends based on your infrastructure
Framework integrations
SDK support for LangGraph, SmolAgents, and Strands for building complex agentic workflows
Advanced training features
LoRA training, VLM support, and multi-agent workflows out of the box
Production ready
Battle-tested in real-world deployments with Docker support and comprehensive documentation
How it works
rLLM provides a complete workflow for building and training language agents:
- Define your agent: Create agents with tool usage capabilities, custom prompts, and reasoning strategies
- Build your environment: Implement environments that provide tools, compute rewards, and manage agent interactions
- Generate trajectories: Use the AgentExecutionEngine to run agents in parallel and collect interaction data
- Train with RL: Improve agent performance using PPO or GRPO algorithms with scalable training backends
- Deploy: Export trained models and deploy them for inference in production workloads
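The first four steps above can be sketched as a toy end-to-end loop. Everything here is a deliberately simplified stand-in (a stub agent, an inline reward check), not the actual rLLM API:

```python
# Toy end-to-end sketch of steps 1-4; all names are placeholders.
def agent(observation: str) -> str:
    # Step 1: a stub "agent" that answers arithmetic prompts directly.
    return str(eval(observation))  # eval is fine for this toy demo only

def score(answer: str, action: str) -> float:
    # Step 2: the environment's reward function scores the action.
    return 1.0 if action == answer else 0.0

def collect_trajectories(tasks: list) -> list:
    # Step 3: roll the agent over each task and record rewards.
    return [
        {"task": t, "action": agent(t), "reward": score(a, agent(t))}
        for t, a in tasks
    ]

batch = collect_trajectories([("2+2", "4"), ("3*3", "9")])
mean_reward = sum(tr["reward"] for tr in batch) / len(batch)
# Step 4 would feed `batch` into a PPO/GRPO trainer to update the policy.
```

In the real framework, the stub agent is a language model, the reward check lives in your environment, and trajectory collection is handled by the AgentExecutionEngine in parallel.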
Get started
Installation
Install rLLM with pip, uv, or Docker in minutes
Quick start
Build your first math reasoning agent with tools in 10 minutes
Core concepts
Learn about agents, environments, and the execution engine
Examples
Explore code examples for different agent types and domains
Community and support
Discord
Join our community to ask questions and share your projects
GitHub
Contribute to the project, report issues, or browse the source code
Blog
Read about the latest releases, research, and use cases
Twitter/X
Follow us for updates and announcements