rLLM
Reinforcement Learning for Language Agents
Why rLLM?
rLLM makes it simple to create and train intelligent agents that can use tools, reason through complex problems, and improve their performance through reinforcement learning. The framework has powered state-of-the-art agents, including:
- rLLM-FinQA-4B: A 4B financial analysis agent that outperforms Qwen3-235B (59.7% vs. 51.4%) and rivals Gemini 2.5 Pro on the Snorkel Finance Benchmark
- DeepSWE: A 32B software engineering agent achieving 59% on SWE-Bench Verified
- DeepCoder-14B: A 14B coding model achieving 60.6% on LiveCodeBench, matching o3-mini performance
- DeepScaleR-1.5B: A 1.5B model surpassing o1-preview with 43.1% on AIME
Key features
Define custom agents and environments
Build agents with tool usage capabilities and custom environments with reward functions tailored to your domain
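As a rough illustration of what a domain-specific environment with a reward function can look like, here is a minimal single-turn sketch. The class and method names are placeholders for this example, not the actual rLLM API:

```python
# Hypothetical sketch of a custom environment; names are illustrative,
# not the real rLLM interface.
class MathEnv:
    def __init__(self, problem: str, answer: str):
        self.problem = problem
        self.answer = answer

    def reset(self) -> str:
        # The initial observation is the problem statement.
        return self.problem

    def step(self, action: str):
        # Reward 1.0 for a correct final answer, 0.0 otherwise.
        reward = 1.0 if action.strip() == self.answer else 0.0
        done = True  # single-turn task: the episode ends after one answer
        return "", reward, done

env = MathEnv("What is 2 + 2?", "4")
obs = env.reset()
_, reward, done = env.step("4")
```

The same shape extends to multi-turn settings, where `step` would return a new observation and `done` would stay `False` until the episode ends.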
Unified inference and training interface
Seamless workflow from agent development to RL training with consistent APIs
Async parallelized trajectory generation
Efficient batch inference with parallel agent-environment execution for faster rollout generation
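The core idea behind async rollout generation can be sketched with plain `asyncio`: launch all agent-environment episodes concurrently rather than one at a time. The `rollout` function below is a stand-in for a real model/environment interaction, not the engine's actual interface:

```python
import asyncio

# Illustrative sketch of async parallel rollouts; the function names are
# placeholders, not the actual rLLM engine interface.
async def rollout(task_id: int) -> dict:
    # In practice this would await a model server and an environment step;
    # here a no-op await stands in for that I/O.
    await asyncio.sleep(0)
    return {"task": task_id, "reward": 1.0}

async def generate_batch(n: int) -> list:
    # Launch all rollouts concurrently instead of sequentially.
    return await asyncio.gather(*(rollout(i) for i in range(n)))

trajectories = asyncio.run(generate_batch(8))
```

Because each rollout spends most of its time waiting on inference and tool calls, running them concurrently keeps the batch wall-clock time close to that of the slowest single episode.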
Scalable RL training
Production-ready training with PPO and GRPO algorithms, supporting FSDP and Megatron backends
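To make the GRPO side concrete: GRPO replaces a learned value baseline with a group-relative one, normalizing each trajectory's reward against the other rollouts of the same prompt. A minimal sketch of that advantage computation (illustrative only, not rLLM's trainer code):

```python
# Sketch of GRPO-style group-relative advantages: rewards from rollouts of
# the same prompt are normalized by the group's mean and std.
def group_advantages(rewards: list, eps: float = 1e-6) -> list:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # eps avoids division by zero when all rewards in the group are equal.
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of one prompt: two correct, two incorrect.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct rollouts receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the group's better samples.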
Multiple training backends
Choose between verl (GPU-optimized) and tinker (flexible) backends based on your infrastructure
Framework integrations
SDK support for LangGraph, SmolAgents, and Strands for building complex agentic workflows
Advanced training features
LoRA training, VLM support, and multi-agent workflows out of the box
Production ready
Battle-tested in real-world deployments with Docker support and comprehensive documentation
How it works
rLLM provides a complete workflow for building and training language agents:
- Define your agent: Create agents with tool usage capabilities, custom prompts, and reasoning strategies
- Build your environment: Implement environments that provide tools, compute rewards, and manage agent interactions
- Generate trajectories: Use the AgentExecutionEngine to run agents in parallel and collect interaction data
- Train with RL: Improve agent performance using PPO or GRPO algorithms with scalable training backends
- Deploy: Export trained models and deploy them for inference in production workloads
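The first four steps above can be sketched as a toy end-to-end loop. Everything here is a deliberately simplified stand-in (a stub agent, an inline reward check), not the actual rLLM API:

```python
# Toy end-to-end sketch of steps 1-4; all names are placeholders.
def agent(observation: str) -> str:
    # Step 1: a stub "agent" that answers arithmetic prompts directly.
    return str(eval(observation))  # eval is fine for this toy demo only

def score(answer: str, action: str) -> float:
    # Step 2: the environment's reward function scores the action.
    return 1.0 if action == answer else 0.0

def collect_trajectories(tasks: list) -> list:
    # Step 3: roll the agent over each task and record rewards.
    return [
        {"task": t, "action": agent(t), "reward": score(a, agent(t))}
        for t, a in tasks
    ]

batch = collect_trajectories([("2+2", "4"), ("3*3", "9")])
mean_reward = sum(tr["reward"] for tr in batch) / len(batch)
# Step 4 would feed `batch` into a PPO/GRPO trainer to update the policy.
```

In the real framework, the stub agent is a language model, the reward check lives in your environment, and trajectory collection is handled by the AgentExecutionEngine in parallel.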
Get started
Installation
Install rLLM with pip, uv, or Docker in minutes
Quick start
Build your first math reasoning agent with tools in 10 minutes
Core concepts
Learn about agents, environments, and the execution engine
Examples
Explore code examples for different agent types and domains
Community and support
Discord
Join our community to ask questions and share your projects
GitHub
Contribute to the project, report issues, or browse the source code
Blog
Read about the latest releases, research, and use cases
Twitter/X
Follow us for updates and announcements