The rLLM community has built impressive projects demonstrating the power of reinforcement learning for language agents. From mathematical reasoning to software engineering and research assistants, these projects showcase diverse applications of RL-trained agents.

DeepScaleR

DeepScaleR: Surpassing O1-Preview with a 1.5B Model

A 1.5B model that surpasses O1-Preview by scaling RL
Achievement: 43.1% Pass@1 on AIME, surpassing O1-Preview
Key Innovation: Iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking
Release Date: February 10, 2025
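The staged context scaling above can be pictured as a simple curriculum loop. The sketch below is purely illustrative, not the DeepScaleR training code: the stage lengths match the reported 8K→16K→24K schedule, but `train_epoch` and `epochs_per_stage` are hypothetical stand-ins.

```python
# Illustrative sketch of a staged context-length curriculum.
CONTEXT_STAGES = [8_192, 16_384, 24_576]  # thinking-token budget per stage

def run_curriculum(train_epoch, epochs_per_stage=1):
    """Run training in stages, raising the max response length each stage.

    `train_epoch(max_len)` is a placeholder for one epoch of GRPO-style
    training with responses truncated at `max_len` tokens.
    """
    history = []
    for max_len in CONTEXT_STAGES:
        for _ in range(epochs_per_stage):
            train_epoch(max_len)
            history.append(max_len)
    return history

if __name__ == "__main__":
    seen = []
    run_curriculum(seen.append, epochs_per_stage=2)
    print(seen)  # each stage's budget appears twice, in increasing order
```

The point of the staging is that early epochs stay cheap at short contexts, and the budget only grows once the policy benefits from longer thinking.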

DeepCoder

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

A 14B coding model matching O3-mini performance
Achievement: 60.6% Pass@1 on LiveCodeBench (+8% improvement)
Comparison: Matches o3-mini-2025-01-31 (Low) and o1-2024-12-17 performance
Release Date: April 8, 2025

DeepSWE

DeepSWE: Training a State-of-the-Art Coding Agent by Scaling RL

A 32B software engineering agent trained purely with RL
Achievement: 59% on SWE-Bench Verified with test-time scaling (42.2% Pass@1)
Leaderboard: Tops the SWE-Bench leaderboard for open-weight models
Release Date: July 1, 2025
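Test-time scaling of the kind reported above is commonly implemented as best-of-n selection: sample several candidate solutions and keep the one a verifier scores highest. The sketch below is a hedged illustration with toy stand-ins for the generator and scorer, not the DeepSWE pipeline.

```python
# Illustrative best-of-n test-time scaling with a hypothetical verifier.
def best_of_n(generate, score, n=8):
    """Sample n candidates and return the one the verifier scores highest."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    # Toy stand-ins: "patches" are integers, the verifier prefers larger ones.
    patches = [3, 7, 2, 9, 1]
    best = best_of_n(lambda i: patches[i], lambda p: p, n=len(patches))
    print(best)  # 9
```

This is why a model's pass rate with test-time scaling can sit well above its single-sample Pass@1: extra samples buy extra chances for the verifier to find a correct candidate.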

Tongyi DeepResearch

A New Era of Open-Source AI Researchers
Organization: Alibaba NLP
Focus: AI-powered research assistant capabilities

Terminal-Bench-RL

Training Long-Horizon Terminal Agents with Reinforcement Learning
Focus: Long-horizon task completion in terminal environments
Contribution: Benchmark for evaluating terminal agent performance with RL

PettingLLMs

Using On-Policy Reinforcement Learning for Stronger Multi-Agent Systems
Focus: Multi-agent reinforcement learning
Key Innovation: On-policy RL for coordinating multiple language agents

SETA

SETA: Scaling Environments for Terminal Agents

Scaling environments for terminal agent training
Organization: CAMEL-AI
Focus: Scalable environment infrastructure for terminal agents

LLM-in-Sandbox

Building General Agents by running LLMs in a sandbox (virtual computer)
Paper: arXiv:2601.16206
Focus: Sandboxed execution environments for general-purpose agents

Research Projects

Cogito, Ergo Ludo

Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

Game-playing agent using reasoning and planning
Paper: arXiv:2509.25052
Focus: Combining reasoning and planning for game-playing agents

Cut the Bill, Keep the Turns

Cut the Bill, Keep the Turns: Affordable Multi-Turn Search RL

Cost-efficient multi-turn search with RL
Focus: Reducing API costs while maintaining multi-turn search quality

Experiential Reinforcement Learning

Reinforcement Learning with an Experience–Reflection–Consolidation Loop
Paper: arXiv:2602.13949
Focus: Learning through experience, reflection, and consolidation cycles
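As a rough illustration of an experience–reflection–consolidation cycle, the minimal agent below acts on a task, reflects on failures, and consolidates the resulting lesson into memory for the next attempt. All names and behaviors here are hypothetical stand-ins, not taken from the paper.

```python
# Illustrative experience -> reflection -> consolidation loop.
class ExperientialAgent:
    def __init__(self):
        self.memory = []  # consolidated lessons carried across episodes

    def act(self, task):
        # Placeholder policy: succeeds once a lesson about the task is stored.
        return f"lesson:{task}" in self.memory

    def reflect(self, task, success):
        # Turn a failure into a candidate lesson; successes yield nothing.
        return None if success else f"lesson:{task}"

    def consolidate(self, lesson):
        if lesson and lesson not in self.memory:
            self.memory.append(lesson)

    def episode(self, task):
        success = self.act(task)                      # experience
        self.consolidate(self.reflect(task, success))  # reflection + consolidation
        return success

if __name__ == "__main__":
    agent = ExperientialAgent()
    print(agent.episode("fix-bug"))  # first attempt fails, lesson is stored
    print(agent.episode("fix-bug"))  # second attempt succeeds from memory
```

The structural point is that learning happens between episodes, by writing reflections into a persistent memory, rather than only through gradient updates.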

Official Projects

rLLM-FinQA-4B

A 4B Financial Analysis Agent that Outperforms 235B Models
Achievement: 59.7% vs Qwen3-235B's 51.4%, rivaling Gemini 2.5 Pro (60.6%)

Project Categories

Reasoning

DeepScaleR (AIME 43.1%)

Coding

DeepCoder (60.6% LiveCodeBench)
DeepSWE (59% SWE-Bench Verified)

Research

Tongyi DeepResearch
Experiential RL

Terminal Agents

Terminal-Bench-RL
SETA

Multi-Agent

PettingLLMs

Finance

rLLM-FinQA-4B

Contributing Your Project

Built something awesome with rLLM? We’d love to feature your project!
To add your project to this list:
  1. Open a Pull Request on GitHub
  2. Include the following information:
    • Project name and description
    • GitHub repository or project page link
    • Key achievements or benchmarks
    • Any published papers or blog posts
  3. Join our community

Resources

Getting Started

Install rLLM and start building

Tutorials

Learn through step-by-step guides

Discord Community

Join the rLLM community

GitHub

Contribute to rLLM

Citation

If you use rLLM in your research or project, please cite:
@misc{rllm2025,
  title={rLLM: A Framework for Post-Training Language Agents},
  author={Sijun Tan and Michael Luo and Colin Cai and Tarun Venkat and Kyle Montgomery and Aaron Hao and Tianhao Wu and Arnav Balyan and Manan Roongta and Chenguang Wang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
  year={2025},
  howpublished={\url{https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31}},
  note={Notion Blog},
}
You may also cite the specific projects: DeepScaleR, DeepCoder, and DeepSWE.
