The rLLM community has built impressive projects demonstrating the power of reinforcement learning for language agents. From mathematical reasoning to software engineering and research assistants, these projects showcase diverse applications of RL-trained agents.

DeepScaleR

DeepScaleR: Surpassing O1-Preview with a 1.5B Model

A 1.5B model that surpasses O1-Preview by scaling RL
Achievement: 43.1% Pass@1 on AIME, surpassing O1-Preview
Key Innovation: Iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking
Release Date: February 10, 2025
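The staged context scaling above can be pictured as a simple curriculum loop. The sketch below is purely illustrative, not the DeepScaleR training code: the stage lengths match the reported 8K→16K→24K schedule, but `train_epoch` and `epochs_per_stage` are hypothetical stand-ins.

```python
# Illustrative sketch of a staged context-length curriculum.
CONTEXT_STAGES = [8_192, 16_384, 24_576]  # thinking-token budget per stage

def run_curriculum(train_epoch, epochs_per_stage=1):
    """Run training in stages, raising the max response length each stage.

    `train_epoch(max_len)` is a placeholder for one epoch of GRPO-style
    training with responses truncated at `max_len` tokens.
    """
    history = []
    for max_len in CONTEXT_STAGES:
        for _ in range(epochs_per_stage):
            train_epoch(max_len)
            history.append(max_len)
    return history

if __name__ == "__main__":
    seen = []
    run_curriculum(seen.append, epochs_per_stage=2)
    print(seen)  # each stage's budget appears twice, in increasing order
```

The point of the staging is that early epochs stay cheap at short contexts, and the budget only grows once the policy benefits from longer thinking.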

DeepCoder

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

A 14B coding model matching O3-mini performance
Achievement: 60.6% Pass@1 on LiveCodeBench (+8% improvement)
Comparison: Matches o3-mini-2025-01-31 (Low) and o1-2024-12-17 performance
Release Date: April 8, 2025

DeepSWE

DeepSWE: Training a State-of-the-Art Coding Agent by Scaling RL

A 32B software engineering agent trained purely with RL
Achievement: 59% on SWE-Bench Verified with test-time scaling (42.2% Pass@1)
Leaderboard: Tops the SWE-Bench leaderboard for open-weight models
Release Date: July 1, 2025
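Test-time scaling of the kind reported above is commonly implemented as best-of-n selection: sample several candidate solutions and keep the one a verifier scores highest. The sketch below is a hedged illustration with toy stand-ins for the generator and scorer, not the DeepSWE pipeline.

```python
# Illustrative best-of-n test-time scaling with a hypothetical verifier.
def best_of_n(generate, score, n=8):
    """Sample n candidates and return the one the verifier scores highest."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    # Toy stand-ins: "patches" are integers, the verifier prefers larger ones.
    patches = [3, 7, 2, 9, 1]
    best = best_of_n(lambda i: patches[i], lambda p: p, n=len(patches))
    print(best)  # 9
```

This is why a model's pass rate with test-time scaling can sit well above its single-sample Pass@1: extra samples buy extra chances for the verifier to find a correct candidate.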

Tongyi DeepResearch

A New Era of Open-Source AI Researchers
Organization: Alibaba NLP
Focus: AI-powered research assistant capabilities

Terminal-Bench-RL

Training Long-Horizon Terminal Agents with Reinforcement Learning
Focus: Long-horizon task completion in terminal environments
Contribution: Benchmark for evaluating terminal agent performance with RL

PettingLLMs

Using On-Policy Reinforcement Learning for Stronger Multi-Agent Systems
Focus: Multi-agent reinforcement learning
Key Innovation: On-policy RL for coordinating multiple language agents

SETA

SETA: Scaling Environments for Terminal Agents

Scaling environments for terminal agent training
Organization: CAMEL-AI
Focus: Scalable environment infrastructure for terminal agents

LLM-in-Sandbox

Building General Agents by running LLMs in a sandbox (virtual computer)
Paper: arXiv:2601.16206
Focus: Sandboxed execution environments for general-purpose agents

Research Projects

Cogito, Ergo Ludo

Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

Game-playing agent using reasoning and planning
Paper: arXiv:2509.25052
Focus: Combining reasoning and planning for game-playing agents

Cut the Bill, Keep the Turns

Cut the Bill, Keep the Turns: Affordable Multi-Turn Search RL

Cost-efficient multi-turn search with RL
Focus: Reducing API costs while maintaining multi-turn search quality

Experiential Reinforcement Learning

Reinforcement Learning with an Experience–Reflection–Consolidation Loop
Paper: arXiv:2602.13949
Focus: Learning through experience, reflection, and consolidation cycles
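As a rough illustration of an experience–reflection–consolidation cycle, the minimal agent below acts on a task, reflects on failures, and consolidates the resulting lesson into memory for the next attempt. All names and behaviors here are hypothetical stand-ins, not taken from the paper.

```python
# Illustrative experience -> reflection -> consolidation loop.
class ExperientialAgent:
    def __init__(self):
        self.memory = []  # consolidated lessons carried across episodes

    def act(self, task):
        # Placeholder policy: succeeds once a lesson about the task is stored.
        return f"lesson:{task}" in self.memory

    def reflect(self, task, success):
        # Turn a failure into a candidate lesson; successes yield nothing.
        return None if success else f"lesson:{task}"

    def consolidate(self, lesson):
        if lesson and lesson not in self.memory:
            self.memory.append(lesson)

    def episode(self, task):
        success = self.act(task)                      # experience
        self.consolidate(self.reflect(task, success))  # reflection + consolidation
        return success

if __name__ == "__main__":
    agent = ExperientialAgent()
    print(agent.episode("fix-bug"))  # first attempt fails, lesson is stored
    print(agent.episode("fix-bug"))  # second attempt succeeds from memory
```

The structural point is that learning happens between episodes, by writing reflections into a persistent memory, rather than only through gradient updates.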

Official Projects

rLLM-FinQA-4B

A 4B Financial Analysis Agent that Outperforms 235B Models
Achievement: 59.7% vs Qwen3-235B's 51.4%, rivaling Gemini 2.5 Pro (60.6%)

Project Categories

Reasoning

DeepScaleR (AIME 43.1%)

Coding

DeepCoder (60.6% LiveCodeBench)
DeepSWE (59% SWE-Bench Verified)

Research

Tongyi DeepResearch
Experiential RL

Terminal Agents

Terminal-Bench-RL
SETA

Multi-Agent

PettingLLMs

Finance

rLLM-FinQA-4B

Contributing Your Project

Built something awesome with rLLM? We’d love to feature your project!
To add your project to this list:
  1. Open a Pull Request on GitHub
  2. Include the following information:
    • Project name and description
    • GitHub repository or project page link
    • Key achievements or benchmarks
    • Any published papers or blog posts
  3. Join our community

Resources

Getting Started

Install rLLM and start building

Tutorials

Learn through step-by-step guides

Discord Community

Join the rLLM community

GitHub

Contribute to rLLM

Citation

If you use rLLM in your research or project, please cite:
@misc{rllm2025,
  title={rLLM: A Framework for Post-Training Language Agents},
  author={Sijun Tan and Michael Luo and Colin Cai and Tarun Venkat and Kyle Montgomery and Aaron Hao and Tianhao Wu and Arnav Balyan and Manan Roongta and Chenguang Wang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
  year={2025},
  howpublished={\url{https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31}},
  note={Notion Blog},
}
You may also cite the specific projects: DeepScaleR, DeepCoder, and DeepSWE.
