Model
rLLM-FinQA-4B on HuggingFace
Dataset
5,110 Q&A pairs across 207 companies
Blog
Read the full announcement
Performance
rLLM-FinQA-4B achieves 59.7% accuracy on the Snorkel Finance Benchmark, demonstrating that small models trained with RL can rival or outperform much larger models:

| Model | Parameters | Accuracy |
|---|---|---|
| rLLM-FinQA-4B | 4B | 59.7% |
| Gemini 2.5 Pro | Unknown | 60.6% |
| Qwen3-235B | 235B | 51.4% |
The 4B agent outperforms Qwen3-235B by 8.3 percentage points and rivals Gemini 2.5 Pro on Snorkel AI’s expert-curated agentic financial benchmark.
Overview
The FinQA project demonstrates:

- How to use rLLM's `ToolAgent` and `ToolEnvironment` for multi-step financial reasoning
- How to build domain-specific tools in rLLM
- How to train agents with GRPO using LLM-as-judge rewards
- How to achieve state-of-the-art performance with small models using RL
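The LLM-as-judge reward mentioned above can be sketched as follows. This is an illustrative assumption, not rLLM's actual code: the real project routes judging through GPT-5-nano via a Portkey gateway, while here the judge is an injectable callable (stubbed for demonstration) and the prompt wording and verdict parsing are hypothetical.

```python
# Hedged sketch of an LLM-as-judge binary reward; the judge is injected so a
# stub can stand in for the real GPT-5-nano call.
from typing import Callable

# Hypothetical prompt template (not rLLM's actual wording).
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model answer: {answer}\n"
    "Reply with exactly CORRECT or INCORRECT."
)


def judge_reward(question: str, reference: str, answer: str,
                 judge: Callable[[str], str]) -> float:
    """Map a judge verdict to a binary reward usable by GRPO."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference,
                                 answer=answer)
    verdict = judge(prompt).strip().upper()
    # "INCORRECT" does not start with "CORRECT", so this check is safe.
    return 1.0 if verdict.startswith("CORRECT") else 0.0


def stub_judge(prompt: str) -> str:
    """Toy stand-in judge: accept answers that contain the reference string."""
    ref = prompt.split("Reference answer: ")[1].split("\n")[0]
    ans = prompt.split("Model answer: ")[1].split("\n")[0]
    return "CORRECT" if ref in ans else "INCORRECT"


r = judge_reward("What was 2023 revenue growth?", "25%",
                 "Growth was 25%.", stub_judge)
```

In production the stub would be replaced by an API call to the judge model, with the gateway caching identical (prompt, answer) pairs to keep reward costs down.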
Agent Architecture
The FinQA agent is a ReAct-style tool agent that answers financial questions by querying structured tables extracted from SEC 10-K filings. The agent has access to four specialized tools:

| Tool | Description |
|---|---|
| `get_table_names` | List available tables for a given company |
| `get_table_info` | Get table metadata, columns, dtypes, and sample values |
| `sql_query` | Execute SQL queries on in-memory SQLite tables |
| `calculator` | Evaluate mathematical expressions |
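To make the table concrete, here is a minimal, self-contained sketch of what the `sql_query` and `calculator` tools could look like, using Python's built-in `sqlite3`. The function signatures and the restricted-expression calculator are assumptions for illustration; the actual rLLM tool interfaces may differ.

```python
# Hypothetical sketch of two of the four FinQA tools (names mirror the table
# above; actual rLLM tool interfaces may differ).
import sqlite3


def sql_query(conn: sqlite3.Connection, query: str) -> list:
    """Execute a SQL query against in-memory SQLite tables."""
    return conn.execute(query).fetchall()


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression (digits and + - * / ( ) . only)."""
    allowed = set("0123456789+-*/(). e")
    if not set(expression) <= allowed:
        raise ValueError(f"unsupported characters in: {expression!r}")
    return float(eval(expression, {"__builtins__": {}}, {}))


# Example: load one toy company table and answer a growth question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (year INTEGER, amount REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                 [(2022, 100.0), (2023, 125.0)])
rows = sql_query(conn, "SELECT amount FROM revenue WHERE year = 2023")
growth = calculator(f"({rows[0][0]} - 100.0) / 100.0 * 100")  # percent growth
```

The agent interleaves these tool calls with reasoning steps: typically it lists tables, inspects schemas, queries values with SQL, and finishes the arithmetic with the calculator.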
Quick Start
Installation
Follow the installation guide, then install the FinQA dependencies.

Dataset Preparation
Download the rLLM/finqa dataset and prepare it for training and evaluation:

- Download the dataset from HuggingFace (5,110 Q&A pairs)
- Extract company tables to `projects/finqa/data/company_tables/` (207 companies, 6,923 tables)
- Create train/val/test splits (4,030 / 522 / 558 examples)
- Register all splits with the rLLM `DatasetRegistry`
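The split step above can be sketched with a seeded shuffle. The exact sampling strategy used by the rLLM preparation script is an assumption here; only the split sizes (4,030 / 522 / 558) come from the document.

```python
# Illustrative train/val/test split with a fixed seed for reproducibility;
# the real preparation script may partition differently.
import random


def make_splits(examples: list, n_train: int, n_val: int, n_test: int,
                seed: int = 0):
    """Shuffle once with a fixed seed, then slice into three disjoint splits."""
    assert n_train + n_val + n_test == len(examples)
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


data = list(range(5110))  # stand-in for the 5,110 Q&A pairs
train, val, test = make_splits(data, 4030, 522, 558)
```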
Inference
Start a vLLM server and run the agent.

Training
Set the required environment variables before training:

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key for the reward judge |
| `PORTKEY_API_KEY` | Portkey gateway key for reward judge caching |
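For example, the two variables from the table can be exported in the training shell (the values shown are placeholders, not real keys):

```shell
# Required before launching training; replace the placeholders with real keys.
export OPENAI_API_KEY="sk-..."     # reward judge (GPT-5-nano)
export PORTKEY_API_KEY="pk-..."    # Portkey gateway for reward caching
```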
Training with verl Backend
Train the 4B model with the verl backend.

Training with tinker Backend
Train with LoRA on the 30B model using the tinker backend.

Implementation Details
Base Model
- Qwen3-4B-Instruct-2507
- Alternative: Qwen3-30B-A3B-Instruct-2507 with LoRA
Dataset
- Source: rLLM/finqa on HuggingFace
- Size: 5,110 Q&A pairs across 207 companies
- Tables: 6,923 tables extracted from SEC 10-K filings
- Splits: 4,030 train / 522 validation / 558 test examples
Training Configuration
- Algorithm: GRPO (Group Relative Policy Optimization)
- Reward: LLM-as-judge using GPT-5-nano
- Caching: Portkey gateway for reward caching
- Backend: verl (default) or tinker (for LoRA)
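The core of GRPO is a group-relative advantage: several rollouts are sampled per question, and each rollout's reward is normalized against the others in its group. The sketch below shows this standard formulation; it is not rLLM's trainer code, and the trainer's exact normalization details are not shown in this document.

```python
# Minimal sketch of GRPO's group-relative advantage over one rollout group.
import statistics


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """advantage_i = (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one FinQA question, judged 1.0 (correct) or 0.0:
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

With binary judge rewards, correct rollouts in a mixed group get positive advantages and incorrect ones negative; groups that are all correct or all incorrect yield near-zero advantages and contribute little gradient.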
Code Reference
Financial Agent Runner
Main script for running financial reasoning: `projects/finqa/run_finqa.py`
Training Script
FinQA training configuration: `projects/finqa/train_finqa.py`
Resources
Model on HuggingFace
Download rLLM-FinQA-4B weights
Dataset on HuggingFace
Access the FinQA dataset
Blog Post
Read the announcement blog
GitHub Project
View complete source code
Next Steps
Tool Agents
Learn more about building tool agents
Training
Explore training configurations
Community Projects
See more projects built with rLLM
API Reference
Browse the API documentation