Skip to main content
This project demonstrates training and deploying rLLM-FinQA-4B, a specialized financial question-answering agent fine-tuned from Qwen3-4B-Instruct-2507 using rLLM. The agent uses specialized tools (SQL queries, table lookup, calculators) to perform multi-step reasoning over SEC 10-K financial statements.

Model

rLLM-FinQA-4B on HuggingFace

Dataset

5,110 Q&A pairs across 207 companies

Blog

Read the full announcement

Performance

rLLM-FinQA-4B achieves 59.7% accuracy on Snorkel Finance Benchmark, demonstrating that small models trained with RL can outperform much larger models:
ModelParametersAccuracy
rLLM-FinQA-4B4B59.7%
Gemini 2.5 ProUnknown60.6%
Qwen3-235B235B51.4%
The 4B agent outperforms Qwen3-235B by 8.3 percentage points and rivals Gemini 2.5 Pro on Snorkel AI’s expert-curated agentic financial benchmark.

Overview

The FinQA project demonstrates:
  • How to use rLLM’s ToolAgent and ToolEnvironment for multi-step financial reasoning
  • How to build domain-specific tools in rLLM
  • How to train agents with GRPO using LLM-as-judge rewards
  • How to achieve state-of-the-art performance with small models using RL

Agent Architecture

The FinQA agent is a ReAct-style tool agent that answers financial questions by querying structured tables extracted from SEC 10-K filings. The agent has access to 4 specialized tools:
ToolDescription
get_table_namesList available tables for a given company
get_table_infoGet table metadata, columns, dtypes, and sample values
sql_queryExecute SQL queries on in-memory SQLite tables
calculatorEvaluate mathematical expressions
All table data is preloaded into in-memory SQLite for low latency runtime access.

Quick Start

Installation

Follow the installation guide, then install FinQA dependencies:
uv pip install -r projects/finqa/requirements.txt

Dataset Preparation

Download the rLLM/finqa dataset and prepare it for training and evaluation:
python -m projects.finqa.prepare_finqa_data
This will:
  • Download the dataset from HuggingFace (5,110 Q&A pairs)
  • Extract company tables to projects/finqa/data/company_tables/ (207 companies, 6,923 tables)
  • Create train/val/test splits (4,030 / 522 / 558 examples)
  • Register all splits with the rLLM DatasetRegistry

Inference

Start a vLLM server and run the agent:
python -m vllm.entrypoints.openai.api_server \
    --model rLLM/rLLM-FinQA-4B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16

python -m projects.finqa.run_finqa

Training

Set the required environment variables before training:
VariableDescription
OPENAI_API_KEYOpenAI API key for the reward judge
PORTKEY_API_KEYPortkey gateway key for reward judge caching

Training with verl Backend

Train the 4B model with the verl backend:
bash projects/finqa/train_finqa.sh

Training with tinker Backend

Train with LoRA on the 30B model using the tinker backend:
bash projects/finqa/train_finqa_tinker.sh
The training uses GPT-5-nano as a reward judge with Portkey gateway for caching to reduce API costs.

Implementation Details

Base Model

Dataset

  • Source: rLLM/finqa on HuggingFace
  • Size: 5,110 Q&A pairs across 207 companies
  • Tables: 6,923 tables extracted from SEC 10-K filings
  • Splits: 4,030 train / 522 validation / 558 test examples

Training Configuration

  • Algorithm: GRPO (Group Relative Policy Optimization)
  • Reward: LLM-as-judge using GPT-5-nano
  • Caching: Portkey gateway for reward caching
  • Backend: verl (default) or tinker (for LoRA)

Code Reference

Financial Agent Runner

Main script for running financial reasoning:
projects/finqa/run_finqa.py
--8<-- "projects/finqa/run_finqa.py"

Training Script

FinQA training configuration:
projects/finqa/train_finqa.py
--8<-- "projects/finqa/train_finqa.py"

Resources

Model on HuggingFace

Download rLLM-FinQA-4B weights

Dataset on HuggingFace

Access the FinQA dataset

Blog Post

Read the announcement blog

GitHub Project

View complete source code

Next Steps

Tool Agents

Learn more about building tool agents

Training

Explore training configurations

Community Projects

See more projects built with rLLM

API Reference

Browse the API documentation

Build docs developers (and LLMs) love