Hosted Training, available within the Prime Intellect Lab platform, enables you to automatically train models via prime-rl without needing to manage your own infrastructure. Hosted Training supports LoRA for RL training and can be used with any environment built with Verifiers.

Features

  • Zero infrastructure management - No need to provision GPUs or manage servers
  • Automatic scaling - Training infrastructure scales based on your config
  • LoRA training - Parameter-efficient fine-tuning via low-rank adapters
  • Any Verifiers environment - Train on Hub environments or your own
  • Weights & Biases integration - Automatic logging and experiment tracking

Getting Started

Hosted Training is currently in Private Beta. For access, please fill out this form.
Step 1: Set up your workspace

Download example configuration files:
prime lab setup
This creates:
configs/
├── endpoints.toml      # API endpoint configuration
├── rl/                 # Hosted Training configs
│   ├── alphabet-sort.toml
│   ├── gsm8k.toml
│   ├── math-python.toml
│   ├── reverse-text.toml
│   ├── wiki-search.toml
│   └── wordle.toml
├── eval/               # Evaluation configs
└── gepa/               # Prompt optimization configs
Step 2: Configure your training

Edit one of the example configs or create your own. Example for alphabet-sort:
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 256
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "primeintellect/alphabet-sort"
args = { min_turns = 3, max_turns = 5, power_per_turn = false }

[wandb]
project = "alphabet-sort"
name = "qwen3-30b-i-alphabet-sort"
Step 3: Submit your training job

Submit to Hosted Training via the Prime CLI:
prime train submit configs/rl/alphabet-sort.toml
Or use the web interface at app.primeintellect.ai/dashboard/training
Step 4: Monitor your training

View training progress:
  • In the Prime Intellect dashboard
  • In Weights & Biases (if configured)
  • Via the CLI: prime train status <job-id>

Supported Models

We currently support the following models for Hosted Training:
  • Qwen/Qwen3-4B-Instruct-2507
  • Qwen/Qwen3-4B-Thinking-2507
  • Qwen/Qwen3-30B-A3B-Instruct-2507
  • Qwen/Qwen3-30B-A3B-Thinking-2507
  • Qwen/Qwen3-235B-A22B-Instruct-2507
  • Qwen/Qwen3-235B-A22B-Thinking-2507
  • PrimeIntellect/INTELLECT-3
Additional models can be supported upon request. Contact support if you need a specific model.

Configuration Reference

Basic Configuration

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 500
batch_size = 256
rollouts_per_example = 8
learning_rate = 1e-5

Environment Configuration

Train on environments from the Environments Hub:
[[env]]
id = "primeintellect/math-python"
args = { max_turns = 10, difficulty = "hard" }
Or train on your own local environment:
[[env]]
id = "my-custom-env"  # from ./environments/my_custom_env
args = { num_examples = 1000 }
Multiple environments:
[[env]]
id = "primeintellect/math-python"
weight = 0.6

[[env]]
id = "primeintellect/gsm8k"
weight = 0.4
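The weight values set each environment's relative share of training rollouts. prime-rl's exact mixing strategy isn't documented here; conceptually it behaves like weighted sampling, as in this Python sketch:

```python
import random
from collections import Counter

# Hypothetical mix mirroring the config above: weights give each
# environment its relative share of rollouts.
envs = {"primeintellect/math-python": 0.6, "primeintellect/gsm8k": 0.4}

def sample_envs(envs, n, seed=0):
    """Draw n environment ids in proportion to their weights."""
    rng = random.Random(seed)
    ids, weights = zip(*envs.items())
    return rng.choices(ids, weights=weights, k=n)

counts = Counter(sample_envs(envs, 10_000))
# Roughly 60% / 40% of sampled rollouts come from each environment.
```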

Sampling Configuration

[sampling]
max_tokens = 512
temperature = 0.7
top_p = 0.9
stop = ["<|endoftext|>", "</s>"]

LoRA Configuration

LoRA is enabled by default for Hosted Training:
[lora]
enabled = true
r = 64
alpha = 16
dropout = 0.05
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]
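To see why LoRA is parameter-efficient: a rank-r adapter on a weight matrix adds only r * (d_in + d_out) trainable parameters, versus d_in * d_out for the full matrix. A back-of-the-envelope sketch (the hidden size below is illustrative, not the actual dimensions of any listed model):

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters added by one rank-r LoRA adapter:
    an (r x d_in) down-projection plus a (d_out x r) up-projection."""
    return r * d_in + d_out * r

# Illustrative: rank-64 adapters on the four attention projections
# (q, k, v, o) of one layer with hidden size 4096.
d = 4096
per_matrix = lora_params(d, d, r=64)   # 524,288 adapter params
per_layer = 4 * per_matrix             # ~2.1M per layer
full_matrix = d * d                    # 16.8M to train one full matrix
```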

Weights & Biases Integration

[wandb]
project = "my-project"
name = "my-training-run"
entity = "my-team"  # optional
Set your W&B API key in the Prime Intellect dashboard under Settings > Environment Variables.

Environment Variables

If your environment requires API keys or secrets, configure them via:
  1. Dashboard: Settings > Environment Variables
  2. Config file:
env_file = ["secrets.env"]
Then create secrets.env:
OPENAI_API_KEY=sk-...
BROWSERBASE_API_KEY=...
Environment variables set in the dashboard take precedence over those in env_file.
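The precedence rule amounts to a simple override merge, sketched here conceptually (this is not prime-rl's actual implementation):

```python
# env_file values are applied first; dashboard values override them.
file_vars = {"OPENAI_API_KEY": "from-file", "BROWSERBASE_API_KEY": "from-file"}
dashboard_vars = {"OPENAI_API_KEY": "from-dashboard"}

effective = {**file_vars, **dashboard_vars}
# OPENAI_API_KEY resolves to the dashboard value;
# BROWSERBASE_API_KEY falls back to the env_file value.
```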

Training Examples

GSM8K Math Training

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 1000
batch_size = 256
rollouts_per_example = 8

[sampling]
max_tokens = 1024
temperature = 0.7

[[env]]
id = "primeintellect/gsm8k"
args = { max_turns = 1 }

[wandb]
project = "gsm8k-training"

Wiki Search Training

model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 2048

[[env]]
id = "primeintellect/wiki-search"
args = { max_turns = 10, num_questions = 5 }

[wandb]
project = "wiki-search"
name = "qwen3-30b-wiki"

Multi-Environment Training

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 2000
batch_size = 256

[[env]]
id = "primeintellect/math-python"
weight = 0.4

[[env]]
id = "primeintellect/gsm8k"
weight = 0.3

[[env]]
id = "primeintellect/wiki-search"
weight = 0.3

[wandb]
project = "multi-task-training"

Downloading Checkpoints

After training completes, download your trained model:
prime train download <job-id> --output ./checkpoints/my-model
This downloads the final checkpoint and LoRA adapter (if applicable).

Best Practices

Before submitting a training job, validate your environment locally:
prime eval run my-env -m openai/gpt-4.1-mini -n 10
Ensure the baseline reward lands between 5% and 80%: below that range the model almost never succeeds and gets no learning signal; above it the task is already solved and there is little left to learn.

Hyperparameter Guidelines

For faster training:
  • Use smaller models (4B-30B)
  • Increase learning rate (1e-5 to 1e-4)
  • Decrease rollouts_per_example (4-8)
For more stable training:
  • Use larger models (30B+)
  • Increase rollouts_per_example (16-32)
  • Increase batch_size (512+)
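Putting the stability-oriented settings together, a config might look like the following (values are illustrative starting points, not tuned recommendations):

```toml
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 512            # larger batches smooth the gradient signal
rollouts_per_example = 16   # more rollouts per example reduce reward variance
learning_rate = 1e-5

[sampling]
max_tokens = 1024
```

For the faster profile, invert these choices: a 4B model, rollouts_per_example of 4-8, and a higher learning rate in the 1e-5 to 1e-4 range.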

Cost Optimization

  • Use LoRA instead of full finetuning
  • Start with smaller models and scale up if needed
  • Use max_steps to limit training duration
  • Monitor W&B to stop training when performance plateaus

Troubleshooting

Training Not Starting

  • Check that your config is valid TOML
  • Ensure your environment is published to the Environments Hub (if using a Hub environment)
  • Verify all required API keys are set

Training Failed

  • Check job logs: prime train logs <job-id>
  • Common issues:
    • Missing environment dependencies
    • Invalid environment arguments
    • Missing API keys for environment

Poor Training Performance

  • Task may be too hard for the model (baseline reward < 5%)
  • Task may be too easy (baseline reward > 80%)
  • Learning rate may be too high (causing instability)
  • Try enabling online difficulty filtering in advanced settings

Support

For help with Hosted Training, contact Prime Intellect support.
