Hosted Training, available within the Prime Intellect Lab platform, enables you to automatically train models via prime-rl without needing to manage your own infrastructure. Hosted Training supports LoRA for RL training and can be used with any environment built with Verifiers.

Features

  • Zero infrastructure management - No need to provision GPUs or manage servers
  • Automatic scaling - Training infrastructure scales based on your config
  • LoRA training - Parameter-efficient fine-tuning via low-rank adapters
  • Any Verifiers environment - Train on Hub environments or your own
  • Weights & Biases integration - Automatic logging and experiment tracking

Getting Started

Hosted Training is currently in Private Beta. For access, please fill out this form.
Step 1: Set up your workspace

Download example configuration files:
prime lab setup
This creates:
configs/
├── endpoints.toml      # API endpoint configuration
├── rl/                 # Hosted Training configs
│   ├── alphabet-sort.toml
│   ├── gsm8k.toml
│   ├── math-python.toml
│   ├── reverse-text.toml
│   ├── wiki-search.toml
│   └── wordle.toml
├── eval/               # Evaluation configs
└── gepa/               # Prompt optimization configs
Step 2: Configure your training

Edit one of the example configs or create your own. Example for alphabet-sort:
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 256
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "primeintellect/alphabet-sort"
args = { min_turns = 3, max_turns = 5, power_per_turn = false }

[wandb]
project = "alphabet-sort"
name = "qwen3-30b-i-alphabet-sort"
Step 3: Submit your training job

Submit to Hosted Training via the Prime CLI:
prime train submit configs/rl/alphabet-sort.toml
Or use the web interface at app.primeintellect.ai/dashboard/training
Step 4: Monitor your training

View training progress:
  • In the Prime Intellect dashboard
  • In Weights & Biases (if configured)
  • Via the CLI: prime train status <job-id>

Supported Models

We currently support the following models for Hosted Training:
  • Qwen/Qwen3-4B-Instruct-2507
  • Qwen/Qwen3-4B-Thinking-2507
  • Qwen/Qwen3-30B-A3B-Instruct-2507
  • Qwen/Qwen3-30B-A3B-Thinking-2507
  • Qwen/Qwen3-235B-A22B-Instruct-2507
  • Qwen/Qwen3-235B-A22B-Thinking-2507
  • PrimeIntellect/INTELLECT-3
Additional models can be supported upon request. Contact support if you need a specific model.

Configuration Reference

Basic Configuration

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 500
batch_size = 256
rollouts_per_example = 8
learning_rate = 1e-5

Environment Configuration

Train on environments from the Environments Hub:
[[env]]
id = "primeintellect/math-python"
args = { max_turns = 10, difficulty = "hard" }
Or train on your own local environment:
[[env]]
id = "my-custom-env"  # from ./environments/my_custom_env
args = { num_examples = 1000 }
Multiple environments:
[[env]]
id = "primeintellect/math-python"
weight = 0.6

[[env]]
id = "primeintellect/gsm8k"
weight = 0.4
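The weight values set each environment's relative share of training rollouts. prime-rl's exact mixing strategy isn't documented here; conceptually it behaves like weighted sampling, as in this Python sketch:

```python
import random
from collections import Counter

# Hypothetical mix mirroring the config above: weights give each
# environment its relative share of rollouts.
envs = {"primeintellect/math-python": 0.6, "primeintellect/gsm8k": 0.4}

def sample_envs(envs, n, seed=0):
    """Draw n environment ids in proportion to their weights."""
    rng = random.Random(seed)
    ids, weights = zip(*envs.items())
    return rng.choices(ids, weights=weights, k=n)

counts = Counter(sample_envs(envs, 10_000))
# Roughly 60% / 40% of sampled rollouts come from each environment.
```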

Sampling Configuration

[sampling]
max_tokens = 512
temperature = 0.7
top_p = 0.9
stop = ["<|endoftext|>", "</s>"]

LoRA Configuration

LoRA is enabled by default for Hosted Training:
[lora]
enabled = true
r = 64
alpha = 16
dropout = 0.05
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]
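To see why LoRA is parameter-efficient: a rank-r adapter on a weight matrix adds only r * (d_in + d_out) trainable parameters, versus d_in * d_out for the full matrix. A back-of-the-envelope sketch (the hidden size below is illustrative, not the actual dimensions of any listed model):

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters added by one rank-r LoRA adapter:
    an (r x d_in) down-projection plus a (d_out x r) up-projection."""
    return r * d_in + d_out * r

# Illustrative: rank-64 adapters on the four attention projections
# (q, k, v, o) of one layer with hidden size 4096.
d = 4096
per_matrix = lora_params(d, d, r=64)   # 524,288 adapter params
per_layer = 4 * per_matrix             # ~2.1M per layer
full_matrix = d * d                    # 16.8M to train one full matrix
```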

Weights & Biases Integration

[wandb]
project = "my-project"
name = "my-training-run"
entity = "my-team"  # optional
Set your W&B API key in the Prime Intellect dashboard under Settings > Environment Variables.

Environment Variables

If your environment requires API keys or secrets, configure them via:
  1. Dashboard: Settings > Environment Variables
  2. Config file:
env_file = ["secrets.env"]
Then create secrets.env:
OPENAI_API_KEY=sk-...
BROWSERBASE_API_KEY=...
Environment variables set in the dashboard take precedence over those in env_file.
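The precedence rule amounts to a simple override merge, sketched here conceptually (this is not prime-rl's actual implementation):

```python
# env_file values are applied first; dashboard values override them.
file_vars = {"OPENAI_API_KEY": "from-file", "BROWSERBASE_API_KEY": "from-file"}
dashboard_vars = {"OPENAI_API_KEY": "from-dashboard"}

effective = {**file_vars, **dashboard_vars}
# OPENAI_API_KEY resolves to the dashboard value;
# BROWSERBASE_API_KEY falls back to the env_file value.
```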

Training Examples

GSM8K Math Training

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 1000
batch_size = 256
rollouts_per_example = 8

[sampling]
max_tokens = 1024
temperature = 0.7

[[env]]
id = "primeintellect/gsm8k"
args = { max_turns = 1 }

[wandb]
project = "gsm8k-training"

Wiki Search Training

model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 2048

[[env]]
id = "primeintellect/wiki-search"
args = { max_turns = 10, num_questions = 5 }

[wandb]
project = "wiki-search"
name = "qwen3-30b-wiki"

Multi-Environment Training

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 2000
batch_size = 256

[[env]]
id = "primeintellect/math-python"
weight = 0.4

[[env]]
id = "primeintellect/gsm8k"
weight = 0.3

[[env]]
id = "primeintellect/wiki-search"
weight = 0.3

[wandb]
project = "multi-task-training"

Downloading Checkpoints

After training completes, download your trained model:
prime train download <job-id> --output ./checkpoints/my-model
This downloads the final checkpoint and LoRA adapter (if applicable).

Best Practices

Before submitting a training job, validate your environment locally:
prime eval run my-env -m openai/gpt-4.1-mini -n 10
Ensure the baseline reward lands between 5% and 80%: below that range the model almost never succeeds and gets no learning signal; above it the task is already solved and there is little left to learn.

Hyperparameter Guidelines

For faster training:
  • Use smaller models (4B-30B)
  • Increase learning rate (1e-5 to 1e-4)
  • Decrease rollouts_per_example (4-8)
For more stable training:
  • Use larger models (30B+)
  • Increase rollouts_per_example (16-32)
  • Increase batch_size (512+)
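Putting the stability-oriented settings together, a config might look like the following (values are illustrative starting points, not tuned recommendations):

```toml
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 500
batch_size = 512            # larger batches smooth the gradient signal
rollouts_per_example = 16   # more rollouts per example reduce reward variance
learning_rate = 1e-5

[sampling]
max_tokens = 1024
```

For the faster profile, invert these choices: a 4B model, rollouts_per_example of 4-8, and a higher learning rate in the 1e-5 to 1e-4 range.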

Cost Optimization

  • Use LoRA instead of full finetuning
  • Start with smaller models and scale up if needed
  • Use max_steps to limit training duration
  • Monitor W&B to stop training when performance plateaus

Troubleshooting

Training Not Starting

  • Check that your config is valid TOML
  • Ensure your environment is published to the Environments Hub (if using a Hub environment)
  • Verify all required API keys are set

Training Failed

  • Check job logs: prime train logs <job-id>
  • Common issues:
    • Missing environment dependencies
    • Invalid environment arguments
    • Missing API keys for environment

Poor Training Performance

  • Task may be too hard for the model (baseline reward < 5%)
  • Task may be too easy (baseline reward > 80%)
  • Learning rate may be too high (causing instability)
  • Try enabling online difficulty filtering in advanced settings

Support

For help with Hosted Training, contact Prime Intellect support.
