Hosted Training lets you run reinforcement learning jobs powered by prime-rl without needing to manage your own infrastructure. It supports LoRA for RL training and can be used with any environment built with Verifiers.
## Features
- Zero infrastructure management - No need to provision GPUs or manage servers
- Automatic scaling - Training infrastructure scales based on your config
- LoRA training - Efficient parameter-efficient fine-tuning
- Any Verifiers environment - Train on Hub environments or your own
- Weights & Biases integration - Automatic logging and experiment tracking
## Getting Started
Hosted Training is currently in Private Beta. For access, please fill out this form.
### Configure your training
Edit one of the example configs for your target environment (e.g. `alphabet-sort`), or create your own.

### Submit your training job

Submit to Hosted Training via the Prime CLI, or use the web interface at app.primeintellect.ai/dashboard/training.
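As an illustration only, a CLI submission might look like the line below; the subcommand name here is hypothetical, so check `prime --help` for the actual syntax:

```shell
# Hypothetical subcommand -- verify the exact syntax against `prime --help`
prime train submit my-config.toml
```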
## Supported Models
We currently support the following models for Hosted Training:

- Qwen/Qwen3-4B-Instruct-2507
- Qwen/Qwen3-4B-Thinking-2507
- Qwen/Qwen3-30B-Instruct-2507
- Qwen/Qwen3-30B-Thinking-2507
- Qwen/Qwen3-235B-Instruct-2507
- Qwen/Qwen3-235B-Thinking-2507
- PrimeIntellect/INTELLECT-3
Additional models can be supported upon request. Contact support if you need a specific model.
## Configuration Reference

### Basic Configuration
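A minimal sketch of the top-level fields (all field names here are assumptions based on typical prime-rl configs, not a verified schema):

```toml
# Illustrative only -- field names are assumptions
model = "Qwen/Qwen3-4B-Instruct-2507"  # one of the supported models
max_steps = 100                        # total training steps
learning_rate = 1e-5
batch_size = 256
```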
### Environment Configuration
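Hub environments are typically referenced by their slug; a sketch (the `[environment]` table name and slug format are assumptions):

```toml
[environment]
id = "primeintellect/alphabet-sort"  # Environments Hub slug (assumed format)
```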
Train on environments from the Environments Hub.

### Sampling Configuration
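Sampling settings control how rollouts are generated; a hedged sketch in which the table name and all fields other than `rollouts_per_example` are assumptions:

```toml
[sampling]
temperature = 1.0
max_tokens = 2048
rollouts_per_example = 8  # rollouts sampled per prompt
```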
### LoRA Configuration
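A sketch of the tunable LoRA options (table and field names are assumptions; the values shown are illustrative, not documented defaults):

```toml
[lora]
rank = 32   # LoRA rank: higher means more trainable capacity
alpha = 64  # scaling factor, commonly set to 2x the rank
```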
LoRA is enabled by default for Hosted Training.

### Weights & Biases Integration
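To route logs to your own Weights & Biases project, the config might include something like the following (table and field names are assumptions):

```toml
[wandb]
project = "hosted-training"      # W&B project to log into
name = "alphabet-sort-qwen3-4b"  # optional run name
```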
### Environment Variables
If your environment requires API keys or secrets, configure them via:

- Dashboard: Settings > Environment Variables
- Config file: an `env_file` entry pointing to a secrets file such as `secrets.env`
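A sketch of the config-file route, assuming `env_file` is a top-level key: the config points at a dotenv-style file whose `KEY=value` lines are injected into the training environment.

```toml
# Training config: load secrets from a local dotenv-style file
env_file = "secrets.env"
```

`secrets.env` itself holds plain `KEY=value` lines, e.g. an `OPENAI_API_KEY` for an environment that calls an external API.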
Environment variables set in the dashboard take precedence over those in `env_file`.

## Training Examples
### GSM8K Math Training
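A sketch of what a single-turn math-training config might look like (the environment slug and all field names are assumptions):

```toml
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 200
learning_rate = 1e-5

[environment]
id = "primeintellect/gsm8k"  # assumed Environments Hub slug
```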
### Multi-Turn Wiki Search
### Multi-Environment Training
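Training on several environments at once might use a TOML array of tables; this shape is an assumption, as are the slugs:

```toml
# Assumed syntax: one [[environments]] entry per task
[[environments]]
id = "primeintellect/gsm8k"

[[environments]]
id = "primeintellect/alphabet-sort"
```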
## Downloading Checkpoints
After training completes, download your trained model.

## Best Practices
Before submitting a training job, validate your environment locally and ensure the baseline reward is between 5% and 80%.
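Verifiers ships a `vf-eval` CLI for running an environment locally against a reference model; the flags below are illustrative, so check `vf-eval --help` for the exact options:

```shell
# Run the environment on a few examples to measure baseline reward
vf-eval alphabet-sort -m gpt-4.1-mini -n 5
```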
### Hyperparameter Guidelines
For faster training:

- Use smaller models (4B-30B)
- Increase learning rate (1e-5 to 1e-4)
- Decrease `rollouts_per_example` (4-8)

For better final performance:

- Use larger models (30B+)
- Increase `rollouts_per_example` (16-32)
- Increase `batch_size` (512+)
### Cost Optimization

- Use LoRA instead of full fine-tuning
- Start with smaller models and scale up if needed
- Use `max_steps` to limit training duration
- Monitor W&B to stop training when performance plateaus
## Troubleshooting

### Training Not Starting
- Check that your config is valid TOML
- Ensure your environment is published to the Environments Hub (if using a Hub environment)
- Verify all required API keys are set
### Training Failed
- Check job logs: `prime train logs <job-id>`
- Common issues:
  - Missing environment dependencies
  - Invalid environment arguments
  - Missing API keys for environment
### Poor Training Performance
- Task may be too hard for the model (baseline reward < 5%)
- Task may be too easy (baseline reward > 80%)
- Learning rate may be too high (causing instability)
- Try enabling online difficulty filtering in advanced settings
## Support

For help with Hosted Training:

- Email: [email protected]
- Discord: discord.gg/primeintellect
- Documentation: docs.primeintellect.ai