## Overview

OpenSandbox provides an ideal environment for RL training:

- Isolated Execution: Each agent trains in a clean, isolated environment
- Reproducible Results: Consistent environments across training runs
- Scalable: Run hundreds of parallel training jobs using BatchSandbox
- Safe: Contained execution prevents system interference
- Portable: Train locally or in Kubernetes clusters
## Prerequisites
## Basic RL Training Example

### Training Script

Create the training script that will run inside the sandbox (`train.py`):
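The original script is not reproduced here; the following is a minimal, self-contained stand-in: tabular Q-learning on a toy corridor environment, using only the standard library so it runs in any sandbox image. It reads `RL_TIMESTEPS` and `LEARNING_RATE` from the environment (see the Environment Variables table). Swap in your own environment and algorithm (e.g. a stable-baselines3 model) for real workloads.

```python
import os
import random

# Toy "corridor" environment: start at position 0, reach position N for reward 1.
N = 5
ACTIONS = (-1, +1)

def step(state, action):
    nxt = max(0, min(N, state + action))
    reward = 1.0 if nxt == N else 0.0
    return nxt, reward, nxt == N

def train(timesteps, lr, gamma=0.95, epsilon=0.5):
    """Epsilon-greedy tabular Q-learning; epsilon is high for this tiny task."""
    q = {(s, a): 0.0 for s in range(N + 1) for a in ACTIONS}
    state = 0
    for _ in range(timesteps):
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: q[(state, x)])
        nxt, r, done = step(state, a)
        target = r + gamma * max(q[(nxt, x)] for x in ACTIONS)
        q[(state, a)] += lr * (target - q[(state, a)])
        state = 0 if done else nxt  # reset episode on reaching the goal
    return q

if __name__ == "__main__":
    timesteps = int(os.environ.get("RL_TIMESTEPS", "5000"))
    lr = float(os.environ.get("LEARNING_RATE", "1e-3"))
    q = train(timesteps, lr)
    # Print the greedy policy per state.
    print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N)})
```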
### Requirements File

`requirements.txt`:
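The exact contents depend on your algorithm; a plausible `requirements.txt` for the stable-baselines3-style setup implied by the tips below (`train_freq`, `gradient_steps`, TensorBoard) might look like:

```text
gymnasium
stable-baselines3
tensorboard
```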
### Python Client

`main.py`:
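The OpenSandbox SDK's exact method names are not shown in this page, so the sketch below captures only the orchestration shape: the `sandbox` object stands in for an SDK client session, and the `upload`/`run` method names are assumptions to adapt to the real Python SDK.

```python
import os

def run_training(sandbox, train_script, requirements):
    """Upload the training files, install dependencies, and run training.

    `sandbox` is a stand-in for an OpenSandbox client session; the
    upload/run method names here are assumptions, not the real SDK API.
    """
    sandbox.upload("/workspace/train.py", train_script)
    sandbox.upload("/workspace/requirements.txt", requirements)
    sandbox.run("pip install -r /workspace/requirements.txt")
    return sandbox.run("python /workspace/train.py")

def sandbox_config():
    # Defaults mirror the Environment Variables table below.
    return {
        "domain": os.environ.get("SANDBOX_DOMAIN", "localhost:8080"),
        "api_key": os.environ.get("SANDBOX_API_KEY"),
        "image": os.environ.get("SANDBOX_IMAGE",
                                "opensandbox/code-interpreter:v1.0.1"),
    }
```

In a real `main.py` you would construct the client from `sandbox_config()`, call `run_training`, then download the checkpoint before the sandbox terminates.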
### Run the Example
## Advanced: Batch RL Training

Scale up to hundreds of parallel training runs using BatchSandbox.

### Step 1: Deploy Kubernetes Controller

See Kubernetes Deployment for full setup.

### Step 2: Create RL Training Pool
`rl-pool.yaml`:
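The pool spec below is illustrative only: the `apiVersion`, `kind`, and field names are assumptions, so check the Kubernetes Deployment guide for the actual schema used by the controller.

```yaml
# Illustrative only: apiVersion, kind, and field names are assumptions.
apiVersion: opensandbox.io/v1   # assumed API group/version
kind: SandboxPool               # assumed kind
metadata:
  name: rl-training-pool
spec:
  image: opensandbox/code-interpreter:v1.0.1
  bufferSize: 20                # pre-warmed sandboxes for fast startup
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 4Gi
```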
### Step 3: Launch Batch Training

`rl-batch.yaml`:
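Again an illustrative sketch, not the real schema: the field names are assumptions, and `poolRef` is a hypothetical reference back to the pool created in Step 2.

```yaml
# Illustrative only: apiVersion, kind, and field names are assumptions.
apiVersion: opensandbox.io/v1
kind: BatchSandbox
metadata:
  name: rl-batch-training
spec:
  poolRef: rl-training-pool     # hypothetical reference to the Step 2 pool
  replicas: 100                 # number of parallel training runs
  command: ["python", "/workspace/train.py"]
  env:
    - name: RL_TIMESTEPS
      value: "5000"
```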
## Heterogeneous Training

Train different agents or hyperparameters across sandboxes (`heterogeneous-rl.yaml`):
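One way a per-task override might be expressed, sketched with assumed field names (a `tasks` list carrying per-sandbox environment variables; verify against the actual schema):

```yaml
# Illustrative only: field names are assumptions.
apiVersion: opensandbox.io/v1
kind: BatchSandbox
metadata:
  name: rl-hyperparam-sweep
spec:
  tasks:                        # one entry per distinct configuration
    - env:
        - name: LEARNING_RATE
          value: "1e-3"
    - env:
        - name: LEARNING_RATE
          value: "3e-4"
```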
## TensorBoard Integration

Visualize training metrics with TensorBoard (`tensorboard_example.py`):
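A minimal sketch: the writer is passed in as a parameter so the function works with `torch.utils.tensorboard.SummaryWriter` (whose `add_scalar(tag, value, step)` call is what's exercised here) or any object exposing the same method; torch itself is assumed to be in the sandbox image.

```python
import os

def log_training_metrics(writer, episode_rewards, losses):
    """Write reward and loss scalars to a TensorBoard-style writer.

    `writer` is anything with add_scalar(tag, value, step),
    e.g. torch.utils.tensorboard.SummaryWriter.
    """
    for step, reward in enumerate(episode_rewards):
        writer.add_scalar("train/episode_reward", reward, step)
    for step, loss in enumerate(losses):
        writer.add_scalar("train/loss", loss, step)

if __name__ == "__main__":
    log_dir = os.environ.get("RL_TENSORBOARD_LOG", "runs")
    try:
        from torch.utils.tensorboard import SummaryWriter  # requires torch
        writer = SummaryWriter(log_dir)
        log_training_metrics(writer, [0.1, 0.5, 0.9], [1.2, 0.8, 0.4])
        writer.close()
    except ImportError:
        print(f"torch not installed; would have logged to {log_dir}")
```

Run `tensorboard --logdir runs` (the default from the Environment Variables table) to view the curves.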
## Checkpoint Management

Save and retrieve trained models (`checkpoint_management.py`):
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `SANDBOX_DOMAIN` | Sandbox service address | `localhost:8080` |
| `SANDBOX_API_KEY` | API key for authentication | None |
| `SANDBOX_IMAGE` | Docker image to use | `opensandbox/code-interpreter:v1.0.1` |
| `RL_TIMESTEPS` | Training timesteps | `5000` |
| `RL_TENSORBOARD_LOG` | TensorBoard log directory | `runs` |
| `LEARNING_RATE` | Learning rate | `1e-3` |
## Performance Tips

### Optimize Training Speed

- Use pooled sandboxes for faster startup
- Pre-install dependencies in custom images
- Increase `train_freq` and `gradient_steps` for faster learning
- Use GPU-enabled sandbox images for deep RL
### Scale Parallel Training

- Use BatchSandbox for 100+ parallel agents
- Set appropriate pool buffer sizes
- Monitor cluster resources and autoscale
- Use heterogeneous tasks for hyperparameter search
### Manage Checkpoints

- Save checkpoints periodically during training
- Use the sandbox file system for intermediate results
- Download final checkpoints to persistent storage
- Implement checkpoint rotation for long training runs
### Monitor Training

- Use TensorBoard for real-time metrics
- Log training summaries to JSON files
- Track reward curves and loss values
- Set up alerts for failed training runs
## Common Patterns

### Population-Based Training

`pbt_example.py`:
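A toy sketch of the exploit/explore loop at the heart of PBT: each population member is scored on a stand-in objective (a made-up function here, not a real RL reward), the bottom half copies the top half's hyperparameters, then perturbs them. In practice each member would be a training run in its own sandbox.

```python
import random

def toy_score(hp):
    # Stand-in objective: the best "learning rate" on this fake task is 0.1.
    return -abs(hp["lr"] - 0.1)

def pbt_step(population, perturb=0.8):
    """One PBT round: bottom half exploits (copies) the top half, then explores."""
    ranked = sorted(population, key=toy_score, reverse=True)
    half = len(ranked) // 2
    for loser, winner in zip(ranked[half:], ranked[:half]):
        # Copy the winner's hyperparameter and perturb it multiplicatively.
        loser["lr"] = winner["lr"] * random.choice((perturb, 1 / perturb))
    return ranked[0]

if __name__ == "__main__":
    random.seed(0)
    pop = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(8)]
    for _ in range(30):
        best = pbt_step(pop)
    print("best lr:", best["lr"])
```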
### Distributed PPO

`distributed_ppo.py`:
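A full PPO implementation is too long to inline, so this standard-library sketch shows only the fan-out/fan-in pattern: rollouts are collected in parallel (threads stand in for sandboxes, and `collect_rollout` is a placeholder that returns a pseudo-gradient) and the gradients, represented as plain lists, are averaged before a single central update.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def collect_rollout(seed, n_params=4):
    """Placeholder for one PPO rollout-and-gradient pass in a sandbox.

    In a real setup this would execute train code remotely and return
    the worker's gradient; here it returns seeded random numbers.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n_params)]

def average_gradients(grads):
    """Element-wise mean across workers (the fan-in step)."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

def distributed_update(params, n_workers=8, lr=0.01):
    """Fan out rollouts to workers, average gradients, apply one SGD step."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(collect_rollout, range(n_workers)))
    avg = average_gradients(grads)
    return [p - lr * g for p, g in zip(params, avg)]
```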
## Troubleshooting

### Dependency Installation Fails

Problem: `pip install` fails inside the sandbox.

Solution:

- Use the `--break-system-packages` flag
- Try alternative installation methods (apt, apk)
- Pre-build a custom image with dependencies
### Training Runs Out of Memory

Problem: Sandbox crashes during training.

Solution:

- Increase memory limits in the pool spec
- Reduce buffer size or batch size
- Use smaller models or environments
- Monitor memory usage during training
### Checkpoints Not Saved

Problem: Cannot find checkpoint files.

Solution:

- Verify the checkpoint directory exists
- Check file permissions in the sandbox
- Use absolute paths for checkpoint saving
- Read files back before sandbox termination
## Next Steps

- Batch Sandboxes: Learn batch sandbox patterns
- Kubernetes Deployment: Deploy on Kubernetes
- Python SDK: Python SDK reference
- API Reference: Complete API documentation