## What It Evaluates
Genesis tests reward function engineering ability. The agent’s output (a reward function) is not judged directly; instead, it is used to train an RL policy via `rsl-rl`, and the resulting policy’s behavior is evaluated in simulation. The primary metric is `average_fitness` — a normalized 0–1 score measuring how well the trained policy executes the task.
## Evaluation Setup
- Total simulation time: 20 seconds per evaluation
- Episode duration: 4.0 seconds (200 steps at dt = 0.02 s)
- Parallel environments: 4096 simulated simultaneously
- Fitness score range: 0 (worst) to 1 (best)
- RL training: 101 policy update iterations before evaluation
- Early termination: an episode ends early if the robot falls (roll or pitch > 10°)
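The numbers above pin down the episode math: 4.0 s at dt = 0.02 s gives 200 control steps. A minimal sketch of that arithmetic and the fall-termination mask, assuming roll and pitch arrive in radians as per-environment tensors (the function name and shapes are illustrative, not the benchmark's actual API):

```python
import math
import torch

EPISODE_S, DT = 4.0, 0.02
MAX_STEPS = round(EPISODE_S / DT)   # 200 control steps per episode
FALL_LIMIT = math.radians(10.0)     # terminate beyond 10 degrees of tilt

def fallen(roll: torch.Tensor, pitch: torch.Tensor) -> torch.Tensor:
    """Per-environment early-termination mask: True where the robot has fallen."""
    return (roll.abs() > FALL_LIMIT) | (pitch.abs() > FALL_LIMIT)
```

In a vectorized run this mask would be OR-ed with the step-count timeout to decide which of the parallel environments to reset.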
## The Three Environments
- go2walking
- go2walkback
- go2hop
**Go2WalkingCommand-v0** — The Unitree Go2 robot must learn to walk forward at a commanded speed.

- Task: `Go2WalkingCommand-v0/speed`
- Linear velocity range: [0.2, 0.8] m/s in the x direction
- Default episodes: 6
- Domain: `genesis_go2walking`
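Since the deliverable is a reward function, a velocity-tracking term is the natural starting point for this task. The snippet below is an illustration only, not the benchmark's scoring function; the exponential-kernel form and the width `sigma` are assumptions:

```python
import torch

def reward_tracking_lin_vel(base_lin_vel: torch.Tensor,
                            commanded_vel_x: torch.Tensor,
                            sigma: float = 0.25) -> torch.Tensor:
    """Velocity-tracking reward across parallel environments.

    base_lin_vel: (num_envs, 3) body-frame linear velocity.
    commanded_vel_x: (num_envs,) commanded x velocity, here in [0.2, 0.8] m/s.
    Returns a (num_envs,) reward in (0, 1], peaking when v_x matches the command.
    """
    vel_error = torch.square(base_lin_vel[:, 0] - commanded_vel_x)
    return torch.exp(-vel_error / sigma)
```

A full reward would typically add shaping terms (orientation, energy, foot contact); this sketch shows only the tracking component implied by the commanded-speed task.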
## Requirements
### Install PyTorch

Check your CUDA version first:

### Install Genesis
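The original commands for these two steps are not included here; the sketch below covers both. The CUDA 12.1 wheel index URL and the `genesis-world` package name are assumptions — adjust them to your CUDA version and the current Genesis install instructions.

```shell
# Check the CUDA version reported by the driver (assumes an NVIDIA GPU).
nvidia-smi

# Install a CUDA-enabled PyTorch build; the cu121 index URL is an assumption,
# pick the one matching your CUDA version from pytorch.org.
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Install the Genesis simulator (package name assumed to be genesis-world).
pip install genesis-world
```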
## Setup and Run
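The run command itself is not shown in this section. Since the project is configured with Hydra, values can be overridden with `key=value` arguments on the command line; a hypothetical invocation (the script name `run.py` and the override keys are assumptions, not the actual entry point):

```shell
# Hypothetical entry point; Hydra-style overrides select the environment.
python run.py env_name=go2hop num_workers=1
```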
## `num_workers` Constraint
## Hydra Configuration
The config file is at `domains/genesis/config/config.yaml`. Key sections:
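The key sections themselves are not reproduced here; an illustrative Hydra-style fragment (every key below is an assumption about the schema, not the actual file):

```yaml
# Hypothetical shape of domains/genesis/config/config.yaml
defaults:
  - _self_

env_name: go2walking   # go2walking | go2walkback | go2hop
num_workers: 1         # must stay 1 for this domain
output_dir: outputs/
```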
## Output Structure
Outputs are written to `<output_dir>/<env_name>/<task_name>/` and include:
- `chat_history_*.md` — agent conversation log
- `rl_eval_<episode_idx>/` — evaluation results (JSON log, `eval_100.mp4` video)
- `rl_train_<episode_idx>/` — training artifacts (model checkpoints `model_0.pt`, `model_100.pt`, TensorBoard events, config pickle)
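As an example of consuming this layout, a helper that picks the latest checkpoint from a training directory might look like this (a sketch based on the listing above; `final_checkpoint` is a hypothetical helper, not part of the codebase):

```python
from pathlib import Path

def final_checkpoint(task_dir: Path, episode_idx: int) -> Path:
    """Return the last saved policy checkpoint for one training episode."""
    train_dir = task_dir / f"rl_train_{episode_idx}"
    # Checkpoints are named model_0.pt, model_100.pt, ... — sort by the
    # iteration number embedded in the filename and take the highest.
    ckpts = sorted(train_dir.glob("model_*.pt"),
                   key=lambda p: int(p.stem.split("_")[1]))
    if not ckpts:
        raise FileNotFoundError(f"no checkpoints under {train_dir}")
    return ckpts[-1]
```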
## Domain Properties
| Property | Value |
|---|---|
| Score key | average_fitness |
| Splits | train only |
| Eval subset | full dataset |
| Ensemble supported | No |
| Staged eval samples | 3 out of 6 (50%) |
| `num_workers` | Always 1 |