What qualifies as a new record
A submission qualifies as a new SOTA record if and only if it satisfies all of the following criteria:
1. Beat existing SOTA by at least 0.005 nats
Your submission must achieve a val_bpb that corresponds to at least a 0.005-nat improvement over the current best record. Smaller improvements below this threshold are not eligible for leaderboard placement, regardless of how many runs support them.
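A minimal sketch of the threshold check follows. The conversion between val_bpb and nats is an assumption: it reads the 0.005-nat threshold as nats per byte, so a val_bpb (bits per byte) difference is multiplied by ln 2. If the threshold is instead meant per token, the conversion would also need the tokens-per-byte ratio. The names `MIN_IMPROVEMENT_NATS`, `bpb_delta_to_nats`, and `clears_threshold` are hypothetical.

```python
import math

# Assumption: the 0.005-nat threshold is measured in nats per byte, so a
# bits-per-byte (val_bpb) difference converts to nats via a factor of ln 2.
MIN_IMPROVEMENT_NATS = 0.005


def bpb_delta_to_nats(sota_bpb: float, new_bpb: float) -> float:
    """Improvement of new_bpb over sota_bpb, expressed in nats per byte."""
    return (sota_bpb - new_bpb) * math.log(2)


def clears_threshold(sota_bpb: float, new_bpb: float) -> bool:
    """True if the improvement is at least MIN_IMPROVEMENT_NATS."""
    return bpb_delta_to_nats(sota_bpb, new_bpb) >= MIN_IMPROVEMENT_NATS


# Under this reading, the required val_bpb gain is about 0.0072 bits per byte.
min_bpb_gain = MIN_IMPROVEMENT_NATS / math.log(2)
```

With hypothetical scores, `clears_threshold(1.2000, 1.1900)` returns True (a 0.0100 bpb gain is about 0.0069 nats), while `clears_threshold(1.2000, 1.1950)` returns False.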
2. Statistical evidence: p < 0.01
Because of inter-run variance, all submissions must provide enough run logs to show at p < 0.01 that the 0.005-nat improvement was achieved. Include multiple run logs with your PR to support this claim.
Exception: For submissions that improve speed through systems optimization without changing the ML (for example, kernel rewrites, communication optimizations, or compiler improvements), the statistical requirement is waived.
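The rules do not name a specific statistical test. One common choice, sketched below as an assumption, is a one-sided, one-sample t-test: treat the current record as a fixed number, compute each run's improvement over it in nats, and test whether the mean improvement exceeds 0.005. The test assumes run-to-run noise is roughly normal. The Student's t tail probability is computed with the standard library only; `t_upper_tail` and `one_sided_p_value` are hypothetical helper names.

```python
import math
from statistics import mean, stdev


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for Student's t distribution with df degrees of freedom.

    Substituting t = sqrt(df) * tan(theta) turns the tail integral into
    K * integral from theta0 to pi/2 of cos(theta)**(df - 1), which is smooth
    on a finite interval, so composite Simpson's rule is accurate here.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    theta0 = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - theta0) / steps  # steps must be even for Simpson's rule
    total = math.cos(theta0) ** (df - 1) + math.cos(math.pi / 2) ** (df - 1)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * math.cos(theta0 + i * h) ** (df - 1)
    return k * total * h / 3


def one_sided_p_value(improvements_nats: list[float], threshold: float = 0.005) -> float:
    """One-sample, one-sided t-test of H0: mean improvement <= threshold."""
    n = len(improvements_nats)
    if n < 2:
        raise ValueError("need at least two runs to estimate variance")
    se = stdev(improvements_nats) / math.sqrt(n)
    if se == 0:
        raise ValueError("all runs are identical; the t statistic is undefined")
    t = (mean(improvements_nats) - threshold) / se
    return t_upper_tail(t, df=n - 1)
```

For example, five hypothetical runs with improvements of 0.010, 0.011, 0.009, 0.010, and 0.012 nats give a p-value well below 0.01. If SciPy is available, `scipy.stats.ttest_1samp(improvements, 0.005, alternative="greater")` performs the same one-sided test.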
3. Tokenizer and dataset changes must be verified
If your submission changes the tokenizer or dataset, you must prove with certainty that val_bpb is correctly calculated. Submissions that edit the tokenizer will be examined much more carefully, since bugs in tokenizer handling can unjustly improve your score.
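The rules leave the form of the proof open. As an illustrative sketch only, two checks are often useful: confirm that the tokenizer round-trips the validation text exactly, and compute val_bpb with a byte count taken from the raw UTF-8 text rather than derived from the tokens. The `encode`/`decode` callables and the toy byte-level tokenizer below are hypothetical stand-ins for your own tokenizer.

```python
import math
from typing import Callable, Sequence


def check_round_trip(texts: Sequence[str],
                     encode: Callable[[str], list[int]],
                     decode: Callable[[list[int]], str]) -> None:
    """Raise if decoding the encoded text does not reproduce it exactly."""
    for text in texts:
        if decode(encode(text)) != text:
            raise AssertionError(f"round-trip mismatch for {text[:40]!r}")


def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Summed per-token NLL (nats), converted to bits, per UTF-8 byte of raw text.

    The denominator is measured on the original text, not derived from the
    token count, so a tokenizer change cannot silently shrink it.
    """
    n_bytes = len(text.encode("utf-8"))
    return sum(token_nll_nats) / math.log(2) / n_bytes


# Hypothetical byte-level tokenizer, used only to exercise the checks.
def byte_encode(text: str) -> list[int]:
    return list(text.encode("utf-8"))


def byte_decode(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")
```

As a sanity check, a model that is uniform over all 256 byte values assigns each byte a loss of ln 256 nats, so `bits_per_byte([math.log(256)] * len(byte_encode(text)), text)` should come out at 8.0 for any `text`; any other answer points to a bug in the byte accounting.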
4. Reproducible in under 10 minutes on 8xH100
The run must be reproducible on 8xH100 SXM GPUs within the 10+10 rule: training must finish in under 10 minutes, and evaluation must separately finish in under 10 minutes, both on the same 8xH100 hardware.
The 10+10 rule
The compute budget for leaderboard submissions is strict:
- Training: under 10 minutes on 8xH100 SXM
- Evaluation: under 10 minutes on 8xH100 SXM (in addition to training time)
Evaluation at any sequence length is permitted, as in modded-nanogpt. You are encouraged to push the bounds of evaluation methods as aggressively as training methods — as long as the total evaluation wall-clock stays within the limit.
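A minimal way to record the two wall-clock times the 10+10 rule asks for is sketched below, under assumptions: `train` and `evaluate` are hypothetical zero-argument callables standing in for your own training and evaluation entry points, and each stage is checked independently against a 600-second limit.

```python
import time
from typing import Callable

STAGE_LIMIT_S = 600.0  # 10 minutes per stage under the 10+10 rule


def timed_stage(name: str, fn: Callable[[], object], limit_s: float = STAGE_LIMIT_S) -> float:
    """Run one stage, print its wall-clock time, and flag an overrun."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    status = "OK" if elapsed < limit_s else "OVER BUDGET"
    print(f"{name}: {elapsed:.1f}s ({status}, limit {limit_s:.0f}s)")
    return elapsed


def within_budget(train_s: float, eval_s: float, limit_s: float = STAGE_LIMIT_S) -> bool:
    """Each stage must independently finish in under the limit."""
    return train_s < limit_s and eval_s < limit_s
```

For example, `train_s = timed_stage("train", train)`, then `eval_s = timed_stage("eval", evaluate)`, then `within_budget(train_s, eval_s)`. Because CUDA kernels launch asynchronously, a GPU stage function should end with `torch.cuda.synchronize()` (and, in a multi-process run, a barrier across ranks) before the timer stops; otherwise the measured time can be too short.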
PR checklist
Before opening a pull request for a record submission, confirm each of the following:
Beat SOTA by 0.005 nats
Verify your val_bpb improvement over the current leaderboard leader corresponds to at least 0.005 nats. Check the leaderboard table in the repository README for the current best score.
Provide statistical evidence (p < 0.01)
Include enough run logs (typically from multiple independent runs) to show at p < 0.01 that the 0.005-nat improvement was achieved. If your submission is a systems-only optimization (no ML changes), this step is waived.
Validate tokenizer and dataset changes
If you changed the tokenizer or dataset, include explicit proof in your README that val_bpb is calculated correctly. Describe how the tokenizer was validated.
Confirm reproducibility on 8xH100 in under 10 minutes
Run the full training + evaluation on 8xH100 SXM hardware and confirm both stages finish within their respective 10-minute limits. Include the hardware info and wall-clock times in your README.
Include all required files
Your PR folder must contain README.md, submission.json, train.log, and train_gpt.py. See Submission Requirements for details on each file and the submission.json format.
Verification
OpenAI does not automatically verify every submission, but top leaderboard entries will be verified over time. Any result that cannot be reproduced may be disqualified. If you encounter issues reproducing a submission, raise them on the original PR.
Compute grant
Training on 8xH100 hardware can be expensive. OpenAI is sponsoring $1,000,000 in compute credits to help participants get started.
Request a compute grant
Use the compute grant form to request sponsored credits for your runs.
Seed tuning policy
Tuning Adam hyperparameters across multiple runs is explicitly permitted. However, brute-forcing seeds to cherry-pick lucky initializations is not in the spirit of the challenge and may result in disqualification.
If you are unsure whether a particular approach crosses the line, ask in the #parameter-golf-discussions channel on the OpenAI Discord before submitting. There is no penalty for asking.
