What qualifies
Non-record submissions are appropriate for any of the following:

Interesting approaches
Unusual or out-of-the-box architectures and techniques that don’t yet beat SOTA but explore genuinely new territory.
In-progress solutions
Unoptimized or partially complete implementations that run successfully and demonstrate a real idea worth sharing.
Negative results
Experiments that didn’t improve the score but produced clear learnings. Interesting negative results are valuable and welcome.
Unlimited compute
Runs that exceed the 10-minute compute limit. These go into a separate unlimited-compute track while still adhering to the 16 MB artifact cap.
Ideas worth exploring
The challenge is designed to push participants toward creative solutions. Some directions that may be especially interesting in the parameter-constrained regime:

- Test-time compute / inference-time scaling — spend more FLOP at inference rather than training
- Depth recurrence and parameter tying — reuse layers to pack more effective depth into the byte budget
- Novel tokenizers with non-standard vocabularies — the baseline uses a 1024-token vocabulary; smaller or larger vocabularies may compress differently
- Quantization-aware training (QAT) — train directly in low precision to reduce the gap between pre-quant and post-quant scores
- Long context approaches — use longer sequence lengths during training or evaluation
- Megakernels / custom CUDA ops — fused or specialized kernels that change what is practical within the time budget
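As one illustration of the depth-recurrence / parameter-tying idea above, here is a minimal, framework-free sketch (all names hypothetical, not the repo's actual architecture): one shared block applied repeatedly adds effective depth without adding bytes to the checkpoint.

```python
# Hypothetical sketch of parameter tying: a single "block" (here just a
# linear map) is applied T times, so effective depth grows while the
# parameter count, and thus the artifact size, stays fixed.

class TiedStack:
    def __init__(self, dim, recurrences):
        # one shared weight matrix, reused at every recurrence step
        self.w = [[0.01 * (i + j) for j in range(dim)] for i in range(dim)]
        self.recurrences = recurrences

    def n_params(self):
        return sum(len(row) for row in self.w)

    def forward(self, x):
        # apply the same weights `recurrences` times (no per-layer params)
        for _ in range(self.recurrences):
            x = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        return x

shallow = TiedStack(dim=4, recurrences=1)
deep = TiedStack(dim=4, recurrences=8)
# 8x the effective depth, identical byte budget
assert shallow.n_params() == deep.n_params() == 16
```

A real submission would tie transformer blocks rather than a bare linear map, but the accounting is the same: the checkpoint stores one block's weights regardless of how many times it is unrolled.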
Submission format
Non-record submissions follow the same format as record submissions. Your PR folder must include:

- README.md — explains the submission in reasonable detail
- submission.json — includes your name, GitHub ID, val_bpb, and metadata
- train.log — the exact log produced by the training script
- train_gpt.py — the training script, which must compile and run from within the records folder
submission.json format
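The exact schema is defined by the repository; purely as an illustration, a submission.json covering the fields listed above might look like the following (all values, and any field names beyond those mentioned above, are hypothetical):

```json
{
  "name": "Jane Doe",
  "github_id": "janedoe",
  "val_bpb": 1.3021,
  "metadata": {
    "description": "depth-recurrent variant, unoptimized",
    "wallclock_seconds": 600
  }
}
```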
Folder location
Non-record submissions go under /records/track_non_record_16mb/ with a date-prefixed folder name:
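For example, a hypothetical submission made on 2025-06-01 exploring depth recurrence might live at:

```
/records/track_non_record_16mb/2025-06-01-depth-recurrence/
```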
Unlimited compute track
Runs that exceed the 10-minute wall-clock limit can still be submitted to the non-record track. Note your compute usage explicitly: when submitting an unlimited-compute run, record MAX_WALLCLOCK_SECONDS=0 (or the actual limit you used) in your README so reviewers understand the compute budget. The artifact size limit of 16,000,000 bytes still applies.

One such run reached val_bpb 1.2074 (vs. 1.2244 for the 10-minute record), demonstrating the gains available from more compute even with the same architecture.

PRs on core code
The train_gpt.py and train_gpt_mlx.py scripts are intended as good starting points for new participants, not SOTA configs. PRs directly against these scripts are accepted if they tune, improve, or simplify the code without significantly increasing complexity.
Improvements to the core training scripts should remain general-purpose launch points. Any best-performing model configs should be kept in the
/records/ folder rather than baked into the root scripts.