The challenge runs from March 18 to April 30, 2026. OpenAI is sponsoring $1,000,000 in compute credits to help participants get started. Request a compute grant.
Tracks
10-min 16MB (Official)
The main competitive leaderboard. Submissions must train in under 10 minutes on 8xH100 SXM GPUs and fit within a 16MB artifact. New SOTA records require a 0.005-nat improvement over the current best.
Non-Record 16MB
Open submissions for interesting approaches that don’t meet the 10-minute compute limit or are experimental in nature. Still subject to the 16MB artifact cap. Results appear in the Notable Non-Record Runs table.
10-min 16MB Leaderboard
| Rank | Run | BPB Score | Author | Summary | Date |
|---|---|---|---|---|---|
| 1 | Naive Baseline | 1.2244 | Baseline | 9-layer 512-dim 1024-vocab tied embeddings, 4 KV heads | 2026-03-18 |
Scores reflect the post-quantization int8+zlib roundtrip metric, which is the canonical evaluation result. The model artifact for the current record is 15,863,489 bytes total (15,815,847 bytes model + 47,642 bytes code).
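The int8+zlib roundtrip can be sketched roughly as follows. This is an illustrative symmetric per-tensor quantization in Python, not the challenge's actual serialization code; the real format may quantize per-channel or store scales differently.

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization (illustrative scheme)."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Round-trip a dummy weight matrix and measure the compressed payload.
w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
blob = zlib.compress(q.tobytes(), level=9)   # these bytes count toward the cap
restored = dequantize_int8(
    np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(w.shape), scale
)
max_err = float(np.abs(w - restored).max())  # bounded by ~scale / 2
```

Scoring evaluates the restored weights, so the post-quant BPB includes exactly this roundtrip fidelity cost.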
Current Record: Naive Baseline
The baseline entry establishes the starting point for the challenge. Key details:
- Architecture: 9 layers, 512 model dim, 1024 vocab (SentencePiece BPE), 8 attention heads, 4 KV heads
- Tied embeddings: Input and output embeddings are shared (`TIE_EMBEDDINGS=1`)
- Training stopped at: step 13,780 of 20,000 (wallclock cap hit at ~600 seconds)
- Pre-quant BPB: 1.2172 — Post-quant BPB: 1.2244
- Total tokens seen: ~7.2B
- Peak GPU memory: 10,184 MiB allocated
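For reference, the reported hyperparameters above can be collected into a small config sketch. Field names are illustrative, not the actual `train_gpt.py` variables:

```python
from dataclasses import dataclass

@dataclass
class BaselineConfig:
    # Values from the record entry above; names are illustrative.
    n_layers: int = 9
    d_model: int = 512
    vocab_size: int = 1024       # SentencePiece BPE
    n_heads: int = 8
    n_kv_heads: int = 4          # grouped-query attention: 2 query heads per KV head
    tie_embeddings: bool = True  # TIE_EMBEDDINGS=1
    max_steps: int = 20_000
    wallclock_cap_s: int = 600   # the 10-minute budget

cfg = BaselineConfig()
head_dim = cfg.d_model // cfg.n_heads  # 64
```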
Notable Non-Record Runs
These submissions are interesting or exploratory and don't meet the 10-minute compute constraint for the main leaderboard. They still satisfy the 16MB artifact limit.

| Run | BPB Score | Author | Summary | Date |
|---|---|---|---|---|
| 4-Hour Baseline | 1.2074 | Will DePue | Same 9x512 SP-1024 layout, unlimited compute — 4 hours on 8xH100 | 2026-03-18 |
Spotlight: 4-Hour Baseline
This entry shows the performance ceiling of the baseline architecture given unrestricted compute time, establishing a useful reference point for what the 10-minute budget leaves on the table.
- Training time: 4 hours (14,400 seconds) on 8xH100
- Steps completed: 329,430 of 500,000
- Pre-quant BPB: 1.1749 — Post-quant BPB: 1.2074
- Total tokens seen: ~172.7B
- Artifact size: 15,810,161 bytes total
How Scoring Works
What is bits-per-byte (BPB)?
BPB measures how well a model compresses text, expressed in bits per byte of raw UTF-8 input. It is tokenizer-agnostic — the denominator is always raw bytes, not tokens — which makes it a fair comparison across submissions that use different vocabularies or tokenization schemes.

A perfect compressor approaches the entropy of English text (~1.0 BPB). Lower values indicate better compression and, by proxy, better language modeling.

The official score is the post-quantization roundtrip BPB: the model is serialized to int8+zlib format, deserialized, and then evaluated. This captures the real-world fidelity cost of compression.
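As a concrete sketch, BPB falls out of the summed cross-entropy loss (in nats) divided by ln 2 and by the raw byte count. The loss value below is hypothetical:

```python
import math

def bits_per_byte(total_loss_nats: float, total_raw_bytes: int) -> float:
    """Summed eval loss in nats -> bits per byte of raw UTF-8 input."""
    return total_loss_nats / math.log(2) / total_raw_bytes

# Tokenizer-agnostic: the denominator is raw bytes, not tokens.
text = "hello world"
n_bytes = len(text.encode("utf-8"))  # 11 bytes
loss_nats = 9.33                     # hypothetical summed eval loss
bpb = bits_per_byte(loss_nats, n_bytes)

# The record margin is stated in nats; converting to bits divides by ln 2:
record_margin_bpb = 0.005 / math.log(2)  # ~0.0072 BPB
```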
What counts toward the 16MB artifact size?
The artifact is: code bytes + compressed model bytes.
- All code must live in the `train_gpt.py` script.
- The cap is decimal 16MB = 16,000,000 bytes (not 16 MiB = 16,777,216 bytes).
- No external downloads, training dataset access, or network calls are allowed during evaluation.
- The artifact must be fully self-contained and reproducible.
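A minimal sketch of the accounting, assuming the model payload is the zlib-compressed bytes described under scoring (helper names are illustrative):

```python
import zlib

CAP_BYTES = 16_000_000  # decimal 16MB, not 16 MiB (16,777,216)

def artifact_size(code: bytes, raw_model: bytes) -> int:
    """Total artifact = code bytes + compressed model bytes."""
    return len(code) + len(zlib.compress(raw_model, level=9))

def under_cap(code: bytes, raw_model: bytes) -> bool:
    return artifact_size(code, raw_model) < CAP_BYTES

# The current record's reported split fits under the cap:
# 47,642 code bytes + 15,815,847 compressed model bytes = 15,863,489 total.
assert 47_642 + 15_815_847 == 15_863_489 < CAP_BYTES
```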
What is the 0.005-nat improvement requirement?
To claim a new SOTA record on the 10-min 16MB track, your submission must beat the current best score by at least 0.005 nats (roughly 0.0072 BPB). Because of inter-run variance, you must provide enough run logs to demonstrate this improvement at p < 0.01. This requirement is waived for submissions that improve speed through pure systems optimization without changing the ML.

Verification
Verification checks that:
- The `train_gpt.py` script compiles and runs from within the record folder
- The reported `val_bpb` matches the script output within rounding tolerance
- The total artifact size is under 16,000,000 bytes
- Wall-clock training time is under 10 minutes on 8xH100 SXM (for official track entries)
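Put together, the checks might look like the following sketch. Thresholds and names are illustrative, and the compile-and-run check itself is elided:

```python
CAP_BYTES = 16_000_000
TIME_CAP_S = 600
BPB_TOL = 5e-5  # illustrative rounding tolerance

def verify(artifact_bytes: int, wallclock_s: float,
           reported_bpb: float, script_bpb: float,
           official_track: bool = True) -> list[str]:
    """Return the list of failed checks (empty list = submission verifies)."""
    failures = []
    if artifact_bytes >= CAP_BYTES:
        failures.append("artifact size is not under 16,000,000 bytes")
    if abs(reported_bpb - script_bpb) > BPB_TOL:
        failures.append("reported val_bpb does not match script output")
    if official_track and wallclock_s >= TIME_CAP_S:
        failures.append("wall-clock training time is not under 10 minutes")
    return failures
```

The wall-clock check applies only when `official_track` is set, mirroring the split between the official leaderboard and the non-record track.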
