## Prerequisites
Before you begin, ensure you have:

- Python 3.9 or higher
- Git installed
- (Optional) a CUDA-capable GPU for faster training
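You can confirm the first two prerequisites from a terminal; the `nvidia-smi` check only succeeds when an NVIDIA driver is installed:

```shell
python --version   # should report 3.9 or newer
git --version
nvidia-smi         # optional: lists any CUDA-capable GPUs the driver can see
```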
## 5-minute smoke test
Run a quick smoke test to verify your installation and see the training pipeline in action.

## Using pre-trained checkpoints
If you want to skip training, you can use our pre-trained 253M-parameter model.

### Verify checkpoints
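One way to sanity-check downloaded files before running anything; the file names and the presence of a checksum manifest are assumptions on my part, not something this guide specifies:

```shell
ls -lh checkpoints/          # the pretrain/SFT/DPO checkpoints should all be present
sha256sum checkpoints/*.pt   # compare against the published checksums, if any
```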
### Run evaluation
Compare the model against GPT-2 on WikiText-2:

| Model | Parameters | Perplexity |
|---|---|---|
| GPT-2 | 124M | 40.64 |
| Modern LLM (pretrain) | 253M | 27.03 |
| Modern LLM (SFT) | 253M | 34.14 |
| Modern LLM (DPO) | 253M | 34.32 |
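A hypothetical invocation that would produce one row of this table; the script name and flags are placeholders, not the repo's actual CLI:

```shell
python scripts/evaluate.py \
  --checkpoint checkpoints/pretrain.pt \
  --dataset wikitext-2 \
  --metric perplexity
```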
### Generate text
Try text generation with different checkpoints.

### Run math benchmark
Evaluate the model on GSM8K grade-school math problems with verifier reranking.

## Training from scratch
If you want to train your own model, choose one of:

- Full pipeline
- Individual stages
- Custom config
Run all training stages sequentially. This takes approximately 24 hours on an RTX 3060 and produces checkpoints for all stages.
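As a sketch, the four stages might be run back to back like this; the script names are placeholders, so substitute the repo's actual entry points:

```shell
python scripts/pretrain.py       --config configs/local.yaml
python scripts/sft.py            --config configs/local.yaml
python scripts/dpo.py            --config configs/local.yaml
python scripts/train_verifier.py --config configs/local.yaml
```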
## Config presets
Choose a preset based on your hardware:

| Preset | Hardware | Duration | Model size | Training tokens |
|---|---|---|---|---|
| `local-smoke` | Any (CPU/GPU) | ~5 min | 25M params | 200K tokens |
| `local` | RTX 3060 12GB | ~24 hours | 253M params | 600M tokens |
| `gpu-smoke` | A100/H100 | ~2 min | 25M params | 200K tokens |
| `gpu` | A100 40GB | ~8 hours | 768M params | 2B tokens |
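Assuming config files live under a `configs/` directory named after the presets (an assumption, not something the table states), selecting one might look like:

```shell
python scripts/pretrain.py --config configs/local-smoke.yaml
```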
## Common issues
### `ImportError: No module named 'modern_llm'`
Make sure you've installed the package and activated your virtual environment. You can also install it in editable mode.
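For example, from the repository root (the virtual-environment path is an assumption; use wherever you created yours):

```shell
source .venv/bin/activate   # activate your virtual environment first
pip install -e .            # editable install of the package in the current directory
```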
### CUDA out of memory error
Reduce the batch size in your configuration. Alternatively, use the smoke test config, which has minimal memory requirements.
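For example, as a config fragment (the key names here are illustrative; match them to whatever your config file actually uses):

```yaml
training:
  batch_size: 8          # halve until the OOM goes away (e.g. 16 -> 8)
  grad_accum_steps: 2    # double to keep the effective batch size unchanged
```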
### Checkpoint files not found
Checkpoints are saved in the `checkpoints/` directory. If you haven't trained yet, you'll need to either:

- Train your own model using the pipeline
- Download pre-trained checkpoints (if available)
### Slow training on CPU
CPU training is significantly slower than GPU training. For the smoke test:
- CPU: ~5-10 minutes
- GPU (RTX 3060): ~2-3 minutes
## Next steps
- **Architecture**: learn about RoPE, RMSNorm, SwiGLU, and attention sinks
- **Training pipeline**: deep dive into the pretrain → SFT → DPO → verifier workflow
- **Configuration**: customize model architecture and training hyperparameters
- **API reference**: explore the complete API documentation