
System requirements

Hardware requirements

Minimum (smoke test)

  • CPU: Any modern x86-64 or ARM64 processor
  • RAM: 8GB
  • Storage: 10GB free space
  • GPU: Not required (CPU-only mode available)

Recommended (full training)

  • CPU: 8+ core modern processor
  • RAM: 16GB+
  • Storage: 50GB+ SSD
  • GPU: NVIDIA RTX 3060 (12GB VRAM) or better
The 253M-parameter model was successfully trained on an RTX 3060 (12GB VRAM) using the included configuration. Larger GPUs such as the RTX 4090 or A100 allow bigger models and faster training.

Software requirements

  • Operating System: Linux, macOS, or Windows
  • Python: 3.9 or higher (3.10+ recommended)
  • CUDA: 11.8 or higher (for GPU support)
  • Git: For cloning the repository
Python 3.8 and earlier are not supported due to dependencies on modern type hints and dataclass features.
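For example, dataclasses in the codebase annotate fields with built-in generics (`list[int]`, `dict[str, str]`), which only parse on Python 3.9+. A minimal illustration (the `Sample` class is hypothetical, not part of the repository):

```python
from dataclasses import dataclass

# Built-in generics (PEP 585) require Python 3.9+;
# on 3.8 this module fails at import time with a TypeError.
@dataclass
class Sample:
    tokens: list[int]
    meta: dict[str, str]

s = Sample(tokens=[1, 2, 3], meta={"split": "train"})
print(s.tokens)  # [1, 2, 3]
```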

Installation methods
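
As a rough sketch, a from-source install typically looks like the following (the repository URL is a placeholder; adjust the activation command for your platform):

```shell
# clone the repository (placeholder URL) and enter it
git clone <repo-url> modern-llm
cd modern-llm

# create and activate an isolated environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# install dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```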

Verify installation

After installation, verify that everything is set up correctly:

Check Python version

python --version
Should output Python 3.9 or higher.
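The same check can be done programmatically, which is handy inside setup scripts:

```python
import sys

# fail fast if the interpreter is too old for this project
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {sys.version.split()[0]}"
print(f"Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
```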

Check PyTorch installation

python -c "import torch; print(f'PyTorch {torch.__version__}')"
Expected output (the exact version and CUDA suffix may differ):
PyTorch 2.3.0+cu118

Check CUDA availability

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python -c "import torch; print(f'CUDA version: {torch.version.cuda}')"
python -c "import torch; print(f'GPU count: {torch.cuda.device_count()}')"
On a single-GPU system with CUDA 11.8, you should see:
CUDA available: True
CUDA version: 11.8
GPU count: 1
If CUDA is not available but you have an NVIDIA GPU, you may need to install or update your CUDA drivers.
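The one-liners above assume PyTorch imports cleanly. A small diagnostic that degrades gracefully when it does not (a sketch, not part of the repository):

```python
import importlib.util

def cuda_report() -> dict:
    """Return a small diagnostic dict; works even when torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda": False, "gpus": 0}
    import torch
    return {
        "torch": torch.__version__,
        "cuda": torch.cuda.is_available(),
        "gpus": torch.cuda.device_count() if torch.cuda.is_available() else 0,
    }

print(cuda_report())
```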

Check package imports

Verify all key modules import correctly:
python -c "from modern_llm.models import ModernDecoderLM; print('✓ Models')"
python -c "from modern_llm.config import ModernLLMConfig; print('✓ Config')"
python -c "from modern_llm.training import run_training; print('✓ Training')"
python -c "from transformers import AutoTokenizer; print('✓ Transformers')"
python -c "from datasets import load_dataset; print('✓ Datasets')"
All checks should print a green checkmark.
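The five one-liners can also be collapsed into a single loop that reports every failure instead of stopping at the first (same module names as above):

```python
# check each module and report all failures rather than aborting on the first
modules = [
    "modern_llm.models",
    "modern_llm.config",
    "modern_llm.training",
    "transformers",
    "datasets",
]

failed = []
for name in modules:
    try:
        __import__(name)
        print(f"✓ {name}")
    except ImportError as exc:
        failed.append(name)
        print(f"✗ {name}: {exc}")
```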

Run setup verification script

Use the included script to perform comprehensive checks:
python scripts/setup_check.py
This script verifies:
  • Python version
  • PyTorch installation and version
  • CUDA availability and version
  • All required packages
  • GPU memory availability
  • Model initialization
  • Dataset loading
Expected output:
=== Modern LLM Setup Check ===

✓ Python 3.10.12
✓ PyTorch 2.3.0+cu118  
✓ CUDA available: True
✓ CUDA version: 11.8
✓ GPU: NVIDIA GeForce RTX 3060 (12GB)
✓ All required packages installed
✓ Model initialization OK
✓ Dataset loading OK

=== Setup Complete ===

Configuration

Download datasets

The first time you run training or evaluation, datasets will be automatically downloaded from Hugging Face:
  • WikiText-103 (~200MB) - Pretraining corpus
  • TinyStories (~500MB) - Pretraining corpus
  • Alpaca (~50MB) - Instruction tuning dataset
  • HH-RLHF (~500MB) - Preference alignment dataset
  • GSM8K (~10MB) - Math reasoning benchmark
To pre-download datasets:
python scripts/verify_datasets.py
Datasets are cached in ~/.cache/huggingface/datasets by default. Set the HF_DATASETS_CACHE environment variable to use a different location.
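The override can also be set from Python, as long as it runs before the first `import datasets`; the path below is just an example:

```python
import os

# must run before the first `import datasets`; the path is an example
os.environ.setdefault("HF_DATASETS_CACHE", "/data/hf_datasets")
print(os.environ["HF_DATASETS_CACHE"])
```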

Environment variables

Optional environment variables for customization:
# Cache directory for Hugging Face datasets
export HF_DATASETS_CACHE="/path/to/datasets"

# Cache directory for Hugging Face models
export HF_HOME="/path/to/models"

# Number of threads for PyTorch
export OMP_NUM_THREADS=8

# CUDA device (if multiple GPUs)
export CUDA_VISIBLE_DEVICES=0
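OMP_NUM_THREADS can likewise be resolved in Python, falling back to the machine's core count when unset (a sketch):

```python
import os

# respect OMP_NUM_THREADS when set, otherwise fall back to the core count
threads = int(os.environ.get("OMP_NUM_THREADS", os.cpu_count() or 1))
print(f"Using {threads} CPU threads")
```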

Hardware-specific configurations

The repository includes optimized configurations for different hardware:

RTX 3060 (12GB)

# Use the included RTX 3060 config
python scripts/run_pipeline.py \
  --config configs/lm_max_rtx3060.json \
  --stage pretrain
Configuration:
  • d_model: 768
  • n_layers: 12
  • batch_size: 64 (with gradient accumulation)
  • micro_batch_size: 2
  • mixed_precision: bf16
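These numbers imply 32 gradient-accumulation steps per optimizer update, since the effective batch size is the micro-batch size times the number of accumulation steps:

```python
# effective_batch = micro_batch_size * grad_accum_steps
batch_size = 64        # effective (per optimizer step) batch size
micro_batch_size = 2   # what fits in 12GB VRAM per forward/backward pass
grad_accum_steps = batch_size // micro_batch_size
print(grad_accum_steps)  # 32
```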

High-end GPU (24GB+)

# Use the GPU preset
python scripts/run_pipeline.py \
  --config gpu \
  --stage pretrain
Configuration:
  • d_model: 1024
  • n_layers: 24
  • batch_size: 128
  • micro_batch_size: 8
  • mixed_precision: bf16

Troubleshooting

If you see CUDA version errors:
RuntimeError: CUDA version mismatch
Solution: Reinstall PyTorch matching your CUDA version:
# For CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
If pip runs out of memory during installation:
# Install packages one at a time
pip install torch
pip install transformers
pip install datasets
# ... etc
Or use --no-cache-dir:
pip install --no-cache-dir -r requirements.txt
If you see permission errors when installing:
# Use --user flag (not recommended with venv)
pip install --user -r requirements.txt

# Or fix venv permissions
sudo chown -R $USER:$USER .venv
Using --user with a virtual environment can cause conflicts. Prefer fixing permissions.
If dataset downloads are very slow:
  1. Use a different Hugging Face mirror:
    export HF_ENDPOINT="https://hf-mirror.com"
    
  2. Download datasets manually:
    from datasets import load_dataset
    load_dataset("wikitext", "wikitext-103-raw-v1")
    load_dataset("roneneldan/TinyStories")
    load_dataset("tatsu-lab/alpaca")
    load_dataset("Anthropic/hh-rlhf")
    load_dataset("gsm8k", "main")
    
If imports fail even after installation:
  1. Verify virtual environment is activated:
    which python  # Should point to .venv/bin/python
    
  2. Reinstall in development mode:
    pip install -e .
    
  3. Check Python path:
    python -c "import sys; print('\n'.join(sys.path))"
    

Next steps

Once installation is complete:

Run quick start

Try the 5-minute smoke test to verify everything works

Train a model

Start training your first model with the pipeline

Explore architecture

Learn about the model architecture and components

Configuration guide

Customize model size and training parameters

Package dependencies

Complete list of dependencies from requirements.txt:
torch>=2.3.0
torchvision>=0.18.0
torchaudio>=2.3.0
transformers>=4.44.0
datasets>=3.0.0
accelerate>=1.0.0
peft>=0.12.0         # Parameter-efficient fine-tuning
trl>=0.9.0           # Transformer Reinforcement Learning
sentencepiece>=0.2.0 # Tokenization
evaluate>=0.4.2
rouge-score>=0.1.2
scikit-learn>=1.4.0
numpy>=1.26.0
pandas>=2.2.0
tqdm>=4.66.0
matplotlib>=3.8.0
seaborn>=0.13.0
pyyaml>=6.0.0
pytest>=8.0.0
