System requirements
Hardware requirements
Minimum (smoke test)
- CPU: Any modern x86-64 or ARM64 processor
- RAM: 8GB
- Storage: 10GB free space
- GPU: Not required (CPU-only mode available)
Recommended (full training)
- CPU: 8+ core modern processor
- RAM: 16GB+
- Storage: 50GB+ SSD
- GPU: NVIDIA RTX 3060 (12GB VRAM) or better
Software requirements
- Operating System: Linux, macOS, or Windows
- Python: 3.9 or higher (3.10+ recommended)
- CUDA: 11.8 or higher (for GPU support)
- Git: For cloning the repository
Installation methods
- From source (recommended)
- With conda
- Docker (experimental)
Clone and install from source
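The original code block for this step is not shown on this page; it is sketched below. The repository URL is a placeholder, since the actual URL does not appear here:

```shell
# Placeholder URL -- substitute the real repository
git clone https://github.com/your-org/your-repo.git
cd your-repo
```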
This is the recommended installation method for development and customization.
Install dependencies
Install all required packages. This installs:
- PyTorch 2.3.0+ with CUDA support
- Transformers 4.44.0+ for tokenizers and models
- Datasets 3.0.0+ for data loading
- Training utilities (accelerate, PEFT, TRL)
- Evaluation tools (evaluate, rouge-score)
- Scientific computing (numpy, pandas, scikit-learn)
- Visualization (matplotlib, seaborn)
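Assuming a standard pip-based setup with a requirements.txt at the repository root (the file is referenced later on this page), the install step looks like:

```shell
# Upgrade pip first, then install every pinned dependency
python -m pip install --upgrade pip
pip install -r requirements.txt
```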
Verify installation
After installation, verify that everything is set up correctly.
Check Python version
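A quick way to confirm the interpreter meets the 3.9+ requirement stated above:

```python
import sys

# The project requires Python 3.9+ (3.10+ recommended)
print("Python", sys.version.split()[0])
assert sys.version_info >= (3, 9), "Python 3.9 or higher is required"
```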
Check PyTorch installation
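For example (guarded so the check itself cannot crash if PyTorch is missing):

```python
import importlib.util

# Report the installed PyTorch version, if any
if importlib.util.find_spec("torch") is not None:
    import torch
    print("PyTorch:", torch.__version__)
else:
    print("PyTorch is not installed")
```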
Check CUDA availability
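This check is safe to run on CPU-only machines:

```python
# Ask PyTorch whether a CUDA device is visible
try:
    import torch
    available = torch.cuda.is_available()
    print("CUDA available:", available)
    if available:
        print("Device:", torch.cuda.get_device_name(0))
except ImportError:
    print("Install PyTorch before checking CUDA")
```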
If CUDA is not available but you have an NVIDIA GPU, you may need to install or update your CUDA drivers.
Check package imports
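One way to check the key packages in a single pass (the module list is assumed from the dependency summary earlier on this page):

```python
def check_imports(names):
    """Return {module_name: importable?} without raising on failures."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results

# Modules assumed from the dependency list above
status = check_imports(["torch", "transformers", "datasets", "accelerate", "peft", "trl"])
for name, ok in sorted(status.items()):
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```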
Verify all key modules import correctly.
Run setup verification script
Use the included script to perform comprehensive checks. It verifies:
- Python version
- PyTorch installation and version
- CUDA availability and version
- All required packages
- GPU memory availability
- Model initialization
- Dataset loading
Configuration
Download datasets
The first time you run training or evaluation, datasets will be automatically downloaded from Hugging Face:
- WikiText-103 (~200MB) - Pretraining corpus
- TinyStories (~500MB) - Pretraining corpus
- Alpaca (~50MB) - Instruction tuning dataset
- HH-RLHF (~500MB) - Preference alignment dataset
- GSM8K (~10MB) - Math reasoning benchmark
Environment variables
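The original list of variables is not shown on this page. As an example, the standard Hugging Face cache variables (generic to `huggingface_hub`/`datasets`, not specific to this repository) can redirect downloads to a larger disk:

```shell
# Standard Hugging Face cache locations (assumed, not repo-specific)
export HF_HOME="$HOME/hf-cache"
export HF_DATASETS_CACHE="$HF_HOME/datasets"
```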
Optional environment variables are available for customization.
Hardware-specific configurations
The repository includes optimized configurations for different hardware:
RTX 3060 (12GB)
- d_model: 768
- n_layers: 12
- batch_size: 64 (with gradient accumulation)
- micro_batch_size: 2
- mixed_precision: bf16
High-end GPU (24GB+)
- d_model: 1024
- n_layers: 24
- batch_size: 128
- micro_batch_size: 8
- mixed_precision: bf16
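The two presets can be summarized in code; the gradient-accumulation step count implied by "batch_size with gradient accumulation" follows from batch_size / micro_batch_size. The field names here are assumptions for illustration, not necessarily the repository's actual config keys:

```python
# Hypothetical config dicts mirroring the hardware presets above
RTX_3060 = {"d_model": 768, "n_layers": 12, "batch_size": 64,
            "micro_batch_size": 2, "mixed_precision": "bf16"}
HIGH_END = {"d_model": 1024, "n_layers": 24, "batch_size": 128,
            "micro_batch_size": 8, "mixed_precision": "bf16"}

def grad_accum_steps(cfg):
    """The effective batch is reached by accumulating micro-batches."""
    return cfg["batch_size"] // cfg["micro_batch_size"]

print("RTX 3060 accumulation steps:", grad_accum_steps(RTX_3060))   # 64 / 2 = 32
print("High-end accumulation steps:", grad_accum_steps(HIGH_END))   # 128 / 8 = 16
```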
Troubleshooting
CUDA version mismatch
If you see CUDA version errors, reinstall PyTorch with a build that matches your installed CUDA version.
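A typical fix, assuming CUDA 12.1 drivers; swap the `cu121` tag for the PyTorch wheel index matching your driver (`cu118`, `cu121`, ...):

```shell
# Reinstall PyTorch from the wheel index matching your CUDA version
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu121
```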
Out of memory during installation
If pip runs out of memory during installation, install packages in smaller groups, or pass --no-cache-dir so pip does not cache large wheels.
Permission denied errors
If you see permission errors when installing, avoid system-wide installs; use a virtual environment or a user-level install instead.
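Sketches of both fixes, assuming a pip + requirements.txt setup:

```shell
# Out-of-memory fix: skip pip's wheel cache during installation
pip install --no-cache-dir -r requirements.txt

# Permission fix: install inside a per-project virtual environment
# instead of the system-wide site-packages
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```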
Slow dataset downloads
If dataset downloads are very slow:
- Use a different Hugging Face mirror.
- Download datasets manually.
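For the mirror option: `huggingface_hub` honors the HF_ENDPOINT variable, so downloads can be routed through an alternative endpoint (the URL below is one commonly used example mirror, not a recommendation of this repository):

```shell
# Route Hugging Face Hub traffic through an alternative endpoint
export HF_ENDPOINT=https://hf-mirror.com
```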
Import errors after installation
If imports fail even after installation:
- Verify that the virtual environment is activated.
- Reinstall the package in development mode.
- Check the Python path.
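The first and third checks can be done from Python itself; for the second, run `pip install -e .` from the repository root. A small sketch of the Python-side checks:

```python
import sys

# 1. The interpreter path should point inside your virtual environment (e.g. .venv/)
print("Interpreter:", sys.executable)

# 2. sys.path is where imports are resolved; the repository (or its installed
#    package) must appear here for its imports to succeed
for entry in sys.path:
    print("  ", entry)
```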
Next steps
Once installation is complete:
- Run quick start: try the 5-minute smoke test to verify everything works.
- Train a model: start training your first model with the pipeline.
- Explore architecture: learn about the model architecture and components.
- Configuration guide: customize model size and training parameters.
Package dependencies
Complete list of dependencies from requirements.txt:
Core dependencies
Training utilities
Evaluation and metrics
Scientific computing
Visualization and utilities