System Requirements
Hardware Requirements
Llama 2 models have varying hardware requirements based on size:

| Model Size | Model Parallel (MP) | Minimum GPU Memory | Recommended GPUs |
|---|---|---|---|
| 7B | 1 | 16 GB | 1x A100 or V100 |
| 13B | 2 | 32 GB | 2x A100 or V100 |
| 70B | 8 | 128 GB | 8x A100 or V100 |
All models support sequence lengths up to 4096 tokens, but memory is pre-allocated based on
max_seq_len and max_batch_size parameters.

Software Requirements
- Operating System: Linux (Ubuntu 18.04+, CentOS 7+) or macOS
- Python: 3.8 or higher
- CUDA: 11.0 or higher (for GPU acceleration)
- conda: Anaconda or Miniconda
- Utilities: wget, md5sum (or md5 on macOS)
Installation Steps
Set Up Conda Environment
Create a new conda environment with Python 3.8+. This isolates Llama 2 dependencies from other projects.
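A minimal setup might look like this (the environment name llama2 is illustrative):

```shell
# Create an isolated environment with Python 3.8 and activate it
conda create -n llama2 python=3.8 -y
conda activate llama2
```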
Install PyTorch with CUDA
Install PyTorch with CUDA support for GPU acceleration, then verify the installation.
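As a sketch, the install and verification steps could look like the following; check pytorch.org for the exact install command matching your CUDA version:

```shell
# Install PyTorch (the exact package index URL depends on your CUDA version)
pip install torch

# Verify the installation and CUDA availability
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
```

On a correctly configured GPU machine, the second command prints the PyTorch version followed by True.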
Install Llama Package
Install the Llama package in editable mode. This installs the following dependencies from requirements.txt:

| Package | Purpose |
|---|---|
| torch | PyTorch deep learning framework |
| fairscale | Model parallelism and memory optimization |
| fire | Command-line interface generation |
| sentencepiece | Tokenization library |

The -e flag installs in editable mode, allowing you to modify the source code.

Model Download Process
Request Access
Register for Access
Visit the Meta Llama Downloads page and complete the registration form. You’ll need to provide:
- Name and email
- Organization (optional)
- Country
- Intended use case
Download Models
The download.sh script automates model and tokenizer downloads.
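From the repository root, the script is run directly; it prompts for the presigned URL sent in Meta's approval email and for the model sizes to fetch:

```shell
# Run the interactive download script
bash download.sh
```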
Script Workflow
Select Models
Choose which models to download. Options:

- Pretrained models: 7B, 13B, 70B
- Chat models: 7B-chat, 13B-chat, 70B-chat
- Download all: press Enter without input
- Download specific models: 7B,7B-chat (comma-separated, no spaces)
Model Variants Explained
Pretrained Models (7B, 13B, 70B)
- Base models trained on text completion
- Use for tasks where the answer is a natural continuation
- Example: "The theory of relativity states that" → the model completes the sentence

Chat Models (7B-chat, 13B-chat, 70B-chat)
- Fine-tuned for dialogue and instruction-following
- Require specific formatting with [INST] and <<SYS>> tags
- Better for conversational AI and Q&A
Download and Verify
The script will:
- Download LICENSE and usage policy
- Download tokenizer and verify checksums
- For each model:
  - Download model shards (consolidated.*.pth)
  - Download configuration (params.json)
  - Verify file integrity with checksums
Model downloads are large:
- 7B: ~13 GB
- 13B: ~26 GB
- 70B: ~138 GB
Understanding Downloaded Files
After downloading, your directory structure will look like the following.
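A typical layout after fetching the 7B model looks roughly like this (exact file names may vary slightly between releases):

```
llama/
├── tokenizer.model
├── tokenizer_checklist.chk
└── llama-2-7b/
    ├── consolidated.00.pth
    ├── params.json
    └── checklist.chk
```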
Understanding Model Shards
Larger models are split into multiple shard files:
Sharding enables distribution across multiple GPUs for model parallelism.
| Model | Shards | Files |
|---|---|---|
| 7B | 1 | consolidated.00.pth |
| 13B | 2 | consolidated.00.pth, consolidated.01.pth |
| 70B | 8 | consolidated.00.pth through consolidated.07.pth |
Alternative: Hugging Face Downloads
You can also access Llama 2 models through Hugging Face.

Request Hugging Face Access
Visit a Llama 2 model repository on Hugging Face (e.g., meta-llama/Llama-2-7b-chat-hf). Acknowledge the license and fill out the access form.
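Once access is granted, one way to fetch the weights is the huggingface_hub CLI (assumes it is installed via pip install huggingface_hub):

```shell
# Log in with a Hugging Face access token, then download the model repository
huggingface-cli login
huggingface-cli download meta-llama/Llama-2-7b-chat-hf
```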
The official Llama repository provides more control and lower-level access, while Hugging Face offers easier integration with the transformers ecosystem.
Verification and Testing
After installation, verify that everything works.

Quick Verification
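A minimal smoke test, assuming the package was installed into the active environment:

```shell
# Confirm the llama package and its dependencies import cleanly
python -c "import llama, torch, fairscale, fire, sentencepiece; print('ok')"
```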
Test Text Completion
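The repository's example script can be run with torchrun; the paths below assume the default download layout, and --nproc_per_node must match the model's MP value (1 for 7B):

```shell
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```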
Troubleshooting
Import Error: No module named 'llama'
Ensure you’re in the correct conda environment and have installed the package:
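For example (the environment name llama2 is illustrative; run the install from the cloned repository root):

```shell
conda activate llama2
pip install -e .
```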
CUDA out of memory
Reduce the memory allocation parameters (max_seq_len and max_batch_size), or use a smaller model (7B instead of 13B/70B).
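For example, lowering both parameters shrinks the pre-allocated cache; the values below are illustrative:

```shell
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 64 --max_batch_size 1
```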
Download script fails with 403 error
Your download URL has expired. Request a new URL from the Meta website.
Checksum verification fails
The downloaded files may be corrupted. Delete the affected model directory and re-run the download script.
fairscale installation fails
Some systems require installing fairscale manually, either from PyPI or from source:
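For example:

```shell
# Install the released wheel from PyPI
pip install fairscale

# Or build from source if the wheel fails on your platform
pip install git+https://github.com/facebookresearch/fairscale.git
```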
Next Steps
Quickstart Guide
Run your first inference in minutes
Llama Cookbook
Advanced examples and integrations
Model Card
Detailed model specifications
FAQ
Frequently asked questions
Additional Resources
- Research Paper - Technical details and benchmarks
- Responsible Use Guide - Safety and ethical guidelines
- License - Terms of use
- Acceptable Use Policy - Usage restrictions