This guide provides comprehensive instructions for installing Llama 2 and all its dependencies.

System Requirements

Hardware Requirements

Llama 2 models have varying hardware requirements based on size:
| Model Size | Model Parallel (MP) | Minimum GPU Memory | Recommended GPUs |
| --- | --- | --- | --- |
| 7B | 1 | 16 GB | 1x A100 or V100 |
| 13B | 2 | 32 GB | 2x A100 or V100 |
| 70B | 8 | 128 GB | 8x A100 or V100 |
All models support sequence lengths up to 4096 tokens, but memory is pre-allocated based on max_seq_len and max_batch_size parameters.
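The relationship between these parameters and pre-allocated memory can be sketched with a back-of-the-envelope estimate. This is an approximation, assuming fp16 key/value caches of shape (batch, seq, heads, head_dim) per layer as in the reference implementation; the helper name and the 7B architecture constants (32 layers, 32 heads, head dim 128) are illustrative:

```python
def kv_cache_bytes(n_layers, n_heads, head_dim,
                   max_batch_size, max_seq_len, bytes_per_elem=2):
    """Approximate memory pre-allocated for the key/value caches (fp16 = 2 bytes).
    Two cache tensors (K and V) of shape (batch, seq, heads, head_dim) per layer."""
    per_layer = 2 * max_batch_size * max_seq_len * n_heads * head_dim * bytes_per_elem
    return n_layers * per_layer

# Llama 2 7B: 32 layers, 32 attention heads, head dim 128
gb = kv_cache_bytes(32, 32, 128, max_batch_size=4, max_seq_len=4096) / 1024**3
print(f"{gb:.1f} GiB")  # prints "8.0 GiB"
```

Halving max_seq_len or max_batch_size halves this cache footprint, which is why reducing those flags is the first fix for out-of-memory errors.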

Software Requirements

  • Operating System: Linux (Ubuntu 18.04+, CentOS 7+) or macOS
  • Python: 3.8 or higher
  • CUDA: 11.0 or higher (for GPU acceleration)
  • conda: Anaconda or Miniconda
  • Utilities: wget, md5sum (or md5 on macOS)

Installation Steps

Step 1: Set Up Conda Environment

Create a new conda environment with Python 3.8+:
conda create -n llama python=3.8
conda activate llama
This isolates Llama 2 dependencies from other projects.
Step 2: Install PyTorch with CUDA

Install PyTorch with CUDA support for GPU acceleration:
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
CPU-only installation is not recommended for production use. Inference will be significantly slower.
Verify PyTorch installation:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA Available: {torch.cuda.is_available()}')"
Expected output (versions may vary):
PyTorch: 2.0.0
CUDA Available: True
Step 3: Clone Llama Repository

Clone the official Llama 2 repository:
git clone https://github.com/facebookresearch/llama.git
cd llama
Step 4: Install Llama Package

Install the Llama package in editable mode:
pip install -e .
This installs the following dependencies from requirements.txt:
| Package | Purpose |
| --- | --- |
| torch | PyTorch deep learning framework |
| fairscale | Model parallelism and memory optimization |
| fire | Command-line interface generation |
| sentencepiece | Tokenization library |
The -e flag installs in editable mode, allowing you to modify the source code.
Step 5: Verify Installation

Verify the installation by importing the Llama module:
python -c "from llama import Llama; print('Llama package installed successfully')"
If successful, you should see:
Llama package installed successfully

Model Download Process

Request Access

Step 1: Register for Access

Visit the Meta Llama Downloads page and complete the registration form. You'll need to provide:
  • Name and email
  • Organization (optional)
  • Country
  • Intended use case
Step 2: Accept License

Review and accept the Llama 2 Community License Agreement.
Ensure you understand the license terms, including acceptable use policies and restrictions.
Step 3: Receive Download URL

After approval (typically within hours), you’ll receive an email with a unique, signed download URL.
  • URLs expire after 24 hours
  • URLs have download limits
  • You can request new URLs if needed

Download Models

The download.sh script automates model and tokenizer downloads:
chmod +x download.sh
./download.sh

Script Workflow

Step 1: Enter Download URL

When prompted, paste the URL from your email:
Enter the URL from email: https://download.llamameta.net/*?Policy=...
Manually copy-paste the URL. Do not use browser “Copy Link” functionality.
Step 2: Select Models

Choose which models to download:
Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all:
Options:
  • Pretrained models: 7B, 13B, 70B
  • Chat models: 7B-chat, 13B-chat, 70B-chat
  • Download all: Press Enter without input
  • Download specific: 7B,7B-chat (no spaces)
Pretrained Models (7B, 13B, 70B)
  • Base models trained on text completion
  • Use for tasks where the answer is a natural continuation
  • Example: "The theory of relativity states that" → model completes
Chat Models (7B-chat, 13B-chat, 70B-chat)
  • Fine-tuned for dialogue and instruction-following
  • Require specific formatting with INST and <<SYS>> tags
  • Better for conversational AI and Q&A
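The chat-model tag layout described above can be sketched for a single turn. This is a minimal illustration (the helper name is mine); the system prompt is wrapped in <<SYS>> tags and embedded inside the first [INST] block:

```python
# Tag constants matching the Llama 2 chat format
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_chat_prompt(system_prompt, user_message):
    """Build a single-turn Llama 2 chat prompt.
    The system prompt rides inside the first user turn's [INST] block."""
    return f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{user_message} {E_INST}"

prompt = format_chat_prompt("You are a helpful assistant.",
                            "What is model parallelism?")
print(prompt)
```

The chat models were fine-tuned on this exact structure, so free-form prompts without the tags tend to produce noticeably worse dialogue behavior.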
Step 3: Download and Verify

The script will:
  1. Download LICENSE and usage policy
  2. Download tokenizer and verify checksums
  3. For each model:
    • Download model shards (consolidated.*.pth)
    • Download configuration (params.json)
    • Verify file integrity with checksums
Downloading LICENSE and Acceptable Usage Policy
Downloading tokenizer
Downloading llama-2-7b-chat
Checking checksums
Model downloads are large:
  • 7B: ~13 GB
  • 13B: ~26 GB
  • 70B: ~138 GB
Ensure you have sufficient disk space.
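The script's checksum step can also be reproduced by hand if you want to re-verify later. A sketch, assuming checklist.chk contains md5sum-style lines of the form "<hex digest>  <filename>":

```python
import hashlib
from pathlib import Path

def verify_checklist(model_dir):
    """Check every entry in checklist.chk against the file's actual MD5.
    Returns {filename: True/False} for each listed file."""
    model_dir = Path(model_dir)
    results = {}
    for line in (model_dir / "checklist.chk").read_text().splitlines():
        expected, name = line.split()
        actual = hashlib.md5((model_dir / name).read_bytes()).hexdigest()
        results[name] = (actual == expected)
    return results
```

On Linux this is equivalent to running md5sum -c checklist.chk inside the model directory (md5 on macOS).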

Understanding Downloaded Files

After downloading, your directory structure will look like:
llama/
├── llama-2-7b-chat/
│   ├── consolidated.00.pth      # Model weights
│   ├── params.json               # Model configuration
│   └── checklist.chk             # Checksum file
├── llama-2-7b/
│   ├── consolidated.00.pth
│   ├── params.json
│   └── checklist.chk
├── tokenizer.model               # Shared tokenizer
├── tokenizer_checklist.chk       # Tokenizer checksum
├── LICENSE                       # License agreement
└── USE_POLICY.md                # Acceptable use policy
Larger models are split into multiple shard files:
| Model | Shards | Files |
| --- | --- | --- |
| 7B | 1 | consolidated.00.pth |
| 13B | 2 | consolidated.00.pth, consolidated.01.pth |
| 70B | 8 | consolidated.00.pth through consolidated.07.pth |
Sharding enables distribution across multiple GPUs for model parallelism.
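The shard naming follows directly from the model-parallel (MP) degree: rank N of the torchrun job loads consolidated.<N>.pth. A small illustrative helper:

```python
def shard_files(model_parallel_size):
    """List checkpoint shard filenames for a given MP degree.
    Each model-parallel rank loads exactly one shard."""
    return [f"consolidated.{rank:02d}.pth" for rank in range(model_parallel_size)]

print(shard_files(2))  # prints "['consolidated.00.pth', 'consolidated.01.pth']"
```

This is why --nproc_per_node must match the MP value in the hardware table above: one process per shard.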

Alternative: Hugging Face Downloads

You can also access Llama 2 models through Hugging Face:
Step 1: Request Hugging Face Access

Visit a Llama 2 model repository on Hugging Face (e.g., meta-llama/Llama-2-7b-chat-hf). Acknowledge the license and fill out the access form.
Step 2: Install Hugging Face Libraries

pip install transformers accelerate
Step 3: Download and Use Models

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16
)
The official Llama repository provides more control and lower-level access, while Hugging Face offers easier integration with the transformers ecosystem.

Verification and Testing

After installation, verify everything works:

Quick Verification

# Test with chat model
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 \
    --max_batch_size 6
If successful, you’ll see chat completions for pre-configured dialogs.

Test Text Completion

# Test with pretrained model
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 \
    --max_batch_size 4

Troubleshooting

If the llama package cannot be imported, ensure you're in the correct conda environment and that the package is installed:
conda activate llama
pip install -e .
If you hit GPU out-of-memory errors, reduce the memory allocation parameters:
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 256 \
    --max_batch_size 2
Or use a smaller model (7B instead of 13B/70B).
If the download script can no longer fetch files, your download URL may have expired. Request a new URL from the Meta website.
If checksum verification fails, the downloaded files may be corrupted. Delete the affected model directory and re-run the download script:
rm -rf llama-2-7b-chat/
./download.sh
If fairscale fails to install as a dependency, some systems require manual installation:
pip install fairscale --no-build-isolation
Or install from source:
git clone https://github.com/facebookresearch/fairscale.git
cd fairscale
pip install .

Next Steps

  • Quickstart Guide: Run your first inference in minutes
  • Llama Cookbook: Advanced examples and integrations
  • Model Card: Detailed model specifications
  • FAQ: Frequently asked questions
