These instructions are for Ubuntu x86_64. Substitute apt-get with the appropriate package manager for other distributions.

Quick install (script)

1. Install the CUDA toolkit (GPU only)

Skip this step for CPU-only inference.
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run
Install only the toolkit; when prompted, do not overwrite the existing driver or the /usr/local/cuda symlink.
2. Run the installation script

curl -fsSL https://h2o-release.s3.amazonaws.com/h2ogpt/linux_install_full.sh | bash
Enter your sudo password when prompted. To avoid repeated password prompts, extend the sudo timeout first:
sudo visudo
# Add after the "Defaults env_reset" line:
# Defaults        timestamp_timeout=60
Then start and exit a root shell once to cache your credentials:
sudo bash
exit
3. Activate the h2oGPT environment

conda activate h2ogpt
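To confirm the environment is actually active, a quick sanity check can be run from Python (a sketch; env_active is a hypothetical helper, not part of h2oGPT):

```python
import sys

def env_active(env_name, prefix=None):
    """Heuristic check: does the interpreter prefix point into the named env?"""
    prefix = sys.prefix if prefix is None else prefix
    return env_name in prefix

# After `conda activate h2ogpt`, sys.prefix should end in .../envs/h2ogpt
print(sys.prefix)
print(env_active("h2ogpt"))
```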

Manual install

1. Set up a Python 3.10 environment with Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh -b -p $HOME/miniconda3

echo '### Conda init ###' >> $HOME/.bashrc
echo 'source $HOME/miniconda3/etc/profile.d/conda.sh' >> $HOME/.bashrc
echo 'conda activate' >> $HOME/.bashrc
source $HOME/.bashrc

conda update conda -y
conda create -n h2ogpt -y
conda activate h2ogpt
conda install python=3.10 -c conda-forge -y
Verify the Python version:
python --version
python -c "import os, sys; print('hello world')"
The output should show 3.10.xx and print hello world.
2. Clone h2oGPT

git clone https://github.com/h2oai/h2ogpt.git
cd h2ogpt
3. Install the CUDA toolkit and set environment variables (GPU only)

Skip this step for CPU inference.
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run
Add CUDA to your environment:
echo 'export CUDA_HOME=/usr/local/cuda-12.1' >> $HOME/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64' >> $HOME/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> $HOME/.bashrc
source $HOME/.bashrc
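To verify the exports took effect in a fresh shell, a small check can help (a sketch; cuda_env_report is a hypothetical helper, and the expected values assume the CUDA 12.1 paths above):

```python
import os
import shutil

def cuda_env_report():
    """Summarize the CUDA-related settings the exports above should establish."""
    return {
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "ld_library_path_has_cuda": "cuda" in os.environ.get("LD_LIBRARY_PATH", "").lower(),
    }

print(cuda_env_report())
```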
4. Set the PyTorch index URL

export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121 https://huggingface.github.io/autogptq-index/whl/cu121"
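pip splits a space-separated PIP_EXTRA_INDEX_URL into one --extra-index-url option per URL, which is why both the PyTorch and AutoGPTQ wheel indexes fit in a single variable:

```python
import os

# The same value exported above; pip splits it on whitespace into
# one --extra-index-url option per URL.
os.environ["PIP_EXTRA_INDEX_URL"] = (
    "https://download.pytorch.org/whl/cu121 "
    "https://huggingface.github.io/autogptq-index/whl/cu121"
)
urls = os.environ["PIP_EXTRA_INDEX_URL"].split()
print(urls)
```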
5. Set llama_cpp_python build flags (GPU only)

export GGML_CUDA=1
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
export FORCE_CMAKE=1
Building for all CUDA architectures takes time but is required; llama_cpp_python fails if all is omitted from CMAKE_CUDA_ARCHITECTURES.
6. Run the installation script

bash docs/linux_install.sh
To include GPL-licensed packages (PyMuPDF, etc.):
GPLOK=1 bash docs/linux_install.sh
You can comment out optional sections in the script to skip packages you do not need.

Run h2oGPT

Verify that CUDA is visible to PyTorch (GPU only):
python -c "import torch; print(torch.cuda.is_available())"  # should print True
Place documents for Q&A in a user_path directory, then start h2oGPT:
python generate.py \
  --base_model=h2oai/h2ogpt-4096-llama2-13b-chat \
  --load_8bit=True \
  --score_model=None \
  --langchain_mode='UserData' \
  --user_path=user_path
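The launch above can also be driven from a Python wrapper script (a sketch; build_cmd is a hypothetical helper that just assembles the same argument list as the command shown):

```python
import shlex

def build_cmd(base_model, user_path="user_path"):
    """Assemble the generate.py invocation shown above as an argv list."""
    return [
        "python", "generate.py",
        "--base_model=" + base_model,
        "--load_8bit=True",
        "--score_model=None",
        "--langchain_mode=UserData",
        "--user_path=" + user_path,
    ]

cmd = build_cmd("h2oai/h2ogpt-4096-llama2-13b-chat")
print(shlex.join(cmd))  # pass `cmd` to subprocess.run to launch
```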
Models are cached in ~/.cache/ (huggingface, chroma, torch, etc.). Open http://localhost:7860 after the server starts.
Add --share=True to expose a public Gradio URL for remote access.
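Before opening the browser, you can confirm the server is listening (a sketch using only the standard library; server_up is a hypothetical helper and 7860 is the default Gradio port from above):

```python
import urllib.request
import urllib.error

def server_up(url="http://localhost:7860", timeout=2.0):
    """Return True if an HTTP server answers at the URL, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False

print(server_up())
```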

Troubleshooting

undefined symbol error with flash_attn
Ensure CUDA_HOME matches the toolkit version used to build h2oGPT, then reinstall:
export CUDA_HOME=/usr/local/cuda-12.1
pip uninstall flash_attn autoawq autoawq-kernels -y
pip install flash_attn autoawq autoawq-kernels --no-cache-dir
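To confirm the reinstall fixed the problem, a quick import probe works (a sketch; the import names flash_attn and awq, the module installed by autoawq, are assumptions about these packages):

```python
import importlib

def import_error(module):
    """Try importing a module; return the error message on failure, else None."""
    try:
        importlib.import_module(module)
        return None
    except Exception as exc:  # undefined-symbol problems surface as ImportError
        return str(exc)

for mod in ("flash_attn", "awq"):
    print(mod, "->", import_error(mod) or "OK")
```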
protobuf import error
pip install protobuf==3.20.0
Ubuntu 18 (very out of date)
Only run the commands below on Ubuntu 18. Do not run them on Ubuntu 20 or 22.
apt-get clean all
apt-get update
apt-get -y full-upgrade
apt-get -y dist-upgrade
apt-get -y autoremove
apt-get clean all
