Run h2oGPT fully offline by pre-downloading models, disabling telemetry, and setting gradio_offline_level.
h2oGPT can operate in fully air-gapped environments with no outbound internet access. This requires downloading all required models and assets in advance and setting the appropriate offline flags at startup.
Download the GGUF file directly and place it in llamacpp_path/:
```shell
# Online — download the model file manually
wget "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q5_K_M.gguf?download=true" \
  -O llamacpp_path/zephyr-7b-beta.Q5_K_M.gguf

# Offline — run using the local file
TRANSFORMERS_OFFLINE=1 python generate.py \
  --base_model=zephyr-7b-beta.Q5_K_M.gguf \
  --prompt_type=zephyr \
  --gradio_offline_level=2 \
  --share=False \
  --add_disk_models_to_ui=False
```
For non-HuggingFace model formats (GGUF, GPTQ, etc.), you must specify the exact filename. h2oGPT cannot resolve a HuggingFace hub name to a local file without internet access.
| `--gradio_offline_level` | Behavior |
| --- | --- |
| 0 | Normal operation — downloads fonts and external assets |
| 1 | Backend offline only — fonts still load from Google (better appearance) |
| 2 | Fully air-gapped — replaces Google Fonts with local fallbacks |
Use --gradio_offline_level=2 for true air-gapped deployments. The UI fonts will look slightly different, but no outbound requests are made.
Gradio may still attempt to load iframeResizer.contentWindow.min.js from a CDN. This is non-blocking — h2oGPT works without it. A simple firewall rule is sufficient to block it.
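One way to enforce this at the host level is an outbound-deny rule; the sketch below uses iptables, and the subnet shown is an assumption to adapt to your network:

```shell
# Sketch: deny all outbound traffic from the h2oGPT host except
# loopback and the local subnet (adjust 192.168.0.0/16 to your network)
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A OUTPUT -d 192.168.0.0/16 -j ACCEPT
sudo iptables -A OUTPUT -j REJECT
```

With this in place, the CDN request for iframeResizer fails quickly and the UI continues to work.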
Archive these cache directories and restore them to the same paths on the offline machine.
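A minimal transfer sketch, assuming the default HuggingFace cache location (`~/.cache/huggingface`; adjust if you have set `HF_HOME`) and a local `llamacpp_path/` directory:

```shell
# On the online machine: package the populated caches and model files
tar -C ~ -czf h2ogpt-offline.tar.gz .cache/huggingface
tar -czf h2ogpt-models.tar.gz llamacpp_path/

# On the offline machine: restore to the same locations
tar -C ~ -xzf h2ogpt-offline.tar.gz
tar -xzf h2ogpt-models.tar.gz
```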
Use --prepare_offline_level=1 if you only need h2oGPT itself and not the assets for vLLM or TGI inference servers. This significantly reduces the download size.
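For example, on the online preparation machine (the base model shown is an assumption; substitute your own):

```shell
# Fetch only the assets h2oGPT itself needs, skipping the extra
# downloads for vLLM/TGI inference servers
python generate.py \
  --base_model=HuggingFaceH4/zephyr-7b-beta \
  --prepare_offline_level=1
```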
Always set --prompt_type explicitly when using absolute paths or GGUF files, since the prompt type lookup requires internet access to resolve HuggingFace model names.
h2oGPT automatically disables HuggingFace telemetry, Gradio telemetry, and ChromaDB PostHog in its core path. You can explicitly disable additional telemetry:
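As a defense-in-depth measure, export the libraries' documented opt-out variables before launch (these names are the standard opt-outs for HuggingFace Hub, Gradio, and ChromaDB's PostHog integration):

```shell
export HF_HUB_DISABLE_TELEMETRY=1      # HuggingFace Hub telemetry
export HF_HUB_OFFLINE=1                # refuse all Hub network calls
export GRADIO_ANALYTICS_ENABLED=False  # Gradio usage analytics
export ANONYMIZED_TELEMETRY=False      # ChromaDB PostHog telemetry
export DO_NOT_TRACK=1                  # generic opt-out honored by some libraries
```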