h2oGPT lets you run powerful large language models entirely on your own hardware — no data leaves your network. Chat with documents, images, and audio; connect dozens of model backends; and expose a drop-in OpenAI-compatible API server.

Quick Start

Get h2oGPT running in minutes with pip or Docker

Docker Install

Recommended for full capabilities on Linux, Windows, and macOS

Document Q&A

Chat with PDFs, Word docs, spreadsheets, images, and more

API Reference

OpenAI-compatible REST API for chat, embeddings, audio, and images

Key capabilities

Chat UI

Gradio-based UI with streaming, multi-model bake-off, and saved chats

Vision & Images

Understand images with LLaVA and generate images with Stable Diffusion

Voice STT / TTS

Whisper speech-to-text and Microsoft SpeechT5 text-to-speech

Agents

Autonomous agents for web search, Python code, CSV analysis, and more

Model Backends

Ollama, vLLM, llama.cpp, GPT4All, HF TGI, and many more

Fine-tuning

Fine-tune models with LoRA on your own data

Get started in three steps

1

Install h2oGPT

The fastest path is pip install. For full capabilities including vision, audio, and image generation, use Docker.
pip install h2ogpt
2

Launch the server

Run generate.py pointing at any supported model. h2oGPT starts a Gradio UI and an OpenAI-compatible API server automatically.
python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  --prompt_type=mistral --max_seq_len=4096
3

Open the UI or call the API

Visit http://localhost:7860 for the chat UI, or connect any OpenAI SDK client to http://localhost:5000/v1.
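A minimal sketch of calling that endpoint with only the Python standard library. The `h2ogpt` model name and `EMPTY` API key here are placeholder assumptions; OpenAI-compatible local servers typically serve whatever model was loaded at startup and accept any key.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/v1"  # h2oGPT's OpenAI-compatible endpoint


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        # Placeholder: h2oGPT serves whatever --base_model was loaded in step 2.
        "model": "h2ogpt",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # local servers generally accept any key
        },
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


# Example, with the server from step 2 running:
#   print(chat("Summarize the key points of this document."))
```

Any official OpenAI SDK works the same way: point its base URL at `http://localhost:5000/v1` and supply a dummy API key.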
h2oGPT is 100% private — all inference runs locally. No data is sent to external servers unless you explicitly configure a remote inference backend.
