Key features
- Private and offline — all inference runs locally; no data is sent to external servers unless you configure a remote backend
- Apache 2.0 license — free for personal and commercial use
- OpenAI-compatible API — h2oGPT acts as a drop-in replacement for the OpenAI server on
localhost:5000/v1, with chat completions, embeddings, audio, image generation, and function tool calling - Gradio UI — streaming chat UI with multi-model bake-off, document upload, authentication, and state preservation
- Document Q&A — persistent vector databases (Chroma, Weaviate, FAISS) over PDFs, Word, Excel, images, video frames, YouTube, audio, code, Markdown, and more
- Vision and image support — understand images with LLaVA, Claude-3, Gemini-Pro-Vision, and GPT-4-Vision; generate images with Stable Diffusion (SDXL, SD3) and Flux
- Voice STT and TTS — Whisper speech-to-text with streaming audio; Microsoft SpeechT5 and Coqui TTS with voice cloning
- Wide model support — LLaMa 2/3, Mistral, Falcon, Vicuna, WizardLM, and others via HuggingFace Transformers, llama.cpp GGUF, AutoGPTQ, ExLLaMa, vLLM, TGI, and more
- Agents — autonomous agents for web search, document Q&A, Python code execution, and CSV analysis
- Cross-platform — Linux, macOS (CPU and Metal M1/M2), Windows 10/11, and Docker
Choose an installation method
Docker
Recommended for Linux, Windows, and macOS. Provides full capabilities including GPU inference, vision, audio, and image generation without manual dependency management.
Linux
Native install on Ubuntu x86_64 using Miniconda and pip. Supports CUDA 12.1/11.8 and CPU modes.
Windows
Install on Windows 10/11 using a single
.bat script with Miniconda, Visual Studio build tools, and optional CUDA support.macOS
Native install for Apple Silicon (M1/M2 Metal MPS) and Intel Macs using Miniconda and pip.
Architecture overview
When you runpython generate.py, h2oGPT starts two servers:
- Gradio UI at
http://localhost:7860— the interactive chat and document interface - OpenAI-compatible API server at
http://localhost:5000/v1— a REST API that any OpenAI SDK client can connect to
Some optional packages — DocTR, Unstructured, Florence-2, Stable Diffusion — download additional model weights at runtime. Progress is shown in the console.