Get h2oGPT running in minutes using pip. Chat with a GGUF model locally and call the OpenAI-compatible API.
This guide gets you from zero to a working h2oGPT instance using pip. For a full-featured setup with vision, audio, and image generation, use the Docker install instead.

Prerequisites: Python 3.10 and either a CPU or an NVIDIA GPU with CUDA 11.8 or 12.1.
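A minimal install might look like the following sketch; the PyPI package name is an assumption based on typical h2oGPT setups, so check the project's README for the exact command for your CUDA version:

```shell
# Create and activate an isolated environment (recommended)
python -m venv h2ogpt-env
source h2ogpt-env/bin/activate

# Install h2oGPT from PyPI (package name assumed; see the project README
# for GPU-specific extra index URLs matching CUDA 11.8 or 12.1)
pip install h2ogpt
```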
h2oGPT downloads the model on first run, then starts both the Gradio UI and the OpenAI API server. A larger context window can be requested at launch, at the cost of more GPU memory.
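As a sketch, launching with a GGUF model might look like this; the `generate.py` entry point and flags such as `--base_model`, `--prompt_type`, and `--max_seq_len` follow common h2oGPT usage, but verify them against the project docs:

```shell
# Basic launch with a GGUF model pulled from Hugging Face on first run
python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
    --prompt_type=mistral

# Larger context window (requires more GPU memory)
python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
    --prompt_type=mistral --max_seq_len=8192
```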
h2oGPT also starts an OpenAI-compatible server at http://localhost:5000/v1. You can use it with any OpenAI SDK client.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="EMPTY",
)
response = client.chat.completions.create(
    model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    messages=[{"role": "user", "content": "What is h2oGPT?"}],
)
print(response.choices[0].message.content)
```
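For a quick smoke test without the SDK, the same endpoint can be exercised with curl (this assumes the server is running locally on the default port):

```shell
# Send one chat completion request to the local OpenAI-compatible endpoint
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        "messages": [{"role": "user", "content": "What is h2oGPT?"}]
      }'
```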
The `api_key` value is ignored by default. To require key-based authentication, pass `--h2ogpt_api_keys` to `generate.py` with a JSON file listing valid keys.
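As an illustrative sketch, the keys file is just a JSON list; the file name and key values below are made up:

```python
import json

# Write a JSON file containing the allowed API keys (placeholder values).
keys = ["secret-key-1", "secret-key-2"]
with open("h2ogpt_api_keys.json", "w") as f:
    json.dump(keys, f)

# Launch with key enforcement (illustrative; check the h2oGPT docs):
#   python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
#       --h2ogpt_api_keys=h2ogpt_api_keys.json
```

Clients must then pass one of the listed keys as `api_key` instead of `"EMPTY"`.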