Starting the server
The OpenAI-compatible server is enabled by default when you rungenerate.py. To be explicit, pass --openai_server=True:
5000 by default. On macOS, ports 5000 and 7000 are reserved by AirPlay, so the default shifts to port 5001.
To use a custom port:
Base URL
localhost and 5000 with your server host and port. Both http and https are accepted when using a proxy or direct SSL configuration.
Authentication
By default the server runs without API key enforcement. To require an API key:api_key="EMPTY" or any non-empty string.
Set
--enforce_h2ogpt_ui_key=True to separately require authentication for the Gradio UI while keeping the API open, or vice versa.Parallel workers
To scale throughput, launch multiple isolated FastAPI worker processes:Available endpoints
| Method | Path | Description |
|---|---|---|
GET | /v1/models | List all loaded models |
GET | /v1/models/{model} | Get info for a specific model |
POST | /v1/chat/completions | Chat completions (streaming and non-streaming) |
POST | /v1/completions | Text completions |
POST | /v1/embeddings | Generate embedding vectors |
POST | /v1/audio/transcriptions | Speech-to-text (Whisper) |
POST | /v1/audio/speech | Text-to-speech |
POST | /v1/images/generations | Image generation |
POST | /v1/files | Upload files |
GET | /v1/files | List uploaded files |
GET | /health | Health check |
GET | /version | Server version |
Endpoint reference
Chat completions
/v1/chat/completions and /v1/completions — multi-turn chat, vision input, tool calling, JSON mode.Embeddings
/v1/embeddings — generate dense embedding vectors for text inputs.Audio
/v1/audio/transcriptions and /v1/audio/speech — Whisper STT and TTS.Images
/v1/images/generations — generate images with sdxl_turbo, SD3, Flux, and more.Gradio client
Use the Gradio Python client to call h2oGPT APIs directly, including streaming and document Q&A.
h2oGPT-specific parameters
In addition to standard OpenAI parameters, h2oGPT accepts extended parameters viaextra_body. These map directly to h2oGPT’s internal evaluate() parameters, for example:
openai_server/server.py for the complete list of accepted fields.