Running H2O in Docker
Official image
The H2O Docker image is based on Ubuntu 24.04 and includes:
- OpenJDK 8
- Python 3.11 with h2o, scikit-learn, pandas, numpy, and related packages
- The latest stable h2o.jar
The image exposes two ports:
- 54321: H2O Flow web UI and REST API
- 54322: internal node-to-node communication
Quick start
Once the container is running with port 54321 published, open http://localhost:54321 in your browser to access H2O Flow.
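To confirm from a script that the REST API on port 54321 is reachable, a minimal polling sketch using only the Python standard library might look like the following. It assumes the /3/Cloud status endpoint and its cloud_size and cloud_healthy fields. The helper names fetch_cloud_status and wait_for_h2o are hypothetical and not part of the h2o package.

```python
import json
import time
import urllib.request


def fetch_cloud_status(base_url="http://localhost:54321", timeout=5.0):
    """Return the parsed JSON body of GET <base_url>/3/Cloud (assumed endpoint)."""
    with urllib.request.urlopen(f"{base_url}/3/Cloud", timeout=timeout) as resp:
        return json.load(resp)


def wait_for_h2o(base_url="http://localhost:54321", attempts=30, delay=2.0,
                 fetch=fetch_cloud_status):
    """Call fetch(base_url) until it succeeds; raise TimeoutError after `attempts` failures."""
    for _ in range(attempts):
        try:
            return fetch(base_url)
        except OSError:
            # urllib.error.URLError and refused or reset connections are all
            # OSError subclasses: the container is probably still starting.
            time.sleep(delay)
    raise TimeoutError(f"H2O did not respond at {base_url}")
```

With a container running locally, `status = wait_for_h2o()` returns the parsed status document, and `status.get("cloud_size")` should then report the number of nodes if the assumed field name holds.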
Building a custom image
Use the Dockerfile from the H2O repository as a starting point. The Dockerfile:
- Installs Java 8 and Python 3.11 on Ubuntu 24.04
- Downloads the latest stable h2o.jar from H2O’s S3 repository
- Installs the H2O Python wheel
- Exposes ports 54321 and 54322
For reproducibility in production, pin a specific H2O version by setting the image tag rather than using the latest tag.
Kubernetes deployment via Helm
The h2o-helm chart deploys a multi-node H2O cluster as a Kubernetes StatefulSet. Nodes discover each other automatically through the Kubernetes DNS service.
Prerequisites
- Helm 3.x
- A Kubernetes cluster with sufficient CPU and memory resources
- kubectl configured to point at your cluster
Installing the chart
Minimal values.yaml
values.yaml
Example with ingress
example.yaml
How cluster formation works in Kubernetes
H2O uses a StatefulSet with podManagementPolicy: Parallel so all pods start simultaneously. Each node discovers its peers via the Kubernetes headless service DNS name. The following variables configure discovery:
| Variable | Description |
|---|---|
| H2O_KUBERNETES_SERVICE_DNS | Headless service DNS for peer discovery |
| H2O_NODE_LOOKUP_TIMEOUT | Seconds to wait for expected peers (default 180) |
| H2O_NODE_EXPECTED_COUNT | Total number of nodes to wait for |
| H2O_KUBERNETES_API_PORT | Port for the Kubernetes readiness probe (default 8080) |
A readiness probe on /kubernetes/isLeaderNode (port 8080) gates traffic until the leader node is elected and the cluster is fully formed.
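The discovery step can be pictured as a loop that resolves the headless service name until the expected number of peers appears or the lookup timeout expires. The sketch below is an illustration in Python under that assumption, not H2O's actual implementation. The function names, the polling interval, and the injectable resolve, clock, and sleep parameters are all hypothetical; only the variable names and the 180-second default come from the table above.

```python
import os
import socket
import time


def resolve_peers(dns_name):
    """Return the sorted, de-duplicated IP addresses behind a DNS name."""
    try:
        infos = socket.getaddrinfo(dns_name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # the name has no records yet
    return sorted({info[4][0] for info in infos})


def discover_peers(dns_name, expected_count, timeout_s, resolve=resolve_peers,
                   poll_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll DNS until expected_count peers are visible or timeout_s elapses."""
    deadline = clock() + timeout_s
    peers = resolve(dns_name)
    while len(peers) < expected_count and clock() < deadline:
        sleep(poll_s)
        peers = resolve(dns_name)
    return peers  # may be shorter than expected_count if the timeout was hit


def discover_from_env(env=os.environ, **kwargs):
    """Read the discovery variables and run the polling loop (illustrative only)."""
    return discover_peers(
        dns_name=env["H2O_KUBERNETES_SERVICE_DNS"],
        expected_count=int(env["H2O_NODE_EXPECTED_COUNT"]),
        timeout_s=float(env.get("H2O_NODE_LOOKUP_TIMEOUT", "180")),
        **kwargs,
    )
```

A headless service returns one address record per pod rather than a single virtual IP, which is why counting resolved addresses works as a peer count in this sketch.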
Scaling nodes
To change the cluster size, update the Helm release.
resources.cpu and resources.memory are set as both requests and limits, so each H2O pod is guaranteed exactly the resources specified.
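Setting requests equal to limits for both CPU and memory is the condition Kubernetes uses to place a container in the Guaranteed QoS class. The sketch below shows that mapping in Python as a rough illustration. The function names are hypothetical, and the chart's real template may render the block differently.

```python
def container_resources(cpu, memory):
    """Build a container resources block with identical requests and limits."""
    spec = {"cpu": str(cpu), "memory": str(memory)}
    return {"requests": dict(spec), "limits": dict(spec)}


def is_guaranteed(resources):
    """True when cpu and memory are both set and requests equal limits."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return all(
        key in requests and requests[key] == limits.get(key)
        for key in ("cpu", "memory")
    )
```

For example, `is_guaranteed(container_resources(2, "8Gi"))` evaluates to True, while a block whose memory limit exceeds its request does not meet the criterion.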
Accessing the H2O UI and API
By default the chart creates a ClusterIP service on port 80. To expose H2O externally, enable the optional ingress or, in cloud environments, the optional load balancer service.
Helm chart reference
The chart source lives at h2o-helm/ in the H2O-3 repository.
| File | Purpose |
|---|---|
| Chart.yaml | Chart metadata and version |
| values.yaml | Default configuration values |
| templates/statefulset.yaml | H2O node StatefulSet definition |
| templates/service.yaml | Headless service for peer discovery |
| templates/ingress.yaml | Optional ingress resource |
| templates/loadbalancer.yaml | Optional load balancer service |