Security model summary
H2O’s security model rests on a few key assumptions:

- H2O runs inside a secure network perimeter. It is not designed to withstand internet-facing denial-of-service attacks.
- HTTP traffic between clients and H2O should be encrypted via HTTPS (both interactive and programmatic sessions).
- Once a user is authenticated, they have full access — H2O supports authentication but not fine-grained authorization or ACLs.
- Each user starts their own H2O cluster. H2O clusters are not intended to be shared among multiple users.
- All data is in-memory; restarting a cluster wipes all data from memory with nothing left on disk to clean.
Network ports
| Port | Protocol | Purpose |
|---|---|---|
| 54321 | TCP | H2O Flow web UI and REST API (client-facing) |
| 54322 | TCP + UDP | Internal node-to-node communication |
HTTPS / TLS
HTTPS encrypts traffic between clients (browser, R, Python) and H2O’s embedded web port. It uses a Java Keystore (JKS) backed by Jetty 9 and the Java runtime.

Creating a self-signed keystore
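One way to produce a self-signed keystore is the JDK’s keytool utility; the keystore name, alias, password, validity period, and distinguished name below are placeholders to adapt:

```shell
# Generate a key pair and self-signed certificate in a new JKS keystore.
# keytool ships with every JDK; all values below are illustrative.
keytool -genkeypair \
  -keystore h2o.jks \
  -alias h2o \
  -keyalg RSA \
  -keysize 2048 \
  -validity 365 \
  -storepass h2oh2o \
  -dname "CN=h2o.example.com, OU=Engineering, O=Example, C=US"
```

For production use, replace the self-signed certificate with one issued by your organization’s certificate authority.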
Enabling HTTPS
| Flag | Description |
|---|---|
| -jks <filename> | Path to Java keystore file |
| -jks_pass <password> | Keystore password (default: h2oh2o) |
| -jks_alias <alias> | Optional: which certificate from the keystore to use |
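Putting the flags together, a standalone launch with HTTPS enabled might look like the following sketch (keystore filename, password, and alias are placeholders):

```shell
# Start standalone H2O with TLS on the web/REST port.
java -jar h2o.jar -jks h2o.jks -jks_pass h2oh2o -jks_alias h2o
```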
Connecting clients over HTTPS
HTTPS and authentication can be enabled independently or together. You can add authentication without HTTPS (though this is not recommended) or use HTTPS without authentication.
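With HTTPS enabled, clients must connect to the secure endpoint explicitly. A sketch for the h2o Python package (it requires a running cluster; insecure=True skips client-side certificate verification, which is consistent with the note below that client-side cert checking is not yet implemented):

```python
import h2o  # requires the h2o package and a reachable cluster

# Connect over TLS; insecure=True disables certificate verification,
# which is necessary for self-signed certificates.
h2o.init(ip="127.0.0.1", port=54321, https=True, insecure=True)
```

The R client takes the equivalent arguments: h2o.init(ip = "127.0.0.1", port = 54321, https = TRUE, insecure = TRUE).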
Authentication options
H2O supports four authentication mechanisms, all surfaced to the user as HTTP Basic Auth for client connections.

Basic auth (username/password hash file)
The simplest option. A flat file (users.properties) maps usernames to hashed passwords, using Jetty’s HashLoginService format:
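A hedged sketch of such a file (usernames and credentials are placeholders; the format follows Jetty’s HashLoginService, which accepts plaintext as well as MD5:, OBF:, and CRYPT: prefixed credentials):

```
# Format: username: credential
alice: MD5:0d107d09f5bbe40cade3de5c71e9e9b7
bob: plaintextIsAlsoAccepted
```

H2O is then started with flags along the lines of -hash_login -login_conf users.properties (verify the exact flag names against your H2O version).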
LDAP authentication
Connect H2O’s authentication to an existing LDAP directory server, configured through a JAAS file (ldap.conf) for Jetty’s LDAP LoginModule:
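A sketch of the JAAS configuration, assuming Jetty 9’s LdapLoginModule; the hostname, port, bind DN, password, and base DN are placeholders for your directory:

```
ldaploginmodule {
    org.eclipse.jetty.jaas.spi.LdapLoginModule required
    debug="true"
    useLdaps="false"
    contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
    hostname="ldap.example.com"
    port="389"
    bindDn="cn=admin,dc=example,dc=com"
    bindPassword="adminpassword"
    authenticationMethod="simple"
    forceBindingLogin="true"
    userBaseDn="ou=people,dc=example,dc=com";
};
```

H2O is then started with flags along the lines of -ldap_login -login_conf ldap.conf.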
Kerberos authentication
Kerberos integrates with H2O on Hadoop via kinit. Because H2O runs as a standard Hadoop MapReduce job, it inherits the submitting user’s Kerberos credentials with no H2O-specific code changes. For standalone H2O with Kerberos-based HTTP Basic Auth, supply a JAAS configuration file (kerb.conf):
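A minimal sketch, assuming the JDK’s standard Krb5LoginModule is used as the JAAS backend (the module options shown are illustrative; check your H2O version’s documentation for the recommended settings):

```
krb5loginmodule {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache="true"
    debug="false";
};
```

H2O is then started with flags along the lines of -kerberos_login -login_conf kerb.conf.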
PAM authentication
PAM (Pluggable Authentication Modules) delegates to the operating system’s authentication stack, enabling integration with system accounts, SSSD, or any PAM-backed directory.

Connecting with credentials
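Whatever the backend, clients present credentials as HTTP Basic Auth. A minimal sketch of what every client does under the hood, using only the Python standard library (the endpoint URL, username, and password are placeholders):

```python
import base64
import urllib.request

# Placeholder endpoint and credentials.
url = "https://h2o.example.com:54321/3/About"
username, password = "h2ouser", "secret"

# HTTP Basic Auth: "Authorization: Basic base64(user:pass)".
token = base64.b64encode(f"{username}:{password}".encode()).decode()
request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

# urllib.request.urlopen(request) would issue the call against a live cluster.
```

The h2o R and Python packages accept a username and password at connection time and attach this header for you.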
Once authentication is enabled, all clients must supply a username and password.

Enforcing security settings in Hadoop deployments
System administrators can prevent users from starting H2O without the required security flags by creating an implicit arguments file at /etc/h2o/h2odriver.args. Each argument must be on its own line:
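A hypothetical /etc/h2o/h2odriver.args enforcing a keystore and hash-file login (paths and flags are placeholders; note that each token, including a flag’s value, sits on its own line):

```
-jks
/etc/h2o/h2o.jks
-hash_login
-login_conf
/etc/h2o/users.properties
```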
Data security on Hadoop
H2O on Hadoop inherits the HDFS file permissions of the user who launched it:

- Data lives in HDFS with standard file permissions.
- The user runs hadoop jar h2odriver.jar; the job runs under that user’s identity.
- H2O can only access HDFS files the user is permitted to read.
- Only the user who started the cluster is authenticated to access it.
- Kerberos (kinit) works seamlessly: no H2O-specific configuration is needed beyond passing the -principal and -keytab flags.
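A hedged sketch of a Kerberized launch (the principal, cluster sizing flags, and output directory are placeholders; flag names follow H2O’s Hadoop driver):

```shell
# Obtain Kerberos credentials, then launch H2O as a Hadoop job
# under the authenticated user's identity.
kinit alice@EXAMPLE.COM
hadoop jar h2odriver.jar -nodes 3 -mapperXmx 6g -output /user/alice/h2o_out
```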
Access control summary
What H2O does secure
- HDFS file access via OS and HDFS permissions
- Embedded web port (54321) via HTTPS and/or HTTP Basic Auth
- Internal node-to-node traffic via optional TLS (configured through JKS)
- Kerberos identity propagation on Hadoop
What H2O does not secure
- Fine-grained authorization or ACLs — authentication grants full cluster access
- Protection against denial-of-service attacks
- Certificate validation on the R/Python client side (client-side cert checking is not yet implemented)