
Overview

The certificate manager combines an nginx reverse proxy with a TLS certificate automation service. It handles TLS termination, certificate provisioning via Let’s Encrypt, automatic renewal, and extraction of TLS Exported Keying Material (EKM) for channel binding.

Architecture

Technology Stack

  • Reverse Proxy: Nginx with custom EKM module
  • Certificate Automation: Python 3.10+ with Certbot 5.0
  • Process Management: Supervisord for nginx + cert manager
  • TEE Integration: dstack_sdk 0.5.3 for key derivation
  • Dependencies: cryptography, schedule, certbot

Service Components

The cert-manager container runs two processes:
  1. Nginx: Reverse proxy with TLS termination and EKM extraction
  2. Cert Manager: Python service for certificate lifecycle management
┌───────────────────────────────────────────────────────┐
│         nginx-cert-manager Container                  │
│                                                       │
│  ┌─────────────────┐    ┌──────────────────────┐    │
│  │   Supervisord   │    │                      │    │
│  └────────┬────────┘    │                      │    │
│           │             │                      │    │
│     ┌─────┴──────┐      │   Shared Volumes:   │    │
│     │            │      │                      │    │
│  ┌──▼─────┐  ┌──▼─────┐│  - /etc/nginx/ssl/  │    │
│  │ Nginx  │  │  Cert  ││  - /acme-challenge/ │    │
│  │        │  │Manager ││                      │    │
│  └────────┘  └────────┘│                      │    │
│                         └──────────────────────┘    │
└───────────────────────────────────────────────────────┘

Service Configuration

nginx-cert-manager:
  image: ghcr.io/concrete-security/cert-manager
  container_name: nginx-cert-manager
  ports:
    - "80:80"    # HTTP (ACME challenges + redirect)
    - "443:443"  # HTTPS (TLS termination)
  environment:
    - DOMAIN=vllm.concrete-security.com
    - DEV_MODE=false
    - LETSENCRYPT_STAGING=false
    - LETSENCRYPT_ACCOUNT_VERSION=v1
    - FORCE_RM_CERT_FILES=false
    - LOG_LEVEL=INFO
  volumes:
    - tls-certs-keys:/etc/nginx/ssl/
    - /var/run/dstack.sock:/var/run/dstack.sock
  networks:
    - vllm
    - attestation
    - auth

Nginx Reverse Proxy

TLS Configuration

Nginx accepts TLS 1.3 only, which the RFC 9266 channel binding (built on EKM) requires:
server {
    listen 443 ssl http2;
    server_name vllm.concrete-security.com;

    # TLS 1.3 required for EKM channel binding (RFC 9266)
    ssl_protocols TLSv1.3;

    # Enable connection reuse for attestation + inference
    keepalive_timeout 60;
    keepalive_requests 100;

    # Certificate files managed by cert-manager
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
    
    # ... location blocks ...
}
From nginx_conf/https.conf:1-18

EKM Extraction Module

The nginx build includes a custom module (ngx_http_ekm_module) that:
  1. Extracts TLS EKM from the connection
  2. Derives HMAC key from dstack
  3. Signs the EKM with HMAC-SHA256, using the derived key as the HMAC key and the EKM as the message
  4. Forwards as X-TLS-EKM-Channel-Binding header
The signed header is forwarded to the attestation service:
location = /tdx_quote {
    proxy_pass http://attestation-service:8080;
    # Forward TLS EKM for session binding (RFC 9266)
    proxy_set_header X-TLS-EKM-Channel-Binding $ekm_channel_binding;
}
From nginx_conf/https.conf:100-107
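The signing and verification steps can be sketched in Python as follows. This is a minimal illustration, not the module's implementation: the `<ekm_hex>:<mac_hex>` header encoding and the function names `sign_ekm` and `verify_ekm_header` are assumptions, since the source does not show the wire format of `X-TLS-EKM-Channel-Binding`.

```python
import hashlib
import hmac
from typing import Optional


def sign_ekm(ekm: bytes, key: bytes) -> str:
    """Return '<ekm_hex>:<mac_hex>'. ASSUMPTION: hypothetical header encoding."""
    mac = hmac.new(key, ekm, hashlib.sha256).hexdigest()
    return f"{ekm.hex()}:{mac}"


def verify_ekm_header(header: str, key: bytes) -> Optional[bytes]:
    """Return the EKM bytes if the MAC verifies under `key`, else None."""
    try:
        ekm_hex, mac_hex = header.split(":", 1)
        ekm = bytes.fromhex(ekm_hex)
    except ValueError:
        return None  # malformed header
    expected = hmac.new(key, ekm, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(expected.encode(), mac_hex.encode()):
        return ekm
    return None
```

A backend that shares the derived key can call `verify_ekm_header` to confirm that the header was produced by the proxy and not supplied by the client.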

Service Routing

Nginx routes traffic to internal services:
| Path | Backend | Description |
|------|---------|-------------|
| /health | Static | Health check endpoint |
| /_auth | auth-service:8081 | Internal auth subrequest |
| /metrics | vllm:8000 | Metrics (requires auth) |
| /tdx_quote | attestation-service:8080 | TDX attestation |
| / | vllm:8000 | AI inference API |
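The routing table above can be modelled as a small lookup. This is a sketch under an assumption: it treats every named path as an exact match and sends everything else to the inference backend. The source only confirms exact matching for `/tdx_quote`; `resolve_backend` is a hypothetical helper.

```python
# Routes copied from the table above.
# ASSUMPTION: all named paths are exact matches; "/" catches everything else.
EXACT_ROUTES = {
    "/health": "static",
    "/_auth": "auth-service:8081",
    "/metrics": "vllm:8000",
    "/tdx_quote": "attestation-service:8080",
}
DEFAULT_BACKEND = "vllm:8000"


def resolve_backend(path: str) -> str:
    """Return the backend that would serve `path` under the assumed rules."""
    return EXACT_ROUTES.get(path, DEFAULT_BACKEND)
```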

CORS Configuration

Allows requests from Concrete Security and Vercel domains:
set $cors_origin "";
if ($http_origin ~ '^https?://([^.]+\.)*concrete-security\.com(:[0-9]+)?$') {
    set $cors_origin $http_origin;
}
if ($http_origin ~ '^https?://([^.]+\.)*vercel\.app(:[0-9]+)?$') {
    set $cors_origin $http_origin;
}

add_header 'Access-Control-Allow-Origin' $cors_origin always;
From nginx_conf/https.conf:23-29
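To see which origins these patterns accept, the same regular expressions can be exercised in Python. This is only a test harness for the patterns quoted above; `cors_origin` is a hypothetical helper that mirrors the `$cors_origin` variable.

```python
import re

# The two patterns from the nginx configuration above, unchanged.
ALLOWED_ORIGIN_PATTERNS = [
    re.compile(r"^https?://([^.]+\.)*concrete-security\.com(:[0-9]+)?$"),
    re.compile(r"^https?://([^.]+\.)*vercel\.app(:[0-9]+)?$"),
]


def cors_origin(origin: str) -> str:
    """Return the origin to echo back, or '' when it is not allowed."""
    if any(p.search(origin) for p in ALLOWED_ORIGIN_PATTERNS):
        return origin
    return ""


for candidate in (
    "https://app.concrete-security.com",
    "https://my-preview.vercel.app",
    "https://concrete-security.com.evil.example",
    "http://localhost:3000",
):
    print(candidate, "->", repr(cors_origin(candidate)))
```

Note that the second pattern accepts any `vercel.app` subdomain, not only deployments owned by a particular team.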

HTTP to HTTPS Redirect

All HTTP traffic redirects to HTTPS except ACME challenges:
server {
    listen 80;
    
    # ACME HTTP-01 challenge for Let's Encrypt
    location /.well-known/acme-challenge/ {
        root /acme-challenge/;
    }
    
    # Redirect all other HTTP to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}
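The decision this server block makes can be sketched in Python. This models the two `location` blocks only, not nginx itself; `handle_http_request` is a hypothetical name, and a "serve" result succeeds only if the challenge file actually exists.

```python
from typing import Tuple

ACME_PREFIX = "/.well-known/acme-challenge/"
ACME_ROOT = "/acme-challenge"  # the `root` directive above


def handle_http_request(host: str, request_uri: str) -> Tuple[str, str]:
    """Return ("serve", file_path) for ACME challenges, else ("redirect", url)."""
    path = request_uri.split("?", 1)[0]  # locations match on the path only
    if path.startswith(ACME_PREFIX):
        # With `root`, nginx appends the full URI path to the root directory.
        return "serve", f"{ACME_ROOT}{path}"
    return "redirect", f"https://{host}{request_uri}"
```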

Certificate Management

Lifecycle Overview

The certificate manager handles the full certificate lifecycle:
┌──────────────┐
│   Startup    │
└──────┬───────┘
       │
       ▼
┌──────────────────┐      ┌─────────────────┐
│ Check existing   │──No──>│ Generate new    │
│ certificate      │      │ certificate     │
└──────┬───────────┘      └────────┬────────┘
       │ Yes                       │
       ▼                           ▼
┌──────────────────┐      ┌─────────────────┐
│ Valid & not      │──No──>│ Renew           │
│ expiring soon?   │      │ certificate     │
└──────┬───────────┘      └────────┬────────┘
       │ Yes                       │
       ▼                           ▼
┌──────────────────┐      ┌─────────────────┐
│ Emit event to    │      │ Save cert + key │
│ RTMR3            │<─────│                 │
└──────┬───────────┘      └─────────────────┘
       │
       ▼
┌──────────────────┐
│ Configure nginx  │
│ with HTTPS       │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Schedule daily   │
│ renewal check    │
└──────────────────┘

Certificate Validation

The manager checks certificate validity before renewal:
def is_cert_valid(self) -> bool:
    """Check if current certificate is valid."""
    cert_file = self.cert_path / self.CERT_FILENAME
    key_file = self.cert_path / self.KEY_FILENAME
    
    if not cert_file.exists() or not key_file.exists():
        logger.info("Certificate or key files not found")
        return False
    
    with open(cert_file, "rb") as f:
        certs = x509.load_pem_x509_certificates(f.read())
    
    if not certs:
        return False
    
    # Check the first certificate (leaf) for expiry
    leaf_cert = certs[0]
    expiry_threshold = datetime.now(timezone.utc) + timedelta(
        days=self.CERT_EXPIRY_THRESHOLD_DAYS  # 30 days
    )
    if leaf_cert.not_valid_after_utc < expiry_threshold:
        logger.info(
            f"Certificate expires on {leaf_cert.not_valid_after_utc}, "
            f"renewal needed"
        )
        return False
    
    logger.info(f"Certificate valid until {leaf_cert.not_valid_after_utc}")
    return True
From cmgr.py:278-310
Certificates are renewed once they are within 30 days of expiry.
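The threshold comparison can be checked in isolation with plain datetimes. This is a minimal sketch of the same arithmetic; `needs_renewal` is a hypothetical helper, and the 30-day value is the one quoted in the excerpt.

```python
from datetime import datetime, timedelta, timezone

CERT_EXPIRY_THRESHOLD_DAYS = 30  # value quoted in the excerpt above


def needs_renewal(not_valid_after: datetime, now: datetime) -> bool:
    """True when the certificate expires before now + threshold."""
    return not_valid_after < now + timedelta(days=CERT_EXPIRY_THRESHOLD_DAYS)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(needs_renewal(datetime(2025, 1, 20, tzinfo=timezone.utc), now))  # True: 19 days left
print(needs_renewal(datetime(2025, 3, 1, tzinfo=timezone.utc), now))   # False: 59 days left
```

For a certificate with a 90-day lifetime, this means renewal starts roughly 60 days after issuance.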

Deterministic Key Generation

Private keys are derived deterministically from the TEE:
def generate_deterministic_key(self, key_path: str) -> ec.EllipticCurvePrivateKey:
    """Generate deterministic EC key using dstack SDK."""
    # Get deterministic key material
    key_material = self.get_deterministic_key_material(key_path)
    
    # Derive EC key from the material
    return self.derive_ec_privatekey_from_key_material(key_material)

def get_deterministic_key_material(self, key_path: str) -> bytes:
    """Get deterministic key material using Phala dstack SDK.
    
    Same compose hash + path will always yield the same key.
    """
    if self.dev_mode:
        logger.warning(
            "Dev mode active: using fixed key material. "
            "Don't do this for production!"
        )
        return b"\x01" * 32
    
    # Initialize dstack client
    dstack_client = DstackClient()
    # Use dstack SDK to get deterministic 32-byte key material
    result = dstack_client.get_key(f"{key_path}")
    key_material = result.decode_key()  # 32 bytes from dstack
    logger.info(
        f"Retrieved deterministic key material from dstack "
        f"for path: {key_path}"
    )
    return key_material
From cmgr.py:68-99
Key Derivation Paths:
  • Production: cert/letsencrypt/{domain}/v1
  • Development: cert/debug/{domain}/v1
  • Account Key: letsencrypt-account/{domain}/{account_version}
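The path templates above can be expressed as two small helpers. The function names are hypothetical; only the path formats come from the list.

```python
def cert_key_path(domain: str, dev_mode: bool) -> str:
    """Derivation path for the TLS private key (formats from the list above)."""
    prefix = "cert/debug" if dev_mode else "cert/letsencrypt"
    return f"{prefix}/{domain}/v1"


def account_key_path(domain: str, account_version: str) -> str:
    """Derivation path for the Let's Encrypt account key."""
    return f"letsencrypt-account/{domain}/{account_version}"


print(cert_key_path("vllm.concrete-security.com", dev_mode=False))
print(account_key_path("vllm.concrete-security.com", "v2"))
```

Because the account key path includes the account version, changing `LETSENCRYPT_ACCOUNT_VERSION` yields a different derivation path and therefore a different account key.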

Let’s Encrypt Integration

The manager uses Certbot for Let’s Encrypt certificates:
def create_lets_encrypt_cert(
    self, private_key: ec.EllipticCurvePrivateKey
) -> List[x509.Certificate]:
    """Create Let's Encrypt certificate using certbot."""
    logger.info("Creating Let's Encrypt certificate using certbot")
    
    # Generate account key (deterministic)
    account_key = self.generate_deterministic_key(
        f"letsencrypt-account/{self.domain}/{self.letsencrypt_account_version}"
    )
    
    # Create Certificate Signing Request with our deterministic private key
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(
            x509.Name([
                x509.NameAttribute(NameOID.COMMON_NAME, self.domain),
            ])
        )
        .add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName(self.domain),
            ]),
            critical=False,
        )
        .sign(private_key, hashes.SHA256())
    )
    
    # Serialize CSR and account key to PEM format
    csr_pem = csr.public_bytes(Encoding.PEM)
    account_key_pem = account_key.private_bytes(
        encoding=Encoding.PEM,
        format=PrivateFormat.PKCS8,
        encryption_algorithm=NoEncryption(),
    )
    
    # Initialize certbot wrapper
    certbot = CertbotWrapper(staging=self.letsencrypt_staging)
    
    # Use certbot to obtain certificate with our CSR
    fullchain_pem = certbot.obtain_certificate_with_csr(
        email=self.cert_email,
        webroot_path=str(self.acme_path),
        csr_pem=csr_pem,
        account_key_pem=account_key_pem,
    )
    
    # Load certificate chain from PEM
    certs = x509.load_pem_x509_certificates(fullchain_pem)
    
    logger.info(
        f"Successfully obtained Let's Encrypt certificate chain "
        f"using certbot ({len(certs)} certificates)"
    )
    return certs
From cmgr.py:128-193
ACME HTTP-01 Challenge:
  1. Certbot requests challenge from Let’s Encrypt
  2. Challenge file written to /acme-challenge/.well-known/acme-challenge/
  3. Nginx serves challenge file on port 80
  4. Let’s Encrypt validates domain ownership
  5. Certificate issued and returned as PEM
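Steps 2 and 3 come down to a file layout convention, sketched below. Certbot's webroot handling does this itself; the helpers `write_challenge` and `challenge_url` are hypothetical and exist only to show where the file lands and which URL Let's Encrypt fetches.

```python
import tempfile
from pathlib import Path

CHALLENGE_SUBDIR = Path(".well-known") / "acme-challenge"


def write_challenge(webroot: Path, token: str, key_authorization: str) -> Path:
    """Write the challenge response under <webroot>/.well-known/acme-challenge/."""
    challenge_dir = webroot / CHALLENGE_SUBDIR
    challenge_dir.mkdir(parents=True, exist_ok=True)
    path = challenge_dir / token
    path.write_text(key_authorization)
    return path


def challenge_url(domain: str, token: str) -> str:
    """URL the CA requests over plain HTTP on port 80."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


with tempfile.TemporaryDirectory() as tmp:
    # Token and key authorization are made-up placeholder values.
    written = write_challenge(Path(tmp), "example-token", "example-token.thumbprint")
    print(written.relative_to(tmp))
    print(challenge_url("vllm.concrete-security.com", "example-token"))
```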

Self-Signed Certificates (Development)

In development mode, the manager generates self-signed certificates:
def create_self_signed_cert(
    self, private_key: ec.EllipticCurvePrivateKey
) -> List[x509.Certificate]:
    """Create self-signed certificate for development."""
    logger.info("Creating self-signed certificate for development")
    
    subject = issuer = x509.Name([
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Concrete Security"),
        x509.NameAttribute(NameOID.COMMON_NAME, self.domain),
    ])
    
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(private_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.now(timezone.utc))
        .not_valid_after(datetime.now(timezone.utc) + timedelta(days=365))
        .add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName(self.domain),
                x509.DNSName(f"*.{self.domain}"),
                x509.DNSName("localhost"),
            ]),
            critical=False,
        )
        .sign(private_key, hashes.SHA256())
    )
    return [cert]
From cmgr.py:195-232

Certificate Event Emission

New certificates are logged to the TEE’s RTMR3 register:
def emit_new_cert_event(self):
    """Emit new cert event in RTMR3."""
    cert_file = self.cert_path / self.CERT_FILENAME
    with open(cert_file, "rb") as f:
        certs = x509.load_pem_x509_certificates(f.read())
    
    leaf_cert = certs[0]
    cert_der = leaf_cert.public_bytes(Encoding.DER)
    cert_hash = sha256(cert_der).hexdigest()
    
    if self.dev_mode:
        logger.info(f"New TLS Certificate: {cert_hash}")
    else:
        dstack_client = DstackClient()
        dstack_client.emit_event("New TLS Certificate", cert_hash)
        logger.info("Emitted new TLS certificate event to Dstack")
From cmgr.py:427-450
This extends RTMR3 with the certificate hash, making it part of the attestable state.
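The effect of an extend operation can be illustrated with hashlib. TDX runtime measurement registers are 48 bytes wide and are extended as SHA-384(current || digest). How dstack encodes an event into that digest is not shown in the source, so `event_digest` below is an assumption made purely for illustration.

```python
import hashlib

RTMR_SIZE = 48  # SHA-384 digest length


def cert_fingerprint(cert_der: bytes) -> str:
    """SHA-256 of the DER certificate, as in the excerpt above."""
    return hashlib.sha256(cert_der).hexdigest()


def event_digest(name: str, payload: str) -> bytes:
    # ASSUMPTION: illustrative encoding only; dstack defines its own format.
    return hashlib.sha384(f"{name}:{payload}".encode()).digest()


def extend_rtmr(current: bytes, digest: bytes) -> bytes:
    """Extend a register: new = SHA-384(current || digest)."""
    if len(current) != RTMR_SIZE or len(digest) != RTMR_SIZE:
        raise ValueError("RTMR values and digests must be 48 bytes")
    return hashlib.sha384(current + digest).digest()


rtmr3 = bytes(RTMR_SIZE)  # start from an all-zero register for the example
fingerprint = cert_fingerprint(b"placeholder DER bytes")
rtmr3 = extend_rtmr(rtmr3, event_digest("New TLS Certificate", fingerprint))
print(rtmr3.hex())
```

Since each value depends on the previous one, a verifier who replays the event log must see the same certificate hash in the same position to reproduce the final register value.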

Renewal Scheduling

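The renewal job is scheduled for midnight each day. Because the main loop only wakes every 6 hours, the job runs at the first wake-up after midnight, so it can start up to 6 hours late: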
def run(self):
    """Main run loop."""
    # Initial setup
    self.startup_init()
    
    # Schedule periodic cert management (everyday at midnight)
    schedule.every().day.at("00:00").do(self.manage_cert_creation_and_renewal)
    
    # Main loop
    logger.info("Certificate manager running, checking for renewal every day")
    while True:
        schedule.run_pending()
        time.sleep(3600 * 6)  # Check every 6 hours
From cmgr.py:565-586
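The interaction between the midnight schedule and the 6-hour sleep can be worked through with a small simulation. It is a simplified model: it ignores job duration and clock drift, and `first_job_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def first_job_run(start: datetime, sleep_hours: int = 6) -> datetime:
    """Return the wake-up at which a job scheduled for 00:00 first runs."""
    # The job becomes due at the next midnight after the process starts.
    due = (start + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    wake = start  # the loop checks once at start, then after every sleep
    while wake < due:
        wake += timedelta(hours=sleep_hours)
    return wake


print(first_job_run(datetime(2025, 1, 1, 5, 0)))  # 2025-01-02 05:00:00
```

A process started at 05:00 wakes at 11:00, 17:00, 23:00 and 05:00, so the "midnight" job runs at 05:00 the next day. With a 30-day renewal threshold, a delay of this size should not matter in practice.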

Nginx Configuration Management

The cert manager dynamically configures nginx:

HTTP-Only Mode (Initial)

Before certificates are available, nginx runs in HTTP-only mode:
server {
    listen 80;
    
    location /.well-known/acme-challenge/ {
        root /acme-challenge/;
    }
    
    location / {
        return 503 "Service starting, certificate being provisioned";
    }
}

HTTPS Mode (After Cert Provisioning)

Once certificates are ready, nginx is reconfigured with HTTPS:
def setup_nginx_https_config(self):
    """Setup nginx HTTPS configuration and reload."""
    # Copy HTTPS configuration
    if self.dev_mode:
        shutil.copy(
            "/etc/nginx/conf.d/https-dev.conf",
            "/etc/nginx/conf.d/https.conf"
        )
    else:
        shutil.copy(
            "/etc/nginx/conf.d/https-prod.conf",
            "/etc/nginx/conf.d/https.conf"
        )
    
    # Reload nginx to pick up new config
    subprocess.run(["nginx", "-s", "reload"], check=True)
    logger.info("Nginx reloaded with HTTPS configuration")
From supervisor.py

Development vs Production

Development Mode

Enabled with DEV_MODE=true:
environment:
  - DEV_MODE=true
Behavior:
  • Self-signed certificates
  • Fixed deterministic keys (not from dstack)
  • Certificate hash logged (not emitted to RTMR3)
  • Extended logging
  • Wildcard DNS support (*.localhost)

Production Mode

Default with DEV_MODE=false:
environment:
  - DEV_MODE=false
  - DOMAIN=vllm.concrete-security.com
  - LETSENCRYPT_STAGING=false
Behavior:
  • Let’s Encrypt certificates
  • Keys derived from dstack
  • Certificate hash emitted to RTMR3
  • Production logging
  • Domain-specific certificates

Staging Mode

For testing the Let’s Encrypt flow without consuming production rate limits:
environment:
  - DEV_MODE=false
  - LETSENCRYPT_STAGING=true
Let’s Encrypt staging environment issues certificates from a test CA. These certificates won’t be trusted by browsers but are useful for testing the ACME flow.
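The staging and production environments are separate ACME servers with separate directory URLs. The sketch below shows the selection; the two URLs are the public Let's Encrypt endpoints, while `acme_directory_url` is a hypothetical helper (how `CertbotWrapper` passes the choice to Certbot is not shown in the source).

```python
LETSENCRYPT_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"
LETSENCRYPT_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def acme_directory_url(staging: bool) -> str:
    """Pick the ACME directory matching LETSENCRYPT_STAGING."""
    return LETSENCRYPT_STAGING if staging else LETSENCRYPT_PRODUCTION


print(acme_directory_url(staging=True))
```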

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| DOMAIN | localhost | Domain name for certificates |
| DEV_MODE | false | Enable development mode (self-signed certs) |
| LETSENCRYPT_STAGING | false | Use Let’s Encrypt staging environment |
| LETSENCRYPT_ACCOUNT_VERSION | v1 | Account identifier for rate limit management |
| EMAIL | [email protected] | Email for Let’s Encrypt account |
| FORCE_RM_CERT_FILES | false | Force certificate regeneration on startup |
| LOG_LEVEL | INFO | Logging verbosity |
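One way to load these variables with their defaults is sketched below. It is not the service's own loader: the `Settings` class, `load_settings`, and the set of accepted truthy strings are assumptions, and the EMAIL default is left unset because it is not legible in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}  # ASSUMPTION: accepted truthy spellings


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Parse a boolean variable; every flag in the table defaults to false."""
    return env.get(name, "false").strip().lower() in TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    domain: str
    dev_mode: bool
    letsencrypt_staging: bool
    letsencrypt_account_version: str
    email: Optional[str]
    force_rm_cert_files: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    return Settings(
        domain=env.get("DOMAIN", "localhost"),
        dev_mode=_flag(env, "DEV_MODE"),
        letsencrypt_staging=_flag(env, "LETSENCRYPT_STAGING"),
        letsencrypt_account_version=env.get("LETSENCRYPT_ACCOUNT_VERSION", "v1"),
        email=env.get("EMAIL"),  # default not reproduced here
        force_rm_cert_files=_flag(env, "FORCE_RM_CERT_FILES"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to test, for example `load_settings({"DEV_MODE": "true"})`.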

Supervisord Configuration

Supervisord manages both nginx and the cert manager:
[supervisord]
nodaemon=true

[program:cert-manager]
command=python -m cert_manager.main
autostart=true
autorestart=true
stdout_logfile=/dev/stdout
stderr_logfile=/dev/stderr

[program:nginx]
command=nginx -g 'daemon off;'
autostart=true
autorestart=true
stdout_logfile=/dev/stdout
stderr_logfile=/dev/stderr

Docker Image

Dockerfile Structure

FROM nginx:alpine

# Install Python and dependencies
RUN apk add --no-cache python3 py3-pip supervisor

# Copy nginx with custom EKM module
COPY nginx-with-ekm /usr/sbin/nginx

# Copy nginx configurations
COPY nginx_conf/ /etc/nginx/

# Install cert-manager Python package
COPY pyproject.toml .
COPY src/ ./src/
RUN pip install .

# Copy supervisord config
COPY supervisord.conf /etc/supervisord.conf

EXPOSE 80 443

CMD ["supervisord", "-c", "/etc/supervisord.conf"]

Published Images

Images are published to GitHub Container Registry:
image: ghcr.io/concrete-security/cert-manager@sha256:c9df1c64...

Testing

Local Development

cd cert-manager

# Install dependencies
uv sync

# Run cert manager only (no nginx)
export DEV_MODE=true
export DOMAIN=localhost
uv run python -m cert_manager.main

Integration Tests

# From CVM root
make dev-up
make test-certificate  # Test SSL certificate validation
make test-acme         # Test ACME challenge endpoint
make test-redirect     # Test HTTP to HTTPS redirect

Troubleshooting

Certificate Not Generated

Symptoms: Service starts but no HTTPS endpoint
Debugging:
# Check cert manager logs
docker logs nginx-cert-manager 2>&1 | grep cert-manager

# Verify ACME challenge is accessible
curl http://your-domain.com/.well-known/acme-challenge/test

# Check dstack socket
docker exec nginx-cert-manager ls -la /var/run/dstack.sock

Rate Limits

Let’s Encrypt has rate limits:
  • 50 certificates per registered domain per week
  • 5 failed validations per account, per hostname, per hour
Solution: Use LETSENCRYPT_STAGING=true for testing or increment LETSENCRYPT_ACCOUNT_VERSION.

Certificate Renewal Failures

Check:
  1. ACME challenge endpoint accessible on port 80
  2. Domain DNS points to correct IP
  3. Firewall allows inbound port 80
  4. No rate limit errors in logs
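Checks 1 to 3 can be partly automated with the standard library. The sketch below resolves the domain and attempts a TCP connection; the helper names are hypothetical, and the port check is only meaningful when run from outside the host's own network.

```python
import socket
from typing import Set


def resolved_addresses(host: str) -> Set[str]:
    """Return the IP addresses the name resolves to (empty set on failure)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS errors
        return False


if __name__ == "__main__":
    domain = "vllm.concrete-security.com"  # DOMAIN from the service configuration
    print("resolves to:", resolved_addresses(domain))
    print("port 80 reachable:", port_reachable(domain, 80))
```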

Security Considerations

Private Key Protection

  • Keys derived from dstack never leave TEE
  • Keys stored in memory and on encrypted volume only
  • Deterministic derivation ensures consistency

Certificate Pinning

The RTMR3 event emission enables:
  • Attestation of current TLS certificate
  • Detection of certificate changes
  • Binding TLS to TEE identity

TLS 1.3 Only

Enforcing TLS 1.3 provides:
  • Forward secrecy
  • EKM support (RFC 9266)
  • Reduced attack surface
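A client can mirror this policy, or confirm that the server really rejects older versions, with Python's `ssl` module. This is a minimal sketch; the function names are hypothetical and the connection attempt is left to the caller.

```python
import ssl


def tls13_only_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def tls12_max_context() -> ssl.SSLContext:
    """Client context capped at TLS 1.2, for negative testing."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A handshake made with `tls12_max_context()` against a server configured with `ssl_protocols TLSv1.3;` is expected to fail, which makes it a convenient regression check.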

Monitoring

Key metrics to track:
  • Certificate expiry date
  • Renewal success/failure
  • ACME challenge success rate
  • Nginx reload events
  • dstack connection status
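The expiry metric can be derived from the `notAfter` string that `SSLSocket.getpeercert()` returns. The sketch below shows only the conversion; `days_until_expiry` is a hypothetical helper, and collecting the string from a live connection is left out.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now_ts: float) -> float:
    """Days between `now_ts` and a certificate's notAfter timestamp.

    `not_after` uses the format found in getpeercert()["notAfter"],
    for example "Jan  5 09:34:43 2018 GMT".
    """
    return (ssl.cert_time_to_seconds(not_after) - now_ts) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(days_until_expiry("Jan  5 09:34:43 2018 GMT", time.time()))
```

A negative result means the certificate has already expired. An alert threshold somewhat below the 30-day renewal window would flag renewals that keep failing.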

Next Steps

Deployment Guide

Deploy CVM services to Phala Cloud

Attestation Service

Understand TDX attestation integration
