
Kubernetes Deployment

Kubernetes excels at managing containerized applications at scale, providing features like automatic scaling, rolling updates, and self-healing capabilities. When your Agent Mesh deployment needs to handle varying loads or requires high availability, Kubernetes becomes the preferred orchestration platform.

Prerequisites

Before deploying to Kubernetes, ensure you have:

Cluster Requirements:
  • Kubernetes cluster version 1.20 or later
  • kubectl command-line tool configured
  • Helm 3.0 or later installed
  • Standard worker nodes (VMs or bare metal)
    • Not supported: Serverless nodes (AWS Fargate, GKE Autopilot, Azure Virtual Nodes)
External Services:
  • PostgreSQL 17+ database (managed service recommended)
  • S3-compatible object storage
  • Solace event broker (Cloud or self-hosted)
  • LLM provider endpoints
  • Container registry credentials
Minimum Compute (per node):
  • 2 vCPU / 8 GB RAM (minimum)
  • 4 vCPU / 16 GB RAM (recommended)
  • SSD-backed storage class
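To confirm your client tooling meets these minimums, you can compare versions with a small helper (a sketch; the version strings shown are placeholders — substitute the output of `kubectl version --client` and `helm version --short` from your workstation):

```shell
# ver_ge A B: succeeds when version A >= B (relies on GNU `sort -V` version ordering)
ver_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

# Substitute the versions reported by your own kubectl/helm binaries:
ver_ge "1.28.3" "1.20" && echo "kubernetes: OK"
ver_ge "3.14.0" "3.0"  && echo "helm: OK"
```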

Supported Kubernetes Distributions

Validated Platforms (Tier 1 Support)

Solace explicitly validates Agent Mesh releases against:
  • AWS EKS - Amazon Elastic Kubernetes Service
  • Azure AKS - Azure Kubernetes Service
  • Google GKE - Google Kubernetes Engine

Compatible Platforms (Tier 2 Support)

Agent Mesh is compatible with standard Kubernetes APIs:
  • Red Hat OpenShift
  • VMware Tanzu (TKG)
  • SUSE Rancher (RKE2)
  • Oracle Container Engine (OKE)
  • Canonical Charmed Kubernetes
  • Upstream Kubernetes (kubeadm)
For distributions with proprietary security constraints (e.g., OpenShift SCCs, Tanzu PSPs), Solace support is limited to API compatibility confirmation. Customer-specific security policies remain the customer’s responsibility.

Helm Chart Quickstart

The Solace Agent Mesh Helm quickstart provides pre-configured charts, deployment examples, and detailed documentation for common scenarios.

Installation

1. Clone the Helm quickstart repository:
git clone https://github.com/SolaceProducts/solace-agent-mesh-helm-quickstart.git
cd solace-agent-mesh-helm-quickstart
2. Review the documentation: For step-by-step deployment instructions, see the Helm Deployment Guide.
3. Configure your deployment: Create a values.yaml file with your environment-specific settings:
# Global settings
global:
  imageRegistry: gcr.io/gcp-maas-prod
  imagePullSecrets:
    - name: gcr-pull-secret
  namespace: solace-agent-mesh

# Solace Event Broker Connection
broker:
  url: wss://your-broker.messaging.solace.cloud:443
  username: your-username
  password: your-password
  vpn: your-vpn
  useTemporaryQueues: false

# LLM Configuration
llm:
  endpoint: https://api.openai.com/v1
  apiKey: sk-...
  planningModel: openai/gpt-4
  generalModel: openai/gpt-4

# Session Storage (PostgreSQL)
sessionStorage:
  type: sql
  databaseUrl: postgresql://user:[email protected]:5432/sam

# Artifact Storage (S3)
artifactStorage:
  type: s3
  bucket: your-bucket-name
  region: us-east-1
  accessKeyId: AKIA...
  secretAccessKey: your-secret-key

# Security
security:
  sessionSecretKey: your-random-secret-key

# Resource configuration
resources:
  agentMesh:
    requests:
      cpu: 175m
      memory: 625Mi
    limits:
      cpu: 200m
      memory: 1Gi
  deployer:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 100m
      memory: 100Mi

# Health checks
healthCheck:
  enabled: true
  port: 8080

# Ingress
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: agent-mesh.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: agent-mesh-tls
      hosts:
        - agent-mesh.example.com
4. Create Kubernetes secrets:
# Pull secret for container registry
kubectl create secret docker-registry gcr-pull-secret \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/key.json)" \
  -n solace-agent-mesh

# Secrets for sensitive configuration
kubectl create secret generic sam-secrets \
  --from-literal=broker-password=your-password \
  --from-literal=llm-api-key=sk-... \
  --from-literal=session-secret-key=your-secret \
  --from-literal=db-password=db-password \
  --from-literal=aws-access-key=AKIA... \
  --from-literal=aws-secret-key=your-secret \
  -n solace-agent-mesh
5. Install the Helm chart:
helm install solace-agent-mesh ./charts/solace-agent-mesh \
  -f values.yaml \
  -n solace-agent-mesh \
  --create-namespace
6. Verify the deployment:
# Check pod status
kubectl get pods -n solace-agent-mesh

# Check service endpoints
kubectl get svc -n solace-agent-mesh

# View logs
kubectl logs -f deployment/solace-agent-mesh -n solace-agent-mesh

Deployment Architecture

Monolithic Deployment

Deploy all components in a single deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
spec:
  replicas: 2
  selector:
    matchLabels:
      app: solace-agent-mesh
  template:
    metadata:
      labels:
        app: solace-agent-mesh
    spec:
      containers:
        - name: agent-mesh
          image: solace/solace-agent-mesh:latest
          args: ["run", "--system-env"]
          ports:
            - containerPort: 5002
              name: web-ui
            - containerPort: 8000
              name: api
            - containerPort: 8080
              name: health
          envFrom:
            - secretRef:
                name: sam-secrets
            - configMapRef:
                name: sam-config
          resources:
            requests:
              cpu: 175m
              memory: 625Mi
            limits:
              cpu: 200m
              memory: 1Gi

Microservices Deployment

Deploy components as separate deployments for independent scaling:

Core Platform:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sam-core
  namespace: solace-agent-mesh
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sam-core
      component: core
  template:
    metadata:
      labels:
        app: sam-core
        component: core
    spec:
      containers:
        - name: core
          image: solace/solace-agent-mesh:latest
          args: ["run", "--system-env", "/app/configs/core.yaml"]
          # ... ports, env, resources ...
Specialized Agent:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sam-database-agent
  namespace: solace-agent-mesh
spec:
  replicas: 3  # Scale independently
  selector:
    matchLabels:
      app: sam-agent
      component: database-agent
  template:
    metadata:
      labels:
        app: sam-agent
        component: database-agent
    spec:
      containers:
        - name: agent
          image: solace/solace-agent-mesh:latest
          args: ["run", "--system-env", "/app/configs/agents/database_agent.yaml"]
          resources:
            requests:
              cpu: 175m
              memory: 625Mi
            limits:
              cpu: 200m
              memory: 768Mi

Configuration Management

ConfigMap for Non-Sensitive Data

apiVersion: v1
kind: ConfigMap
metadata:
  name: sam-config
  namespace: solace-agent-mesh
data:
  SOLACE_BROKER_URL: "wss://your-broker.messaging.solace.cloud:443"
  SOLACE_BROKER_USERNAME: "your-username"
  SOLACE_BROKER_VPN: "your-vpn"
  USE_TEMPORARY_QUEUES: "false"
  LLM_SERVICE_ENDPOINT: "https://api.openai.com/v1"
  LLM_SERVICE_PLANNING_MODEL_NAME: "openai/gpt-4"
  LLM_SERVICE_GENERAL_MODEL_NAME: "openai/gpt-4"
  CONFIG_PORTAL_HOST: "0.0.0.0"
  FASTAPI_HOST: "0.0.0.0"
  FASTAPI_PORT: "8000"
  ARTIFACT_STORAGE_TYPE: "s3"
  ARTIFACT_STORAGE_S3_BUCKET: "your-bucket"
  ARTIFACT_STORAGE_S3_REGION: "us-east-1"

Secrets for Sensitive Data

apiVersion: v1
kind: Secret
metadata:
  name: sam-secrets
  namespace: solace-agent-mesh
type: Opaque
stringData:
  SOLACE_BROKER_PASSWORD: "your-password"
  LLM_SERVICE_API_KEY: "sk-..."
  SESSION_SECRET_KEY: "your-random-secret-key"
  DATABASE_URL: "postgresql://user:password@host:5432/sam"
  AWS_ACCESS_KEY_ID: "AKIA..."
  AWS_SECRET_ACCESS_KEY: "your-secret-key"
Never commit secrets directly in YAML files. Use sealed secrets, external secret operators, or create secrets via kubectl.
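For example, with the External Secrets Operator installed, the sam-secrets Secret can be synced from an external store instead of being created by hand (a sketch; the ClusterSecretStore name aws-secretsmanager and the remote key names sam/broker and sam/llm are assumptions — adjust them to your setup, and add entries for the remaining keys the same way):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: sam-secrets
  namespace: solace-agent-mesh
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager   # assumed ClusterSecretStore; create it separately
    kind: ClusterSecretStore
  target:
    name: sam-secrets          # Kubernetes Secret to create/update
  data:
    - secretKey: SOLACE_BROKER_PASSWORD
      remoteRef:
        key: sam/broker        # assumed path in the external store
        property: password
    - secretKey: LLM_SERVICE_API_KEY
      remoteRef:
        key: sam/llm
        property: apiKey
```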

Health Checks and Probes

Configure Kubernetes probes for automated lifecycle management:
containers:
  - name: agent-mesh
    # ... other config ...
    ports:
      - containerPort: 8080
        name: health
    
    # Startup probe - prevents liveness from killing slow-starting containers
    startupProbe:
      httpGet:
        path: /startup
        port: health
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 30  # 150 seconds total (30 * 5s)
    
    # Readiness probe - removes pod from service when unhealthy
    readinessProbe:
      httpGet:
        path: /readyz
        port: health
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    
    # Liveness probe - restarts container when unhealthy
    livenessProbe:
      httpGet:
        path: /healthz
        port: health
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3

Storage Configuration

Persistent Volumes for Shared Storage

If using file-based artifact storage (not recommended for production):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sam-artifacts
  namespace: solace-agent-mesh
spec:
  accessModes:
    - ReadWriteMany  # Required for multiple pods
  storageClassName: efs-sc  # AWS EFS, Azure Files, or similar
  resources:
    requests:
      storage: 50Gi
For production, use managed services:

PostgreSQL:
  • AWS RDS for PostgreSQL
  • Azure Database for PostgreSQL
  • Google Cloud SQL for PostgreSQL
Object Storage:
  • AWS S3
  • Azure Blob Storage
  • Google Cloud Storage
  • MinIO (self-hosted)

Queue Configuration

For Kubernetes environments with container restarts, configure durable queues:
envFrom:
  - configMapRef:
      name: sam-config
env:
  - name: USE_TEMPORARY_QUEUES
    value: "false"
Create a Queue Template in Solace Cloud:
  1. Navigate to Message VPNs → select your VPN
  2. Go to the Queues → Templates tab
  3. Click + Queue Template
  4. Configure:
    • Queue Name Filter: sam/> (or your namespace)
    • Respect TTL: true
    • Maximum TTL (sec): 18000
This prevents message accumulation when agents restart.

Resource Management

Resource Requests and Limits

Core Components:
resources:
  agentMesh:
    requests:
      cpu: 175m
      memory: 625Mi
    limits:
      cpu: 200m
      memory: 1Gi
  
  deployer:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 100m
      memory: 100Mi
  
  agent:
    requests:
      cpu: 175m
      memory: 625Mi
    limits:
      cpu: 200m
      memory: 768Mi

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sam-agent-hpa
  namespace: solace-agent-mesh
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sam-database-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
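Note that the HPA requires the metrics-server (or another metrics API provider) in the cluster. To reduce replica flapping on bursty agent workloads, the autoscaling/v2 API also supports a behavior block; a sketch with illustrative values:

```yaml
  # Append under spec: of the HorizontalPodAutoscaler above
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # require 5 min of sustained low load before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60            # remove at most one pod per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # react immediately to load spikes
```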

Network Configuration

Service Definition

apiVersion: v1
kind: Service
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
spec:
  type: ClusterIP
  selector:
    app: solace-agent-mesh
  ports:
    - name: web-ui
      port: 5002
      targetPort: 5002
    - name: api
      port: 8000
      targetPort: 8000

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - agent-mesh.example.com
      secretName: agent-mesh-tls
  rules:
    - host: agent-mesh.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: solace-agent-mesh
                port:
                  number: 5002

Security Considerations

Pod Security

spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
      containers:
        - name: agent-mesh
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: false  # Agent Mesh needs write access
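If your security baseline requires readOnlyRootFilesystem: true, you can instead mount writable emptyDir volumes at the directories the application writes to (a sketch; /tmp is the standard scratch path, but verify the actual writable paths for your Agent Mesh version):

```yaml
      containers:
        - name: agent-mesh
          securityContext:
            readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp    # writable scratch space for the application
      volumes:
        - name: tmp
          emptyDir: {}
```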

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sam-network-policy
  namespace: solace-agent-mesh
spec:
  podSelector:
    matchLabels:
      app: solace-agent-mesh
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - port: 5002
        - port: 8000
  egress:
    - to:  # Allow DNS
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - port: 53
          protocol: UDP
    - to:  # Allow Solace broker (external endpoint; an empty podSelector would only match in-namespace pods)
        - ipBlock:
            cidr: 0.0.0.0/0  # narrow to your broker's CIDR where possible
      ports:
        - port: 443
        - port: 55443
    - to:  # Allow LLM providers (external endpoints)
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443

Monitoring and Observability

ServiceMonitor for Prometheus

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
spec:
  selector:
    matchLabels:
      app: solace-agent-mesh
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
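The ServiceMonitor selects a Service port named metrics, which the Service definition shown earlier does not expose. Add one to the Service ports list (the port number 8080 is an assumption — use whichever port your deployment serves /metrics on):

```yaml
    - name: metrics
      port: 8080
      targetPort: 8080
```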

Logging with FluentBit

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  parsers.conf: |
    [PARSER]
        Name   sam-json
        Format json
        Time_Key timestamp
        Time_Format %Y-%m-%dT%H:%M:%S.%L
  
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*solace-agent-mesh*.log
        Parser            sam-json
        Tag               sam.*
        Refresh_Interval  5
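The configuration above defines only an input and a parser; Fluent Bit also needs an [OUTPUT] section to ship logs somewhere. A minimal sketch using the es output plugin (the host, port, and index are assumptions — substitute your log backend):

```
    [OUTPUT]
        Name   es
        Match  sam.*
        Host   elasticsearch.logging.svc
        Port   9200
        Index  sam-logs
```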

Troubleshooting

Pod Won’t Start

# Check pod status
kubectl get pods -n solace-agent-mesh

# Describe pod for events
kubectl describe pod <pod-name> -n solace-agent-mesh

# Check logs
kubectl logs <pod-name> -n solace-agent-mesh

# Check previous container logs (if restarted)
kubectl logs <pod-name> -n solace-agent-mesh --previous

Image Pull Errors

# Verify pull secret exists
kubectl get secrets -n solace-agent-mesh

# Check secret is referenced in service account
kubectl get serviceaccount default -n solace-agent-mesh -o yaml

Health Check Failures

# Check health endpoints directly
kubectl port-forward pod/<pod-name> 8080:8080 -n solace-agent-mesh
curl http://localhost:8080/healthz
curl http://localhost:8080/readyz

# Check probe configuration
kubectl get pod <pod-name> -n solace-agent-mesh -o yaml | grep -A 10 livenessProbe

Connection Issues

# Test from inside pod
kubectl exec -it <pod-name> -n solace-agent-mesh -- /bin/bash
curl -v https://your-broker.messaging.solace.cloud
curl -v https://api.openai.com/v1/models

Next Steps

  • Production Best Practices - security, monitoring, and operational best practices
  • Helm Quickstart Guide - detailed Helm chart documentation and examples
