
Kubernetes Deployment

Kubernetes excels at managing containerized applications at scale, providing features like automatic scaling, rolling updates, and self-healing capabilities. When your Agent Mesh deployment needs to handle varying loads or requires high availability, Kubernetes becomes the preferred orchestration platform.

Prerequisites

Before deploying to Kubernetes, ensure you have:

Cluster Requirements:
  • Kubernetes cluster version 1.20 or later
  • kubectl command-line tool configured
  • Helm 3.0 or later installed
  • Standard worker nodes (VMs or bare metal)
    • Not supported: Serverless nodes (AWS Fargate, GKE Autopilot, Azure Virtual Nodes)
External Services:
  • PostgreSQL 17+ database (managed service recommended)
  • S3-compatible object storage
  • Solace event broker (Cloud or self-hosted)
  • LLM provider endpoints
  • Container registry credentials
Minimum Compute (per node):
  • 2 vCPU / 8 GB RAM (minimum)
  • 4 vCPU / 16 GB RAM (recommended)
  • SSD-backed storage class
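To confirm your client tooling meets these minimums, you can compare versions with a small helper (a sketch; the version strings shown are placeholders — substitute the output of `kubectl version --client` and `helm version --short` from your workstation):

```shell
# ver_ge A B: succeeds when version A >= B (relies on GNU `sort -V` version ordering)
ver_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

# Substitute the versions reported by your own kubectl/helm binaries:
ver_ge "1.28.3" "1.20" && echo "kubernetes: OK"
ver_ge "3.14.0" "3.0"  && echo "helm: OK"
```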

Supported Kubernetes Distributions

Validated Platforms (Tier 1 Support)

Solace explicitly validates Agent Mesh releases against:
  • AWS EKS - Amazon Elastic Kubernetes Service
  • Azure AKS - Azure Kubernetes Service
  • Google GKE - Google Kubernetes Engine

Compatible Platforms (Tier 2 Support)

Agent Mesh is compatible with standard Kubernetes APIs:
  • Red Hat OpenShift
  • VMware Tanzu (TKG)
  • SUSE Rancher (RKE2)
  • Oracle Container Engine (OKE)
  • Canonical Charmed Kubernetes
  • Upstream Kubernetes (kubeadm)
For distributions with proprietary security constraints (e.g., OpenShift SCCs, Tanzu PSPs), Solace support is limited to API compatibility confirmation. Customer-specific security policies remain the customer’s responsibility.

Helm Chart Quickstart

The Solace Agent Mesh Helm quickstart provides pre-configured charts, deployment examples, and detailed documentation for common scenarios.

Installation

1. Clone the Helm quickstart repository:
git clone https://github.com/SolaceProducts/solace-agent-mesh-helm-quickstart.git
cd solace-agent-mesh-helm-quickstart
2. Review the documentation: For step-by-step deployment instructions, see the Helm Deployment Guide.
3. Configure your deployment: Create a values.yaml file with your environment-specific settings:
# Global settings
global:
  imageRegistry: gcr.io/gcp-maas-prod
  imagePullSecrets:
    - name: gcr-pull-secret
  namespace: solace-agent-mesh

# Solace Event Broker Connection
broker:
  url: wss://your-broker.messaging.solace.cloud:443
  username: your-username
  password: your-password
  vpn: your-vpn
  useTemporaryQueues: false

# LLM Configuration
llm:
  endpoint: https://api.openai.com/v1
  apiKey: sk-...
  planningModel: openai/gpt-4
  generalModel: openai/gpt-4

# Session Storage (PostgreSQL)
sessionStorage:
  type: sql
  databaseUrl: postgresql://user:[email protected]:5432/sam

# Artifact Storage (S3)
artifactStorage:
  type: s3
  bucket: your-bucket-name
  region: us-east-1
  accessKeyId: AKIA...
  secretAccessKey: your-secret-key

# Security
security:
  sessionSecretKey: your-random-secret-key

# Resource configuration
resources:
  agentMesh:
    requests:
      cpu: 175m
      memory: 625Mi
    limits:
      cpu: 200m
      memory: 1Gi
  deployer:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 100m
      memory: 100Mi

# Health checks
healthCheck:
  enabled: true
  port: 8080

# Ingress
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: agent-mesh.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: agent-mesh-tls
      hosts:
        - agent-mesh.example.com
4. Create Kubernetes secrets:
# Pull secret for container registry
kubectl create secret docker-registry gcr-pull-secret \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/key.json)" \
  -n solace-agent-mesh

# Secrets for sensitive configuration
kubectl create secret generic sam-secrets \
  --from-literal=broker-password=your-password \
  --from-literal=llm-api-key=sk-... \
  --from-literal=session-secret-key=your-secret \
  --from-literal=db-password=db-password \
  --from-literal=aws-access-key=AKIA... \
  --from-literal=aws-secret-key=your-secret \
  -n solace-agent-mesh
5. Install the Helm chart:
helm install solace-agent-mesh ./charts/solace-agent-mesh \
  -f values.yaml \
  -n solace-agent-mesh \
  --create-namespace
6. Verify the deployment:
# Check pod status
kubectl get pods -n solace-agent-mesh

# Check service endpoints
kubectl get svc -n solace-agent-mesh

# View logs
kubectl logs -f deployment/solace-agent-mesh -n solace-agent-mesh

Deployment Architecture

Monolithic Deployment

Deploy all components in a single deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
spec:
  replicas: 2
  selector:
    matchLabels:
      app: solace-agent-mesh
  template:
    metadata:
      labels:
        app: solace-agent-mesh
    spec:
      containers:
        - name: agent-mesh
          image: solace/solace-agent-mesh:latest
          args: ["run", "--system-env"]
          ports:
            - containerPort: 5002
              name: web-ui
            - containerPort: 8000
              name: api
            - containerPort: 8080
              name: health
          envFrom:
            - secretRef:
                name: sam-secrets
            - configMapRef:
                name: sam-config
          resources:
            requests:
              cpu: 175m
              memory: 625Mi
            limits:
              cpu: 200m
              memory: 1Gi

Microservices Deployment

Deploy components as separate deployments for independent scaling:

Core Platform:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sam-core
  namespace: solace-agent-mesh
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sam-core
      component: core
  template:
    metadata:
      labels:
        app: sam-core
        component: core
    spec:
      containers:
        - name: core
          image: solace/solace-agent-mesh:latest
          args: ["run", "--system-env", "/app/configs/core.yaml"]
          # ... ports, env, resources ...
Specialized Agent:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sam-database-agent
  namespace: solace-agent-mesh
spec:
  replicas: 3  # Scale independently
  selector:
    matchLabels:
      app: sam-agent
      component: database-agent
  template:
    metadata:
      labels:
        app: sam-agent
        component: database-agent
    spec:
      containers:
        - name: agent
          image: solace/solace-agent-mesh:latest
          args: ["run", "--system-env", "/app/configs/agents/database_agent.yaml"]
          resources:
            requests:
              cpu: 175m
              memory: 625Mi
            limits:
              cpu: 200m
              memory: 768Mi

Configuration Management

ConfigMap for Non-Sensitive Data

apiVersion: v1
kind: ConfigMap
metadata:
  name: sam-config
  namespace: solace-agent-mesh
data:
  SOLACE_BROKER_URL: "wss://your-broker.messaging.solace.cloud:443"
  SOLACE_BROKER_USERNAME: "your-username"
  SOLACE_BROKER_VPN: "your-vpn"
  USE_TEMPORARY_QUEUES: "false"
  LLM_SERVICE_ENDPOINT: "https://api.openai.com/v1"
  LLM_SERVICE_PLANNING_MODEL_NAME: "openai/gpt-4"
  LLM_SERVICE_GENERAL_MODEL_NAME: "openai/gpt-4"
  CONFIG_PORTAL_HOST: "0.0.0.0"
  FASTAPI_HOST: "0.0.0.0"
  FASTAPI_PORT: "8000"
  ARTIFACT_STORAGE_TYPE: "s3"
  ARTIFACT_STORAGE_S3_BUCKET: "your-bucket"
  ARTIFACT_STORAGE_S3_REGION: "us-east-1"

Secrets for Sensitive Data

apiVersion: v1
kind: Secret
metadata:
  name: sam-secrets
  namespace: solace-agent-mesh
type: Opaque
stringData:
  SOLACE_BROKER_PASSWORD: "your-password"
  LLM_SERVICE_API_KEY: "sk-..."
  SESSION_SECRET_KEY: "your-random-secret-key"
  DATABASE_URL: "postgresql://user:password@host:5432/sam"
  AWS_ACCESS_KEY_ID: "AKIA..."
  AWS_SECRET_ACCESS_KEY: "your-secret-key"
Never commit secrets directly in YAML files. Use sealed secrets, external secret operators, or create secrets via kubectl.
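For example, with the External Secrets Operator installed, the sam-secrets Secret can be synced from an external store instead of being created by hand (a sketch; the ClusterSecretStore name aws-secretsmanager and the remote key names sam/broker and sam/llm are assumptions — adjust them to your setup, and add entries for the remaining keys the same way):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: sam-secrets
  namespace: solace-agent-mesh
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager   # assumed ClusterSecretStore; create it separately
    kind: ClusterSecretStore
  target:
    name: sam-secrets          # Kubernetes Secret to create/update
  data:
    - secretKey: SOLACE_BROKER_PASSWORD
      remoteRef:
        key: sam/broker        # assumed path in the external store
        property: password
    - secretKey: LLM_SERVICE_API_KEY
      remoteRef:
        key: sam/llm
        property: apiKey
```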

Health Checks and Probes

Configure Kubernetes probes for automated lifecycle management:
containers:
  - name: agent-mesh
    # ... other config ...
    ports:
      - containerPort: 8080
        name: health
    
    # Startup probe - prevents liveness from killing slow-starting containers
    startupProbe:
      httpGet:
        path: /startup
        port: health
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 30  # 150 seconds total (30 * 5s)
    
    # Readiness probe - removes pod from service when unhealthy
    readinessProbe:
      httpGet:
        path: /readyz
        port: health
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    
    # Liveness probe - restarts container when unhealthy
    livenessProbe:
      httpGet:
        path: /healthz
        port: health
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3

Storage Configuration

Persistent Volumes for Shared Storage

If using file-based artifact storage (not recommended for production):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sam-artifacts
  namespace: solace-agent-mesh
spec:
  accessModes:
    - ReadWriteMany  # Required for multiple pods
  storageClassName: efs-sc  # AWS EFS, Azure Files, or similar
  resources:
    requests:
      storage: 50Gi
For production, use managed services:

PostgreSQL:
  • AWS RDS for PostgreSQL
  • Azure Database for PostgreSQL
  • Google Cloud SQL for PostgreSQL
Object Storage:
  • AWS S3
  • Azure Blob Storage
  • Google Cloud Storage
  • MinIO (self-hosted)

Queue Configuration

For Kubernetes environments with container restarts, configure durable queues:
envFrom:
  - configMapRef:
      name: sam-config
env:
  - name: USE_TEMPORARY_QUEUES
    value: "false"
Create a Queue Template in Solace Cloud:
  1. Navigate to Message VPNs → select your VPN
  2. Go to the Queues → Templates tab
  3. Click + Queue Template
  4. Configure:
    • Queue Name Filter: sam/> (or your namespace)
    • Respect TTL: true
    • Maximum TTL (sec): 18000
This prevents message accumulation when agents restart.

Resource Management

Resource Requests and Limits

Core Components:
resources:
  agentMesh:
    requests:
      cpu: 175m
      memory: 625Mi
    limits:
      cpu: 200m
      memory: 1Gi
  
  deployer:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 100m
      memory: 100Mi
  
  agent:
    requests:
      cpu: 175m
      memory: 625Mi
    limits:
      cpu: 200m
      memory: 768Mi

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sam-agent-hpa
  namespace: solace-agent-mesh
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sam-database-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
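Note that the HPA requires the metrics-server (or another metrics API provider) in the cluster. To reduce replica flapping on bursty agent workloads, the autoscaling/v2 API also supports a behavior block; a sketch with illustrative values:

```yaml
  # Append under spec: of the HorizontalPodAutoscaler above
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # require 5 min of sustained low load before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60            # remove at most one pod per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # react immediately to load spikes
```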

Network Configuration

Service Definition

apiVersion: v1
kind: Service
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
spec:
  type: ClusterIP
  selector:
    app: solace-agent-mesh
  ports:
    - name: web-ui
      port: 5002
      targetPort: 5002
    - name: api
      port: 8000
      targetPort: 8000

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - agent-mesh.example.com
      secretName: agent-mesh-tls
  rules:
    - host: agent-mesh.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: solace-agent-mesh
                port:
                  number: 5002

Security Considerations

Pod Security

spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
      containers:
        - name: agent-mesh
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: false  # Agent Mesh needs write access
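If your security baseline requires readOnlyRootFilesystem: true, you can instead mount writable emptyDir volumes at the directories the application writes to (a sketch; /tmp is the standard scratch path, but verify the actual writable paths for your Agent Mesh version):

```yaml
      containers:
        - name: agent-mesh
          securityContext:
            readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp    # writable scratch space for the application
      volumes:
        - name: tmp
          emptyDir: {}
```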

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sam-network-policy
  namespace: solace-agent-mesh
spec:
  podSelector:
    matchLabels:
      app: solace-agent-mesh
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - port: 5002
        - port: 8000
  egress:
    - to:  # Allow DNS
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - port: 53
          protocol: UDP
    - to:  # Allow Solace broker (external endpoint; an empty podSelector would only match in-namespace pods)
        - ipBlock:
            cidr: 0.0.0.0/0  # narrow to your broker's CIDR where possible
      ports:
        - port: 443
        - port: 55443
    - to:  # Allow LLM providers (external endpoints)
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443

Monitoring and Observability

ServiceMonitor for Prometheus

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: solace-agent-mesh
  namespace: solace-agent-mesh
spec:
  selector:
    matchLabels:
      app: solace-agent-mesh
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
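The ServiceMonitor selects a Service port named metrics, which the Service definition shown earlier does not expose. Add one to the Service ports list (the port number 8080 is an assumption — use whichever port your deployment serves /metrics on):

```yaml
    - name: metrics
      port: 8080
      targetPort: 8080
```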

Logging with FluentBit

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  parsers.conf: |
    [PARSER]
        Name   sam-json
        Format json
        Time_Key timestamp
        Time_Format %Y-%m-%dT%H:%M:%S.%L
  
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*solace-agent-mesh*.log
        Parser            sam-json
        Tag               sam.*
        Refresh_Interval  5
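The configuration above defines only an input and a parser; Fluent Bit also needs an [OUTPUT] section to ship logs somewhere. A minimal sketch using the es output plugin (the host, port, and index are assumptions — substitute your log backend):

```
    [OUTPUT]
        Name   es
        Match  sam.*
        Host   elasticsearch.logging.svc
        Port   9200
        Index  sam-logs
```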

Troubleshooting

Pod Won’t Start

# Check pod status
kubectl get pods -n solace-agent-mesh

# Describe pod for events
kubectl describe pod <pod-name> -n solace-agent-mesh

# Check logs
kubectl logs <pod-name> -n solace-agent-mesh

# Check previous container logs (if restarted)
kubectl logs <pod-name> -n solace-agent-mesh --previous

Image Pull Errors

# Verify pull secret exists
kubectl get secrets -n solace-agent-mesh

# Check secret is referenced in service account
kubectl get serviceaccount default -n solace-agent-mesh -o yaml

Health Check Failures

# Check health endpoints directly
kubectl port-forward pod/<pod-name> 8080:8080 -n solace-agent-mesh
curl http://localhost:8080/healthz
curl http://localhost:8080/readyz

# Check probe configuration
kubectl get pod <pod-name> -n solace-agent-mesh -o yaml | grep -A 10 livenessProbe

Connection Issues

# Test from inside pod
kubectl exec -it <pod-name> -n solace-agent-mesh -- /bin/bash
curl -v https://your-broker.messaging.solace.cloud
curl -v https://api.openai.com/v1/models

Next Steps

  • Production Best Practices - security, monitoring, and operational best practices
  • Helm Quickstart Guide - detailed Helm chart documentation and examples
