Production Best Practices
Deploying Solace Agent Mesh in production requires careful attention to security, reliability, performance, and operational excellence. This guide covers essential best practices to ensure your deployment is production-ready.
Security Best Practices
Secrets Management
Never store sensitive information in code or configuration files. Use a dedicated secret management solution:
Kubernetes Secrets
- Use kubectl create secret instead of YAML files
- Enable encryption at rest in your cluster
- Use RBAC to restrict secret access
- Consider Sealed Secrets or External Secrets Operator
AWS Secrets Manager
- Use IAM roles for authentication
- Enable automatic rotation
- Use AWS Secrets and Configuration Provider (ASCP) for Kubernetes
HashiCorp Vault
- Use Vault Agent for sidecar injection
- Configure dynamic secrets for databases
- Enable audit logging
Azure Key Vault
- Use managed identities for authentication
- Enable soft delete and purge protection
- Use Azure Key Vault Provider for Kubernetes
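Whichever store holds the secret, the application typically receives it as a mounted file or an environment variable rather than as a value in code. The sketch below shows one way to read it at startup. The read_secret helper, the /var/run/secrets/app mount path, and the naming convention are assumptions for illustration, not part of Agent Mesh.

```python
import os
from pathlib import Path


def read_secret(name: str, mount_dir: str = "/var/run/secrets/app") -> str:
    """Return a secret from a mounted file, falling back to an environment variable.

    The mount directory and the NAME_IN_UPPER_CASE convention are hypothetical.
    """
    secret_file = Path(mount_dir) / name
    if secret_file.is_file():
        # Mounted secrets often end with a newline; strip it.
        return secret_file.read_text(encoding="utf-8").strip()

    env_name = name.upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        # Fail fast instead of running with a missing credential.
        raise RuntimeError(f"secret {name!r} not found in {mount_dir} or ${env_name}")
    return value
```

Failing at startup when a secret is missing tends to be easier to diagnose than an authentication error deep inside a request.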
TLS/SSL Configuration
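As a minimal sketch of encrypted-by-default on the client side, the following builds a TLS context with Python's standard ssl module that verifies the server certificate and hostname and refuses protocol versions older than TLS 1.2. The make_tls_context helper and the optional CA bundle path are illustrative assumptions; how the context is handed to a broker or database client depends on that client's API.

```python
import ssl
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context with certificate verification enabled.

    ca_file is a hypothetical path to a private CA bundle; when omitted,
    the system trust store is used.
    """
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_file)
    # Reject legacy protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```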
Encrypt all communication channels, including connections to the Solace Event Broker.
Container Security
Run as a non-root user. Agent Mesh containers run as UID 999 by default.
Authentication and Authorization
Configure an identity provider (IdP).
Data Protection
Encrypt sensitive data.
High Availability and Reliability
Multi-Replica Deployment
Run multiple instances.
Queue Configuration
Use durable queues for container environments:
- Navigate to Message VPNs → Queues → Templates
- Create template:
  - Queue Name Filter: sam/>
  - Respect TTL: true
  - Maximum TTL: 18000 seconds (5 hours)
  - Max Message Size: 10000000 bytes (10 MB)
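To make the template values concrete, the sketch below restates them as data and checks which queue names a filter such as sam/> would cover. The dictionary keys and the matches_filter helper are hypothetical names, not broker fields, and the matcher is deliberately simplified: it handles only a trailing > (one or more remaining levels) and ignores the * wildcard.

```python
from datetime import timedelta

# Values from the queue template; the key names are illustrative only.
QUEUE_TEMPLATE = {
    "queue_name_filter": "sam/>",
    "respect_ttl": True,
    "max_ttl_seconds": 18000,            # 5 hours
    "max_message_size_bytes": 10_000_000,  # 10 MB (decimal)
}


def matches_filter(queue_name: str, name_filter: str) -> bool:
    """Simplified match for a filter that ends in the '>' wildcard."""
    if name_filter.endswith("/>"):
        prefix = name_filter[:-1]  # "sam/>" becomes "sam/"
        # '>' needs at least one more level after the prefix.
        return queue_name.startswith(prefix) and len(queue_name) > len(prefix)
    return queue_name == name_filter


max_ttl = timedelta(seconds=QUEUE_TEMPLATE["max_ttl_seconds"])
```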
Health Checks and Auto-Recovery
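Health checks usually separate liveness (the process is running) from readiness (its dependencies are reachable), so that an orchestrator can restart or stop routing to an instance as appropriate. The following is a generic sketch using only the Python standard library. The /healthz and /readyz paths, port 8080, and the dependencies_ready placeholder are assumptions, not Agent Mesh's own endpoints.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def dependencies_ready() -> bool:
    """Placeholder readiness check; replace with real broker and database probes."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process can answer requests.
            self._reply(200, {"status": "alive"})
        elif self.path == "/readyz":
            # Readiness: downstream dependencies are usable.
            ready = dependencies_ready()
            self._reply(200 if ready else 503,
                        {"status": "ready" if ready else "not ready"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the application logs


def make_health_server(port: int = 8080) -> ThreadingHTTPServer:
    """Create the server; call serve_forever() on it from a background thread."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

A liveness probe would target /healthz and a readiness probe /readyz, so a temporarily unavailable dependency removes the instance from rotation without restarting it.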
Configure comprehensive health checks.
Resource Quotas and Limits
Define resource boundaries.
Database High Availability
Use managed database services with HA, such as AWS RDS.
Performance Optimization
Database Performance
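A connection pool keeps a bounded set of open connections and hands them out on demand, which avoids per-request connection setup and caps the load on the database. The sketch below illustrates the idea with the standard library only; the ConnectionPool class is hypothetical, and in practice the pooling built into your database driver or ORM is usually the better choice.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Minimal fixed-size pool; illustrative, not production-hardened."""

    def __init__(self, factory, size: int = 5, timeout: float = 30.0):
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front

    @contextmanager
    def connection(self):
        # Blocks up to `timeout` seconds; raises queue.Empty if the pool is exhausted.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool

    def close(self):
        # Close whatever is currently idle.
        while not self._idle.empty():
            self._idle.get_nowait().close()


# Example with an in-memory SQLite database standing in for the real one.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2, timeout=0.05)
with pool.connection() as conn:
    row = conn.execute("SELECT 1").fetchone()
```

Sizing the pool so that replicas multiplied by pool size stays below the database's connection limit avoids exhausting it during scale-out.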
Use connection pooling.
Object Storage Optimization
Optimize S3 performance.
Autoscaling
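The Horizontal Pod Autoscaler derives a replica count from the ratio of the observed metric to its target. The sketch below reproduces the core rule from the Kubernetes documentation, desired = ceil(current replicas * current metric / target metric), so you can sanity-check targets before applying them. The function name, replica bounds, and example numbers are illustrative. The 10% tolerance mirrors the Kubernetes default, and the real controller also applies stabilization windows and pod-readiness handling that are omitted here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling rule for a single metric."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 3 replicas at 90% average CPU against a 60% target.
scaled_up = desired_replicas(3, current_metric=90, target_metric=60)
```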
Use Horizontal Pod Autoscaling.
Caching Strategies
Use LLM prompt caching.
Monitoring and Observability
Application Logging
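Structured logs emit one JSON object per line so that a log aggregator can index fields instead of parsing free text. The following is a minimal sketch built on Python's standard logging module; the JsonFormatter class and the field names are assumptions chosen for illustration, not Agent Mesh's logging schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback as a string field.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    """Send all root-logger output to stdout as JSON lines."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

Writing to stdout leaves collection and shipping to the container platform, which is the usual arrangement in Kubernetes.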
Use structured JSON logging.
Metrics and Monitoring
Prometheus metrics to collect:
- Request rate and latency
- Error rates (4xx, 5xx)
- LLM API call latency and costs
- Database connection pool usage
- Message queue depth
- CPU and memory usage
- Disk I/O and storage usage
Alerting
Define alerts with a PrometheusRule.
Distributed Tracing
Integrate OpenTelemetry.
Operational Excellence
CI/CD Pipeline
Automate deployments with a CI/CD pipeline, for example GitLab CI.
Disaster Recovery
Define a backup strategy.
Configuration Management
Use GitOps with Argo CD.
Documentation
Maintain runbooks for common scenarios:
- Deployment procedures
- Rollback procedures
- Incident response playbooks
- Disaster recovery procedures
- Capacity planning guidelines
- Security incident response
Cost Optimization
Resource Right-Sizing
Monitor actual usage and adjust resource allocations accordingly.
Storage Optimization
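Lifecycle policies move older objects to cheaper storage and eventually delete them. The sketch below builds a rule in the shape S3 expects for a bucket lifecycle configuration. The helper name, rule ID, prefix, and day counts are assumptions for illustration; choose values that match your retention requirements.

```python
def build_lifecycle_configuration(prefix: str,
                                  transition_days: int = 30,
                                  expire_days: int = 365,
                                  rule_id: str = "tier-then-expire") -> dict:
    """Return an S3 lifecycle configuration with one transition-then-expire rule."""
    if expire_days <= transition_days:
        raise ValueError("expire_days must be greater than transition_days")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move objects to infrequent-access storage first...
                "Transitions": [
                    {"Days": transition_days, "StorageClass": "STANDARD_IA"}
                ],
                # ...then delete them once they are no longer needed.
                "Expiration": {"Days": expire_days},
            }
        ]
    }


# With boto3, this dictionary could be passed as LifecycleConfiguration to
# put_bucket_lifecycle_configuration; the "artifacts/" prefix is hypothetical.
config = build_lifecycle_configuration("artifacts/")
```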
Apply lifecycle policies.
LLM Cost Management
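LLM spend is driven mostly by token volume, so tracking tokens per agent and model is a practical starting point. The sketch below estimates cost from token counts. The ModelPrice and CostTracker names, the model name, and the per-million-token prices are placeholders, not real provider pricing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price in currency units per one million tokens (placeholder values)."""
    input_per_million: float
    output_per_million: float


class CostTracker:
    """Accumulate estimated LLM spend per agent."""

    def __init__(self, prices: dict):
        self._prices = prices
        self._spend = defaultdict(float)

    def record(self, agent: str, model: str, input_tokens: int, output_tokens: int) -> float:
        price = self._prices[model]
        cost = (input_tokens * price.input_per_million
                + output_tokens * price.output_per_million) / 1_000_000
        self._spend[agent] += cost
        return cost

    def spend_by_agent(self) -> dict:
        return dict(self._spend)


# Hypothetical model and prices, for illustration only.
tracker = CostTracker({"example-model": ModelPrice(3.0, 15.0)})
first_call = tracker.record("planner", "example-model", 1_000_000, 200_000)
```

Feeding these totals into the same metrics pipeline as latency and error rates makes it possible to alert on unexpected cost growth.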
Monitor and optimize LLM usage.
Compliance and Governance
Audit Logging
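An audit trail records who did what to which resource, when, and with what outcome, in a form that can be searched later. The following sketch produces one such record as a JSON line; the audit_event helper and its field names are assumptions, not a schema defined by Agent Mesh.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_event(actor: str, action: str, resource: str, outcome: str,
                at: Optional[datetime] = None) -> str:
    """Serialize one audit record as a JSON line with a UTC timestamp."""
    timestamp = (at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps(
        {
            "timestamp": timestamp.isoformat(),
            "actor": actor,        # user or service identity
            "action": action,      # what was attempted
            "resource": resource,  # what it was attempted on
            "outcome": outcome,    # e.g. "allowed" or "denied"
        },
        sort_keys=True,
    )


# Hypothetical identities and resource names.
line = audit_event("alice@example.com", "agent.deploy", "agents/orders", "allowed",
                   at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```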
Enable comprehensive audit trails.
Data Residency
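Data residency can be checked mechanically by comparing where each resource lives against an allow-list of regions. The sketch below shows that check in its simplest form; the region names, resource names, and the residency_violations helper are hypothetical, and in practice the inventory would come from your cloud provider's API.

```python
# Regions in which data is permitted to reside (hypothetical policy).
ALLOWED_REGIONS = frozenset({"eu-west-1", "eu-central-1"})


def residency_violations(resources: dict, allowed=ALLOWED_REGIONS) -> list:
    """Return the names of resources located outside the allowed regions."""
    return sorted(name for name, region in resources.items()
                  if region not in allowed)


# Example inventory mapping resource name to region.
violations = residency_violations({
    "artifacts-bucket": "eu-west-1",
    "logs-bucket": "us-east-1",
    "sessions-db": "eu-central-1",
})
```

Running a check like this in CI or on a schedule catches resources created in the wrong region before they accumulate data.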
Ensure data stays in required regions.
Access Control
Implement the principle of least privilege.
Checklist
Before going to production, verify:
Security Checklist
- Secrets stored in secure vault (not in code)
- TLS enabled for all connections
- Containers run as non-root
- Network policies configured
- Image vulnerability scanning enabled
- Authentication/authorization configured
- Database encryption at rest enabled
- S3 bucket encryption enabled
- Regular security audits scheduled
Reliability Checklist
- Minimum 3 replicas configured
- Health checks implemented
- Resource limits defined
- Durable queues configured
- Database HA enabled
- Backup and restore tested
- Disaster recovery plan documented
- Auto-scaling configured
Observability Checklist
- Structured logging enabled
- Centralized log aggregation
- Metrics collection configured
- Alerts defined and tested
- Distributed tracing enabled
- Dashboards created
- On-call rotation established
Operational Checklist
- CI/CD pipeline configured
- GitOps workflow established
- Runbooks documented
- Incident response procedures
- Regular backup testing
- Capacity planning done
- Cost monitoring enabled
- Compliance requirements met
Next Steps
- Health Checks: Configure comprehensive health monitoring
- Observability: Set up monitoring and tracing
- Logging: Configure application logging
- Configuration: Complete configuration reference