Common Issues
LoadBalancer Service Not Getting External IP
Symptom
A LoadBalancer service remains in the Pending state.
Diagnosis
- Check whether the CCM is running in the tenant cluster.
- Verify the LoadBalancerClass configuration (if enabled).
- Check whether a LoadBalancer resource was created in the management cluster.
- Verify the CCM's connection to the management cluster.
Solutions
- CCM not running: Check the CCM deployment and ensure the kubeconfig is correctly mounted.
- LoadBalancerClass mismatch: Add spec.loadBalancerClass: kubelb to the service, or set --use-loadbalancer-class=false.
- CCM disconnected: Verify that the kubelb-kubeconfig secret exists and has valid credentials.
- Permission issues: Ensure the CCM has RBAC permissions in the management cluster.
- Missing tenant namespace: Create the tenant namespace in the management cluster:
  kubectl create ns tenant-<cluster-name>
Ingress Not Reachable
Symptom
An Ingress resource was created, but traffic doesn’t reach the backend.
Diagnosis
- Check whether the Ingress was converted to a Route in the management cluster.
- Verify that the IngressClass is correct (if enabled).
- Check the Envoy Gateway resources.
- Verify that backend endpoints exist.
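The Ingress checks above might look like the following helpers. Treat this as a sketch under assumptions: routes.kubelb.k8c.io is inferred from the Route resource and the kubelb.k8c.io group mentioned on this page, envoy-gateway-system is the default namespace of an upstream Envoy Gateway installation and may differ in yours, and the function names are illustrative.

```shell
# Was the Ingress converted to a Route? (management cluster)
tenant_routes() {  # usage: tenant_routes <cluster-name>
  kubectl -n "tenant-$1" get routes.kubelb.k8c.io
}

# Which IngressClass does the Ingress request? Prints nothing if unset. (tenant cluster)
ingress_class() {  # usage: ingress_class <namespace> <ingress>
  kubectl -n "$1" get ingress "$2" -o jsonpath='{.spec.ingressClassName}'
}

# Is Envoy Gateway itself healthy? (management cluster; namespace is an assumption)
envoy_gateway_status() {
  kubectl -n "${EG_NAMESPACE:-envoy-gateway-system}" get deployments,pods
}

# Does the backend Service have endpoints? (tenant cluster)
backend_endpoints() {  # usage: backend_endpoints <namespace> <service>
  kubectl -n "$1" get endpoints "$2"
}
```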
Solutions
- IngressClass mismatch: Set spec.ingressClassName: kubelb in the Ingress, or use --use-ingress-class=false.
- Ingress controller disabled: Check the CCM flags and ensure --disable-ingress-controller=false.
- Missing backend service: Ensure the service exists and has endpoints in the tenant cluster.
- Node endpoints not synced: Check the KubeLBNodeReconciler logs:
  kubectl -n kube-system logs -l app=kubelb-ccm | grep node.reconciler
- Envoy Gateway not ready: Verify that the Envoy Gateway deployment is healthy in the management cluster.
Gateway API Resources Not Working
Symptom
A Gateway or HTTPRoute was created but is not functioning.
Diagnosis
- Verify that Gateway API is enabled.
- Check whether the Gateway API CRDs are installed.
- Verify that the GatewayClass is correct.
- Check the controller logs for errors.
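A possible way to run these Gateway API checks is sketched below. The CRD names and the channel annotation are the upstream Gateway API ones; reading flags from the first container's args is an assumption about how your Manager and CCM Deployments are configured, and the function names are illustrative.

```shell
# Does a Deployment's first container carry a given flag (for example
# --enable-gateway-api=true)? Exit status 0 means the flag was found.
has_flag() {  # usage: has_flag <namespace> <deployment> <flag>
  kubectl -n "$1" get deployment "$2" \
    -o jsonpath='{.spec.template.spec.containers[0].args}' | grep -qF -- "$3"
}

# Are the Gateway API CRDs installed? kubectl reports NotFound for a missing one.
gateway_api_crds() {
  kubectl get crd gatewayclasses.gateway.networking.k8s.io \
    gateways.gateway.networking.k8s.io \
    httproutes.gateway.networking.k8s.io
}

# Which release channel (standard or experimental) do the installed CRDs use?
gateway_api_channel() {
  kubectl get crd gateways.gateway.networking.k8s.io \
    -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/channel}'
}

# Which GatewayClass does a Gateway reference?
gateway_class_of() {  # usage: gateway_class_of <namespace> <gateway>
  kubectl -n "$1" get gateway "$2" -o jsonpath='{.spec.gatewayClassName}'
}
```

For example, `has_flag <namespace> <manager-deployment> --enable-gateway-api=true && echo enabled`. A flag that is absent from the args may still be set through another mechanism, so a negative result is a hint, not proof.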
Solutions
- Gateway API not enabled: Add --enable-gateway-api=true to both the Manager and the CCM.
- CRDs not installed: Install the Gateway API CRDs or use --install-gateway-api-crds=true.
- Wrong GatewayClass: Use gatewayClassName: kubelb or set --use-gateway-class=false.
- Gateway controller disabled: Ensure --disable-gateway-controller=false and --disable-httproute-controller=false.
- Wrong CRD channel: If using experimental features, set --gateway-api-crds-channel=experimental.
Envoy Proxy Not Starting
Symptom
Envoy proxy pods are crashing or not ready.
Diagnosis
- Check the Envoy pod logs.
- Verify that the xDS control plane is accessible.
- Check the Envoy configuration.
- Verify resource constraints.
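The following helpers sketch how a crashing Envoy pod could be inspected with standard kubectl commands. The namespace, pod, and service names are parameters because this page does not state where the Envoy pods or the Manager service live; the function names are illustrative.

```shell
# Logs from the previous (crashed) container instance.
envoy_previous_logs() {  # usage: envoy_previous_logs <namespace> <pod>
  kubectl -n "$1" logs "$2" --previous --tail=100
}

# Why did the last container instance terminate? Typical values are
# OOMKilled (memory limit too low) or Error.
envoy_last_exit_reason() {  # usage: envoy_last_exit_reason <namespace> <pod>
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Scheduling and image-pull problems show up in the pod's events.
envoy_events() {  # usage: envoy_events <namespace> <pod>
  kubectl -n "$1" get events --field-selector "involvedObject.name=$2"
}

# Does the Manager service that serves xDS have ready endpoints?
xds_endpoints() {  # usage: xds_endpoints <namespace> <manager-service>
  kubectl -n "$1" get endpoints "$2"
}
```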
Solutions
- xDS unreachable: Ensure the Manager service is accessible on port 8001 and check network policies.
- Resource limits too low: Increase spec.envoyProxy.resources in the Config CRD.
- Image pull error: Verify that spec.envoyProxy.image is correct and accessible.
- Node selector mismatch: Check that spec.envoyProxy.nodeSelector matches available nodes.
- Configuration error: Review the Config CRD for invalid settings and check the Manager logs for validation errors.
High Reconciliation Latency
Symptom
Changes to services or ingresses take a long time to propagate.
Diagnosis
- Check the controller queue depth.
- Monitor reconciliation metrics.
- Check API server latency.
- Verify resource utilization.
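Queue depth and reconciliation metrics can be pulled out of a raw Prometheus metrics dump with a small filter. The sketch below assumes the Manager and CCM expose the standard controller-runtime metric names (workqueue_depth, controller_runtime_reconcile_*); that is an inference from the --zap-log-level flag documented on this page, not something this page confirms, and the function name is illustrative.

```shell
# Read a Prometheus text-format dump on stdin and keep only the work queue
# and reconcile series (assumed controller-runtime metric names).
reconcile_metrics() {
  grep -E '^(workqueue_(depth|retries_total)|controller_runtime_reconcile_(total|errors_total|time_seconds))'
}
```

Pipe the output of your metrics endpoint into it, for example `curl -s <metrics-url> | reconcile_metrics`. A workqueue_depth that stays above zero, or a growing controller_runtime_reconcile_errors_total, points at the controller to investigate.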
Solutions
- Resource constraints: Increase CPU/memory requests for Manager or CCM pods
- High error rate: Fix underlying errors causing retries (check logs)
- API server throttling: Increase QPS/burst limits in kubeconfig
- Large number of resources: Consider optimizing reconciliation logic or increasing replicas
- Network latency: Ensure good connectivity between CCM and management cluster
Secret Synchronization Failing
Symptom
Secrets are not syncing from the tenant cluster to the management cluster.
Diagnosis
- Check whether the secret synchronizer is enabled.
- Verify that the secret has the correct label (if using auto-conversion).
- Check the SyncSecret resource.
- Review the controller logs.
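The label and SyncSecret checks could be run as shown in this sketch. The kubelb.k8c.io/managed-by label comes from the Solutions list in this section; the syncsecrets.kubelb.k8c.io resource name is an assumption derived from the SyncSecret kind, and the function names are illustrative.

```shell
# Show every label on the source secret.
secret_labels() {  # usage: secret_labels <namespace> <secret>
  kubectl -n "$1" get secret "$2" --show-labels
}

# Print the value of the managed-by label; "kubelb" is expected when
# auto-conversion is used, and nothing is printed if the label is missing.
secret_managed_by() {  # usage: secret_managed_by <namespace> <secret>
  kubectl -n "$1" get secret "$2" \
    -o jsonpath='{.metadata.labels.kubelb\.k8c\.io/managed-by}'
}

# List SyncSecret resources in a namespace (resource name is an assumption).
sync_secrets() {  # usage: sync_secrets <namespace>
  kubectl -n "$1" get syncsecrets.kubelb.k8c.io
}
```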
Solutions
- Synchronizer not enabled: Add --enable-secret-synchronizer=true to the CCM flags.
- Missing label: Add the label kubelb.k8c.io/managed-by: kubelb to the source secret.
- RBAC issues: Ensure the CCM has permission to create secrets in the management cluster.
- Source secret not found: Verify the secret reference in SyncSecret.spec.target.secret.name.
- Namespace mismatch: Ensure the tenant namespace exists in the management cluster.
Debugging Commands
Check Component Status
Inspect Resources
Increase Logging Verbosity
Add the --zap-log-level flag to increase logging detail:
Manager Deployment
CCM Deployment
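For either Deployment, one way to add the flag without editing manifests by hand is a JSON Patch, sketched below. The function name is illustrative, and the namespace and Deployment name are parameters because this page does not give them. The sketch assumes the first container already has an args list, and that the flag is the standard controller-runtime zap option, which accepts debug, info, error, or a positive integer.

```shell
# Append --zap-log-level=<level> to the first container's args.
# The JSON Patch "add" to ".../args/-" fails if the container has no args
# list, and it does not remove an existing --zap-log-level entry.
set_log_level() {  # usage: set_log_level <namespace> <deployment> <level>
  kubectl -n "$1" patch deployment "$2" --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/args/-\",\"value\":\"--zap-log-level=$3\"}]"
}
```

If the Deployment is managed by Helm or another tool, the patch is overwritten on the next upgrade, so set the flag in that tool's values for a lasting change.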
Enable Debug Mode
For the Manager, enable xDS debug logging.
Log Analysis
Key Log Messages
Manager Log Patterns
CCM Log Patterns
Envoy Log Patterns
Centralized Logging
For production deployments, use centralized logging:
Configure Log Aggregation
Use Fluentd, Fluent Bit, or Promtail to collect logs from all KubeLB components.
Add Structured Logging Labels
KubeLB logs include structured fields for filtering:
- component: manager, ccm, envoy
- controller: LoadBalancer, Route, Node, etc.
- namespace, tenant, name
Performance Issues
High Memory Usage
Common causes:
- Large number of LoadBalancer resources
- Memory leaks (check whether memory usage keeps increasing over time)
- Inefficient caching
Solutions:
- Increase memory limits
- Enable overload manager for Envoy
- Restart pods to clear caches
- Check for memory leaks in logs
High CPU Usage
Common causes:
- Frequent reconciliation loops
- High error rate causing retries
- Large number of resources to watch
Solutions:
- Check for reconciliation errors and fix the root cause
- Increase CPU limits
- Optimize controller code (report an issue if the problem persists)
Network Issues
CCM Cannot Connect to Management Cluster
Common causes:
- Incorrect kubeconfig
- Network policy blocking egress
- Firewall rules
- Certificate expired
Solutions:
- Validate the kubeconfig manually:
  kubectl --kubeconfig=<path> get ns
- Check network policies:
  kubectl get networkpolicies -A
- Verify DNS resolution:
  kubectl -n kube-system exec <ccm-pod> -- nslookup kubelb.example.com
- Check certificate expiration in the kubeconfig
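Certificate expiration can be read straight from the kubeconfig when it embeds a client certificate. The sketch below assumes the first user entry carries client-certificate-data (kubeconfigs that authenticate with a token have no such field) and that base64 and openssl are installed; the function name is illustrative.

```shell
# Print the notAfter date of the client certificate embedded in a kubeconfig.
kubeconfig_cert_expiry() {  # usage: kubeconfig_cert_expiry <path-to-kubeconfig>
  kubectl --kubeconfig="$1" config view --raw \
    -o jsonpath='{.users[0].user.client-certificate-data}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}
```

The output is a single notAfter=... line; a date in the past means the CCM can no longer authenticate with that kubeconfig.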
Envoy Cannot Reach Tenant Nodes
Common causes:
- Node IP addresses not routable from the management cluster
- NodePort service not accessible
- Network policy blocking ingress to nodes
Solutions:
- Verify that node addresses are correct
- Test NodePort accessibility from the management cluster
- Use the correct --node-address-type (ExternalIP, InternalIP, or Hostname)
- Check network policies in the tenant cluster
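Node addresses and NodePort reachability can be checked with the two helpers sketched here. They assume nc (netcat) with -z and -w support is available on the machine or pod you test from; the function names are illustrative.

```shell
# List node addresses as Kubernetes reports them (INTERNAL-IP and
# EXTERNAL-IP columns). Run against the tenant cluster.
node_addresses() {
  kubectl get nodes -o wide
}

# Probe a NodePort over TCP with a 3-second timeout. Run from the management
# cluster network; exit status 0 means the port accepted the connection.
nodeport_reachable() {  # usage: nodeport_reachable <node-ip> <node-port>
  nc -z -w 3 "$1" "$2"
}
```

If the probe succeeds only for one address type, set --node-address-type to that type.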
Getting Help
Check Logs
Gather logs from Manager, CCM, and Envoy components
Review Metrics
Check Prometheus metrics for error rates and latency
Inspect Resources
Verify LoadBalancer, Route, and Addresses resources
Test Connectivity
Validate network connectivity between components
Report Issues
When reporting issues, include:
- KubeLB version: Check Manager and CCM deployment images
- Component logs: Last 100-200 lines from relevant pods
- Resource manifests: LoadBalancer, Route, Config, and related resources
- Metrics: Relevant Prometheus metrics showing the issue
- Environment details: Kubernetes version, cluster topology, network setup
For official support, refer to the KubeLB documentation or open an issue in the GitHub repository.
Next Steps
Monitoring
Set up metrics and alerts to prevent issues
Configuration
Review and optimize your configuration
