This guide covers common issues you may encounter while operating your Kubernetes cluster and how to resolve them.
ArgoCD Issues
Cannot Access ArgoCD UI
If you cannot access the ArgoCD web interface:
# Verify ArgoCD pods are running
kubectl get pods -n argocd
# Port-forward to access the UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
Access the UI at https://localhost:8080.
Retrieve ArgoCD Admin Password
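If you forgot the admin password and have not changed it since installation, read the initial password from its secret (this does not work once the password has been changed or the secret has been deleted):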
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d ; echo
Application Not Syncing
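Check application status:
kubectl get application -n argocd
kubectl describe application <app-name> -n argocd
View sync logs in the ArgoCD UI or check controller logs:
# Select the controller pods by label; recent ArgoCD releases run the controller as a StatefulSet, so deployment/argocd-application-controller may not exist
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller
Common causes: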
- Repository not accessible: Verify SSH keys or access tokens
- Invalid manifests: Check for YAML syntax errors
- Resource conflicts: Existing resources may be blocking creation
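To spot problem applications quickly, the status check above can be narrowed to applications that are not both Synced and Healthy. The following is a minimal sketch rather than part of the original setup: `filter_unhealthy_apps` is a hypothetical helper name, and it assumes the `.status.sync.status` and `.status.health.status` fields of the ArgoCD Application resource.

```shell
# Print only applications that are not both Synced and Healthy.
# Input: tab-separated lines of "name<TAB>sync-status<TAB>health-status".
filter_unhealthy_apps() {
  awk -F '\t' '$2 != "Synced" || $3 != "Healthy" { printf "%s\t%s\t%s\n", $1, $2, $3 }'
}

# Usage against a live cluster (needs kubectl access to the argocd namespace):
#   kubectl get applications -n argocd \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' \
#     | filter_unhealthy_apps
```

An empty result means every application reports Synced and Healthy.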
Force Sync Application
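Refresh the cached manifests from Git, then sync. The --force flag makes ArgoCD replace resources it cannot patch, which can delete and recreate them, so use it with care:
argocd app get <app-name> --hard-refresh
argocd app sync <app-name> --force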
Or use the ArgoCD UI: Application → Sync → Synchronize
Pod Issues
Check Pod Status
List all pods and their status:
kubectl get pods --all-namespaces
For a specific namespace:
kubectl get pods -n <namespace>
Pod Stuck in Pending
Check why a pod is not scheduled:
kubectl describe pod <pod-name> -n <namespace>
Common causes:
- Insufficient resources: Not enough CPU/memory on nodes
- PersistentVolume issues: PVC not bound
- Node selector mismatch: Pod requires specific node labels
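The describe output can be long, so it may help to pull out only the scheduler's own explanation. This is a sketch under the assumption that the scheduler has recorded FailedScheduling events for the pod and that they have not yet expired; `scheduling_failures` is a hypothetical helper name.

```shell
# Print the scheduler's FailedScheduling messages for one pod.
# Usage: scheduling_failures <pod-name> <namespace>
scheduling_failures() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1,reason=FailedScheduling" \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
}
```

Each message states how many nodes were considered and why they were rejected, which usually maps onto one of the causes listed above.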
Pod CrashLoopBackOff
View pod logs to identify the crash reason:
# Current logs
kubectl logs <pod-name> -n <namespace>
# Previous container logs (after crash)
kubectl logs <pod-name> -n <namespace> --previous
Common causes:
- Application error: Check application logs
- Missing config/secrets: Verify ConfigMaps and Secrets exist
- Failed health checks: Liveness probe fails before the application has finished starting
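The exit code of the last crashed container often narrows down which of these causes applies. Below is a rough sketch that maps common exit codes to likely causes; `explain_exit_code` is a hypothetical helper, and the mapping is a heuristic based on the usual 128+signal convention, not a definitive diagnosis.

```shell
# Map a container exit code to a likely cause (heuristic only).
# Usage: explain_exit_code <exit-code>
explain_exit_code() {
  case "$1" in
    0)   echo "exited normally; the main process finished instead of staying in the foreground" ;;
    137) echo "killed by SIGKILL (128+9); often the OOM killer, check memory limits" ;;
    139) echo "segmentation fault, SIGSEGV (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15); often a failed liveness probe or an eviction" ;;
    *)   echo "application error; check the previous container logs" ;;
  esac
}

# Usage against a live cluster: print each container's last exit code, then look it up.
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
#   explain_exit_code 137
```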
View Logs from Multiple Containers
For pods with multiple containers:
# List containers in pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'
# View specific container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>
Stream Live Logs
Follow logs in real-time:
kubectl logs -f <pod-name> -n <namespace>
Network Issues
Service Not Accessible
Verify service exists and has endpoints:
kubectl get service <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>
If endpoints are empty, pods may not match the service selector:
kubectl describe service <service-name> -n <namespace>
kubectl get pods -n <namespace> --show-labels
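Comparing the selector and the pod labels by eye is error-prone, so one option is to let kubectl apply the service's selector directly. The sketch below assumes a plain label selector on the Service; `pods_for_service` is a hypothetical helper name.

```shell
# List the pods that a service's selector actually matches.
# Usage: pods_for_service <service-name> <namespace>
pods_for_service() {
  local svc=$1 ns=$2 selector
  # Render the selector map as "key=value,key=value," using a Go template.
  selector=$(kubectl get service "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  selector=${selector%,}   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  kubectl get pods -n "$ns" -l "$selector" -o wide
}
```

If this prints no pods while the application pods are running, the labels and the selector disagree, which explains the empty endpoints.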
Test Service Connectivity
Create a temporary pod to test network connectivity:
kubectl run test-pod --rm -i --tty --image=nicolaka/netshoot -- /bin/bash
From inside the pod:
# Test DNS resolution
nslookup <service-name>.<namespace>.svc.cluster.local
# Test HTTP endpoint
curl http://<service-name>.<namespace>.svc.cluster.local:<port>
# Check TCP connectivity (ClusterIP addresses usually do not answer ICMP ping)
nc -zv <service-name>.<namespace>.svc.cluster.local <port>
Ingress Not Working
Check Ingress status:
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>
Verify Ingress controller is running:
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
Common issues:
- No address assigned: Ingress controller not running or LoadBalancer pending
- 404 errors: No rule matches the request; check the host, path, and backend service name and port in the Ingress
- 502/503 errors: Backend service is down or not ready
Check Network Policies
If you have NetworkPolicies, verify they allow the traffic:
kubectl get networkpolicies -n <namespace>
kubectl describe networkpolicy <policy-name> -n <namespace>
Certificate Issues
Certificate Not Issued
Check cert-manager resources:
# Check certificate status
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>
# Check certificate request
kubectl get certificaterequest -n <namespace>
kubectl describe certificaterequest <request-name> -n <namespace>
ACME Challenge Failing
Check challenge status:
kubectl get challenges --all-namespaces
kubectl describe challenge <challenge-name> -n <namespace>
Common issues:
- DNS not resolving: Verify the domain points to the external IP of your Ingress controller or load balancer
- HTTP-01 challenge blocked: Ensure port 80 is accessible
- Ingress misconfiguration: Check that Ingress routes to correct service
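The DNS cause above can be checked from any machine with `dig` installed. This is a minimal sketch, assuming the Ingress is exposed on a single IPv4 address; `dns_points_to` is a hypothetical helper, and the address in the usage line is a documentation placeholder.

```shell
# Check that a domain's A record includes the expected ingress IP.
# Usage: dns_points_to <domain> <expected-ip>
dns_points_to() {
  local domain=$1 expected_ip=$2
  if dig +short "$domain" A | grep -Fxq "$expected_ip"; then
    echo "OK: $domain resolves to $expected_ip"
  else
    echo "MISMATCH: $domain does not resolve to $expected_ip"
    return 1
  fi
}

# Usage (203.0.113.10 is a placeholder; read the real address from the Ingress):
#   kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
#   dns_points_to <domain> 203.0.113.10
```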
View cert-manager Logs
kubectl logs -n cert-manager deployment/cert-manager
Expired Certificates
List certificates and their expiration:
kubectl get certificate --all-namespaces -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[0].status,EXPIRES:.status.notAfter,SECRET:.spec.secretName'
View certificate details:
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates
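Reading dates by eye works for one certificate; for a scripted check, `openssl x509 -checkend` can report whether a certificate expires within a given window. The sketch below wraps it; `cert_expires_within` is a hypothetical helper name, and the 30-day window in the usage line is an arbitrary example.

```shell
# Succeed (exit 0) if the PEM certificate on stdin expires within <days> days.
# Note: unreadable input is also treated as "expiring", so check the secret exists first.
# Usage: cert_expires_within <days> < cert.pem
cert_expires_within() {
  local days=$1
  # -checkend exits 0 when the certificate is still valid after the given number of seconds.
  if openssl x509 -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Usage against a live cluster:
#   kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' \
#     | base64 --decode | cert_expires_within 30 && echo "renew soon"
```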
Storage Issues
PVC Not Bound
Check PersistentVolumeClaim status:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Check available PersistentVolumes:
kubectl get pv
Common causes:
- No matching PV: StorageClass may not be configured
- Insufficient capacity: PV size is smaller than requested
- Access mode mismatch: PVC and PV access modes don’t match
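In a namespace with many claims, it can be quicker to list only the ones that are not bound, along with the StorageClass they request. The following is a sketch with a hypothetical helper name, `filter_unbound_pvcs`; it only reformats kubectl output and makes no changes to the cluster.

```shell
# Print only PVCs whose phase is not Bound.
# Input: tab-separated lines of "name<TAB>phase<TAB>storage-class".
filter_unbound_pvcs() {
  awk -F '\t' '$2 != "Bound" {
    printf "%s\t%s\t%s\n", $1, $2, ($3 == "" ? "<no StorageClass>" : $3)
  }'
}

# Usage against a live cluster:
#   kubectl get pvc -n <namespace> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.spec.storageClassName}{"\n"}{end}' \
#     | filter_unbound_pvcs
```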
Check StorageClass
kubectl get storageclass
kubectl describe storageclass <storageclass-name>
Resource Issues
Check Node Resources
View node resource usage:
kubectl top nodes
kubectl describe node <node-name>
Check Pod Resource Usage
kubectl top pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Limits\|Requests"
Node Not Ready
Investigate node issues:
kubectl get nodes
kubectl describe node <node-name>
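The node's conditions usually say why it is not ready. As a sketch, the helper below flags any condition in an unhealthy state, assuming the usual convention that Ready should be True and every other condition (such as MemoryPressure or DiskPressure) should be False; `flag_node_conditions` is a hypothetical name.

```shell
# Flag node conditions that are in an unhealthy state.
# Input: one "ConditionType=Status" pair per line.
flag_node_conditions() {
  awk -F '=' '
    NF == 2 && $1 == "Ready" && $2 != "True"  { print "PROBLEM: node is not Ready (status " $2 ")" }
    NF == 2 && $1 != "Ready" && $2 != "False" { print "PROBLEM: " $1 " is " $2 }
  '
}

# Usage against a live cluster:
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | flag_node_conditions
```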
Check kubelet logs (SSH into the node; this assumes the kubelet runs under systemd):
sudo journalctl -u kubelet -n 100 --no-pager
Secrets and ConfigMaps
Secret Not Found
Verify secret exists:
kubectl get secret <secret-name> -n <namespace>
kubectl describe secret <secret-name> -n <namespace>
Sealed Secret Not Decrypting
Check sealed-secrets controller:
kubectl get pods -n kube-system | grep sealed-secrets
kubectl logs -n kube-system -l app.kubernetes.io/name=sealed-secrets
Verify SealedSecret resource:
kubectl get sealedsecrets -n <namespace>
kubectl describe sealedsecret <sealedsecret-name> -n <namespace>
General Debugging Commands
Execute Commands in Pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
Copy Files To/From Pod
# Copy from pod
kubectl cp <namespace>/<pod-name>:/path/to/file ./local-file
# Copy to pod
kubectl cp ./local-file <namespace>/<pod-name>:/path/to/file
Get Events
View recent cluster events:
kubectl get events --sort-by='.lastTimestamp' -n <namespace>
Check API Server
Test cluster API connectivity:
kubectl cluster-info
kubectl version
Verify RBAC Permissions
Check if you can perform an action:
kubectl auth can-i create pods -n <namespace>
kubectl auth can-i "*" "*" --all-namespaces
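Permission errors in workloads usually concern a service account and not your own user. One way to test that case is kubectl's `--as` impersonation flag, sketched below; `sa_can_i` is a hypothetical helper name, and the call only works if your own user is allowed to impersonate service accounts.

```shell
# Check whether a service account may perform an action in its namespace.
# Usage: sa_can_i <namespace> <service-account> <verb> <resource>
sa_can_i() {
  kubectl auth can-i "$3" "$4" -n "$1" --as="system:serviceaccount:$1:$2"
}

# Example: sa_can_i <namespace> <service-account> list secrets
```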
Monitoring and Metrics
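If you have Prometheus and Grafana set up (the service names below assume a kube-prometheus-stack Helm release named prometheus in the monitoring namespace; adjust them to your install):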
# Port-forward to Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Port-forward to Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
Access dashboards to view metrics and identify issues.
Getting Help
If you’re still stuck:
- Check application logs thoroughly
- Review recent changes in Git history
- Verify all configuration in ArgoCD
- Check Kubernetes events for error messages
- Consult official documentation for specific components
When asking for help, include: pod/service names, namespace, error messages from logs, and output of relevant kubectl describe commands.