This guide covers common issues you may encounter while operating your Kubernetes cluster and how to resolve them.

ArgoCD Issues

Cannot Access ArgoCD UI

If you cannot access the ArgoCD web interface:
# Verify ArgoCD pods are running
kubectl get pods -n argocd

# Port-forward to access the UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
Access the UI at https://localhost:8080 (your browser may warn about ArgoCD's self-signed certificate).

Retrieve ArgoCD Admin Password

If you forgot the admin password:
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d ; echo

Application Not Syncing

  1. Check application status:
    kubectl get application -n argocd
    kubectl describe application <app-name> -n argocd
    
  2. View sync logs in the ArgoCD UI or check controller logs:
    kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller
    
  3. Common causes:
    • Repository not accessible: Verify SSH keys or access tokens
    • Invalid manifests: Check for YAML syntax errors
    • Resource conflicts: Existing resources may be blocking creation
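
To rule out invalid manifests, you can render and validate them locally before ArgoCD tries to apply them. A minimal sketch, assuming your manifests live in a local checkout of the app's repository (paths are placeholders):

```shell
# Validate manifests against the API server without persisting anything
kubectl apply --dry-run=server -f path/to/manifests/

# For Helm-based apps, render the chart locally and inspect the output
helm template path/to/chart/ | less
```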

Force Sync Application

Force a sync from the CLI (the --force flag replaces out-of-sync resources instead of patching them):
argocd app sync <app-name> --force
Or use the ArgoCD UI: Application → Sync → Synchronize

Pod Issues

Check Pod Status

List all pods and their status:
kubectl get pods --all-namespaces
For a specific namespace:
kubectl get pods -n <namespace>

Pod Stuck in Pending

Check why a pod is not scheduled:
kubectl describe pod <pod-name> -n <namespace>
Common causes:
  • Insufficient resources: Not enough CPU/memory on nodes
  • PersistentVolume issues: PVC not bound
  • Node selector mismatch: Pod requires specific node labels
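
Scheduling failures are also recorded as events, and filtering on the pod name usually surfaces the exact reason. A sketch (pod and namespace names are placeholders):

```shell
# Show events for the pod, including FailedScheduling messages
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

# Compare requested resources against what each node can still allocate
kubectl describe nodes | grep -A 8 "Allocated resources"
```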

Pod CrashLoopBackOff

View pod logs to identify the crash reason:
# Current logs
kubectl logs <pod-name> -n <namespace>

# Previous container logs (after crash)
kubectl logs <pod-name> -n <namespace> --previous
Common causes:
  • Application error: Check application logs
  • Missing config/secrets: Verify ConfigMaps and Secrets exist
  • Failed health checks: Liveness probe failing too quickly
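
The kubelet records why the last container run ended; the exit code and reason (for example OOMKilled) often identify the cause faster than the logs. A sketch with placeholder names:

```shell
# Show the termination state of the first container's previous run
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
```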

View Logs from Multiple Containers

For pods with multiple containers:
# List containers in pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'

# View specific container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>

Stream Live Logs

Follow logs in real-time:
kubectl logs -f <pod-name> -n <namespace>

Network Issues

Service Not Accessible

Verify service exists and has endpoints:
kubectl get service <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>
If endpoints are empty, pods may not match the service selector:
kubectl describe service <service-name> -n <namespace>
kubectl get pods -n <namespace> --show-labels
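
To compare the two directly, print the service's selector and then list only the pods that match it; if the second command returns nothing, the selector is the problem. A sketch with placeholder names:

```shell
# Print the label selector the service uses
kubectl get service <service-name> -n <namespace> -o jsonpath='{.spec.selector}'

# List pods matching that selector (substitute the key=value pairs printed above)
kubectl get pods -n <namespace> -l <key>=<value>
```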

Test Service Connectivity

Create a temporary pod to test network connectivity:
kubectl run test-pod --rm -i --tty --image=nicolaka/netshoot -- /bin/bash
From inside the pod:
# Test DNS resolution
nslookup <service-name>.<namespace>.svc.cluster.local

# Test HTTP endpoint
curl http://<service-name>.<namespace>.svc.cluster.local:<port>

# Check TCP connectivity (ClusterIP addresses usually don't answer ICMP ping)
nc -zv <service-name>.<namespace>.svc.cluster.local <port>

Ingress Not Working

Check Ingress status:
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>
Verify Ingress controller is running:
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
Common issues:
  • No address assigned: Ingress controller not running or LoadBalancer pending
  • 404 errors: Check that service and backend are configured correctly
  • 502/503 errors: Backend service is down or not ready
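
When no address is assigned, the controller's Service is usually the place to look, and once an address exists you can test routing directly with a Host header. A sketch assuming the ingress-nginx defaults used above (domain and IP are placeholders):

```shell
# Check whether the controller's LoadBalancer has an external address
kubectl get svc -n ingress-nginx ingress-nginx-controller

# Test routing for a specific host without waiting for DNS
curl -H "Host: <your-domain>" http://<external-ip>/
```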

Check Network Policies

If you have NetworkPolicies, verify they allow the traffic:
kubectl get networkpolicies -n <namespace>
kubectl describe networkpolicy <policy-name> -n <namespace>

Certificate Issues

Certificate Not Issued

Check cert-manager resources:
# Check certificate status
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>

# Check certificate request
kubectl get certificaterequest -n <namespace>
kubectl describe certificaterequest <request-name> -n <namespace>

ACME Challenge Failing

Check challenge status:
kubectl get challenges --all-namespaces
kubectl describe challenge <challenge-name> -n <namespace>
Common issues:
  • DNS not resolving: Verify domain points to cluster IP
  • HTTP-01 challenge blocked: Ensure port 80 is accessible
  • Ingress misconfiguration: Check that Ingress routes to correct service
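
For HTTP-01 you can verify both prerequisites from outside the cluster: DNS must resolve to the cluster, and port 80 must reach the solver. A sketch with a placeholder domain:

```shell
# Confirm the domain resolves to your cluster's public IP
dig +short <your-domain>

# Confirm port 80 is reachable (an HTTP response, even a 404, means traffic arrives)
curl -v http://<your-domain>/.well-known/acme-challenge/test
```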

View cert-manager Logs

kubectl logs -n cert-manager deployment/cert-manager

Expired Certificates

List certificates and their expiration:
kubectl get certificate --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[0].status,EXPIRES:.status.notAfter,SECRET:.spec.secretName
View certificate details:
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates

Storage Issues

PVC Not Bound

Check PersistentVolumeClaim status:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Check available PersistentVolumes:
kubectl get pv
Common causes:
  • No matching PV: StorageClass may not be configured
  • Insufficient capacity: PV size is smaller than requested
  • Access mode mismatch: PVC and PV access modes don’t match
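
To check for a mismatch directly, print what the claim asks for and compare it against the available volumes. A sketch with placeholder names:

```shell
# What the PVC requests
kubectl get pvc <pvc-name> -n <namespace> \
  -o jsonpath='{.spec.accessModes} {.spec.resources.requests.storage} {.spec.storageClassName}'

# What each PV offers
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,ACCESS:.spec.accessModes,CLASS:.spec.storageClassName,STATUS:.status.phase
```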

Check StorageClass

kubectl get storageclass
kubectl describe storageclass <storageclass-name>

Resource Issues

Check Node Resources

View node resource usage:
kubectl top nodes
kubectl describe node <node-name>

Check Pod Resource Usage

kubectl top pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Limits\|Requests"

Node Not Ready

Investigate node issues:
kubectl get nodes
kubectl describe node <node-name>
Check kubelet logs (SSH into node):
journalctl -u kubelet -f
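
The node's conditions usually explain a NotReady state (disk pressure, memory pressure, network unavailable) before you need to SSH in. A sketch with a placeholder node name:

```shell
# Print each condition's type, status, and message
kubectl get node <node-name> \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
```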

Secrets and ConfigMaps

Secret Not Found

Verify secret exists:
kubectl get secret <secret-name> -n <namespace>
kubectl describe secret <secret-name> -n <namespace>

Sealed Secret Not Decrypting

Check sealed-secrets controller:
kubectl get pods -n kube-system | grep sealed-secrets
kubectl logs -n kube-system -l app.kubernetes.io/name=sealed-secrets
Verify SealedSecret resource:
kubectl get sealedsecrets -n <namespace>
kubectl describe sealedsecret <sealedsecret-name> -n <namespace>

General Debugging Commands

Execute Commands in Pod

kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

Copy Files To/From Pod

# Copy from pod
kubectl cp <namespace>/<pod-name>:/path/to/file ./local-file

# Copy to pod
kubectl cp ./local-file <namespace>/<pod-name>:/path/to/file

Get Events

View recent cluster events:
kubectl get events --sort-by='.lastTimestamp' -n <namespace>
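
To cut the noise, you can restrict events to warnings across all namespaces:

```shell
kubectl get events --all-namespaces --field-selector type=Warning --sort-by='.lastTimestamp'
```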

Check API Server

Test cluster API connectivity:
kubectl cluster-info
kubectl version

Verify RBAC Permissions

Check if you can perform an action:
kubectl auth can-i create pods -n <namespace>
kubectl auth can-i "*" "*" --all-namespaces
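
You can also check permissions on behalf of another identity with impersonation, which is useful when a workload's ServiceAccount seems to lack access. A sketch with placeholder names:

```shell
# Check what a specific ServiceAccount is allowed to do
kubectl auth can-i list secrets -n <namespace> \
  --as=system:serviceaccount:<namespace>:<service-account-name>
```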

Monitoring and Metrics

If you have Prometheus and Grafana set up:
# Port-forward to Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

# Port-forward to Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
Access dashboards to view metrics and identify issues.

Getting Help

If you’re still stuck:
  1. Check application logs thoroughly
  2. Review recent changes in Git history
  3. Verify all configuration in ArgoCD
  4. Check Kubernetes events for error messages
  5. Consult official documentation for specific components
When asking for help, include: pod/service names, namespace, error messages from logs, and output of relevant kubectl describe commands.
