Pod Issues
Pod stuck in Pending state
- Pod shows Pending status indefinitely
- Pods not being scheduled
- Insufficient Resources
- Node Selector/Affinity Mismatch
- PVC Not Bound
- CPU/Memory pressure
- Allocatable vs requested resources
- Scale up cluster (add nodes)
- Reduce resource requests
- Enable cluster autoscaler
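The "allocatable vs requested" comparison can be sketched in a few lines. This is a minimal illustration, not the scheduler's actual algorithm (which also weighs taints, affinity, and other resources). The helper names are hypothetical, and the parsers assume quantities written in the common forms such as "500m", "2", "512Mi", or "8Gi".

```python
# Hypothetical helpers; not part of kubectl or any Kubernetes client library.

_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> int:
    """Return CPU in millicores: '500m' -> 500, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Return memory in bytes for plain integers and common suffixes."""
    # Try two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _MEMORY_UNITS[suffix])
    return int(quantity)


def fits_on_node(pod_request: dict, node_allocatable: dict, already_requested: dict) -> bool:
    """True if the Pod's requests fit into what the node has left."""
    free_cpu = parse_cpu(node_allocatable["cpu"]) - parse_cpu(already_requested["cpu"])
    free_mem = parse_memory(node_allocatable["memory"]) - parse_memory(already_requested["memory"])
    return (parse_cpu(pod_request["cpu"]) <= free_cpu
            and parse_memory(pod_request["memory"]) <= free_mem)
```

For example, a node with 4 CPUs allocatable and 3500m already requested has room for a 500m Pod but not for one requesting a full CPU, even if actual usage on the node is low.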
Pod stuck in CrashLoopBackOff
Pod stuck in ImagePullBackOff
- Image doesn’t exist
  - Typo in image name/tag
  - Image was deleted from registry
- Authentication required
- Network issues
  - Registry is unreachable from cluster
  - Firewall blocking registry access
- Rate limiting
  - Docker Hub rate limits (100 pulls per 6 hours for anonymous users)
  - Solution: Authenticate or use an alternative registry
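When hunting for a typo in an image name or tag, it can help to spell out what a short reference expands to. The sketch below follows Docker's naming conventions (default registry `docker.io`, implicit `library/` prefix, implicit `latest` tag). `parse_image` is a hypothetical helper, and some container runtimes are configured with different defaults.

```python
def parse_image(ref: str) -> dict:
    """Split an image reference into registry, repository, tag and digest."""
    name, _, digest = ref.partition("@")

    # A ':' counts as a tag separator only if it comes after the last '/';
    # otherwise it is a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    else:
        tag = ""

    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name
        if "/" not in repository:
            repository = "library/" + repository

    if not tag and not digest:
        tag = "latest"
    return {"registry": registry, "repository": repository, "tag": tag, "digest": digest}
```

Under these conventions `nginx` expands to `docker.io/library/nginx:latest`, so a pull failure for a private image is sometimes just a missing registry prefix.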
Pod stuck in Terminating state
Pod stays in Terminating status. Debugging:
- Finalizers preventing deletion
- PreStop hook hanging
  - PreStop hook takes too long
  - Increase terminationGracePeriodSeconds
- Force delete (last resort)
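The checks above can be sketched against a Pod object, for instance one parsed from `kubectl get pod <name> -o json`. `explain_terminating` is a hypothetical helper. It reads only `metadata.deletionTimestamp`, `metadata.finalizers`, and `spec.terminationGracePeriodSeconds`, and it cannot tell whether a preStop hook is actually hanging.

```python
def explain_terminating(pod: dict) -> list:
    """List likely reasons a Pod object is still present while Terminating."""
    meta = pod.get("metadata", {})
    if "deletionTimestamp" not in meta:
        return ["Pod is not being deleted"]

    # Each remaining finalizer must be removed by its controller first.
    reasons = [f"waiting on finalizer {name}" for name in meta.get("finalizers", [])]

    # 30 seconds is the Kubernetes default when the field is unset.
    grace = pod.get("spec", {}).get("terminationGracePeriodSeconds", 30)
    reasons.append(f"containers get up to {grace}s to stop (includes preStop hooks)")
    return reasons
```

If a finalizer is listed, force deletion alone will not help; the controller that owns the finalizer has to finish or the finalizer has to be removed deliberately.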
OOMKilled - Out of Memory
Pod status shows OOMKilled. Check the container's last termination reason and its memory limit, then consider:
- Increase memory limits
- Fix memory leak in application
  - Profile application memory usage
  - Check for unbounded caches
  - Review database connection pooling
- Enable Vertical Pod Autoscaler
  - VPA automatically adjusts resource requests/limits
- Monitor memory usage
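To confirm that a restart was really caused by the OOM killer, the container statuses can be inspected. The sketch below assumes a Pod object parsed from `kubectl get pod <name> -o json`; `oom_killed_containers` is a hypothetical helper that looks at `state` and `lastState` of each container.

```python
def oom_killed_containers(pod: dict) -> list:
    """Return (container name, restart count) for containers that were OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A crash-looping container reports the kill under lastState,
        # while a container that has not restarted yet reports it under state.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append((status["name"], status.get("restartCount", 0)))
                break
    return hits
```

A high restart count next to an OOMKilled reason usually points at a limit that is too low or a leak, which is where the list above comes in.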
Configuration Issues
ConfigMap/Secret changes not reflecting in Pods
- Environment Variables
- Volume Mounts
- Immutable ConfigMaps
Service not routing traffic to Pods
- Check Service endpoints
  If there are no endpoints, the Service selector doesn’t match any Pod labels.
- Verify Pod labels
  Labels must match exactly.
- Check Pod readiness
  Pods must be Running and READY (for example, 1/1). If not ready, check the readiness probe.
- Test Service connectivity
- Check network policies
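The selector check in the first two steps comes down to an exact key and value comparison. A minimal sketch, with hypothetical helper names and Pods reduced to plain dictionaries for illustration:

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every selector key is present on the Pod with the same value."""
    # A Service without a selector gets no automatically managed endpoints.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def matching_pods(selector: dict, pods: list) -> list:
    """Names of Pods (simplified to {'name': ..., 'labels': ...}) the selector picks."""
    return [p["name"] for p in pods if selector_matches(selector, p.get("labels", {}))]
```

Extra labels on the Pod are fine, but a single differing value (for example `app: web` versus `app: Web`) leaves the Service with no endpoints.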
Networking Issues
DNS resolution not working
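- Pods can’t resolve service names
- nslookup fails inside Pod
- Test DNS from Pod
- Check CoreDNS/kube-dns Pods
- Verify DNS service
- Check Pod DNS config
  /etc/resolv.conf should contain a nameserver entry for the cluster DNS Service and a search list that includes svc.cluster.local (with the default cluster domain).
- Common issues:
  - CoreDNS not running
  - Network policy blocking DNS traffic (port 53)
  - Wrong CNI configuration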
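The Pod DNS config check can be automated on the contents of `/etc/resolv.conf` copied out of a Pod. The sketch assumes the default `cluster.local` cluster domain and the default `ClusterFirst` DNS policy; `check_resolv_conf` is a hypothetical helper, and the nameserver IP in the usage note is only an example value.

```python
def check_resolv_conf(text: str, cluster_domain: str = "cluster.local") -> list:
    """Return a list of problems found in a Pod's resolv.conf contents."""
    nameservers, search = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "nameserver":
            nameservers += parts[1:2]
        elif parts[0] == "search":
            search = parts[1:]

    problems = []
    if not nameservers:
        problems.append("no nameserver entry")
    if f"svc.{cluster_domain}" not in search:
        problems.append(f"search list lacks svc.{cluster_domain}")
    return problems
```

A healthy file such as `search default.svc.cluster.local svc.cluster.local cluster.local` followed by `nameserver 10.96.0.10` yields an empty list, while a file that only lists an external resolver is flagged for its missing search domains.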
Ingress not routing traffic
- Ingress Controller installed?
- Ingress resource created?
- Check Ingress address
- Verify Service and endpoints exist
- Check Ingress Controller logs
- Test without Ingress
  If this works, the issue is with the Ingress configuration.
- Common issues:
  - DNS not pointing to Ingress load balancer
  - Incorrect host in Ingress rules
  - TLS certificate issues
  - Path not matching (use pathType: Prefix)
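Path mismatches are easier to reason about once the `Prefix` rule is written down: paths are compared element by element after splitting on `/`, not as raw strings. The function below is a simplified sketch of that rule with a hypothetical name; individual Ingress controllers may differ in edge cases such as repeated slashes.

```python
def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Approximate pathType: Prefix matching, one path element at a time."""
    rule = [part for part in rule_path.split("/") if part]
    request = [part for part in request_path.split("/") if part]
    # The request must start with all of the rule's elements, in order.
    return request[: len(rule)] == rule
```

So a rule for `/api` covers `/api/v1` but not `/apiv1`, and a rule for `/` covers every request path.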
Cross-namespace communication failing
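One possible cause, assuming default cluster DNS settings, is addressing a Service in another namespace by its short name. The sketch below lists the names a resolver would try for a given lookup, based on the usual Pod search list and `ndots:5`. `candidate_names` is a hypothetical helper, and real Pods may carry extra search domains inherited from the node.

```python
def candidate_names(name: str, pod_namespace: str,
                    cluster_domain: str = "cluster.local", ndots: int = 5) -> list:
    """Names tried, in order, when a Pod in pod_namespace looks up name."""
    if name.endswith("."):
        return [name]  # already fully qualified, search list is skipped

    search = [f"{pod_namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{domain}" for domain in search]

    # Names with fewer dots than ndots go through the search list first.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]
```

From a Pod in namespace `web`, looking up `db` never produces `db.data.svc.cluster.local`, whereas `db.data` does. Using `<service>.<namespace>` or the full `<service>.<namespace>.svc.cluster.local` name avoids the problem.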
Storage Issues
PVC stuck in Pending
- No matching PersistentVolume
  Check:
  - StorageClass matches
  - Access modes compatible
  - Sufficient capacity
- StorageClass not found
- Dynamic provisioning not configured
  - Check if CSI driver installed
  - Verify cloud provider credentials
- Node affinity mismatch (local volumes)
  - PV has node affinity that doesn’t match any schedulable node
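The three checks for a matching PersistentVolume can be sketched as a comparison of a claim against a volume. The dictionaries here are flattened, hypothetical shapes for illustration (not the API object layout), sizes are assumed to use binary suffixes such as `Gi`, and real binding also considers selectors, volume mode, and default StorageClass handling.

```python
_BINARY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert '10Gi'-style sizes (or plain byte counts) to bytes."""
    for suffix, factor in _BINARY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def pv_mismatches(pvc: dict, pv: dict) -> list:
    """Reasons why this PV cannot satisfy this PVC (empty list means it could)."""
    reasons = []
    if pvc.get("storageClassName", "") != pv.get("storageClassName", ""):
        reasons.append("StorageClass differs")
    missing = set(pvc["accessModes"]) - set(pv["accessModes"])
    if missing:
        reasons.append("PV lacks access modes: " + ", ".join(sorted(missing)))
    if to_bytes(pv["capacity"]) < to_bytes(pvc["request"]):
        reasons.append("PV capacity is smaller than the request")
    return reasons
```

Running this over every available PV shows quickly whether the claim is failing on class, access mode, or size.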
Pod can't mount volume
Errors: Unable to attach or mount volumes, Multi-Attach error, Volume is already exclusively attached
- ReadWriteOnce (RWO) volume already mounted
  - RWO volumes can only be mounted by one node
  - Check if volume is mounted by another Pod
  Solution: Delete old Pod first, or use ReadWriteMany (RWX) if supported.
- Volume not detached from previous node
  Solution: Wait for volume to detach, or manually delete VolumeAttachment.
- Permission issues
  Solution: typically, set securityContext.fsGroup on the Pod so the mounted volume is writable by the container's group.
Node Issues
Node in NotReady state
- kubelet not running
- Network plugin issues
  - CNI plugin not installed or misconfigured
  - Check pod network (Calico, Flannel, Weave)
- Disk pressure
- Memory/CPU pressure
- Certificate expired
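Several of these causes surface in the node's status conditions. The sketch below summarizes them from a Node object, for instance one parsed from `kubectl get node <name> -o json`. `node_problems` is a hypothetical helper; it relies on the convention that `Ready` is healthy when `"True"` while the pressure conditions are healthy when `"False"`.

```python
def node_problems(node: dict) -> list:
    """Summarize unhealthy conditions from a Node's status."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond["type"], cond["status"]
        if ctype == "Ready":
            unhealthy = status != "True"
        else:
            # DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable
            unhealthy = status != "False"
        if unhealthy:
            problems.append(f"{ctype}={status} ({cond.get('reason', 'no reason given')})")
    return problems
```

A status of `Unknown` on every condition usually means the kubelet has stopped reporting, which points back at the first item in the list.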
Pods evicted from node
- DiskPressure
  - Node running out of disk space
  - Check node conditions
  - Clean up unused images and containers
  - Increase disk size
  - Configure garbage collection
- MemoryPressure
  - Node running out of memory
  - Kubelet starts evicting lowest priority Pods
  Solution:
  - Set proper resource requests/limits
  - Add more nodes
  - Use memory-efficient applications
- Node maintenance
  - Manual cordon and drain
Debugging Tools & Commands
Essential kubectl debugging commands
Network debugging tools
- curl, wget - HTTP requests
- nslookup, dig - DNS debugging
- ping, traceroute - Network connectivity
- netstat, ss - Socket statistics
- tcpdump - Packet capture
- iperf - Network performance
Performance troubleshooting
Check resource usage with kubectl top.
Common Error Messages
Error: 'container is unhealthy, it will be killed and re-created'
- Increase probe timeouts
- Fix application health endpoint
  - Ensure /health returns 200 OK
  - Check app logs for errors
- Use startup probe for slow-starting apps
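How generous a startup probe needs to be is simple arithmetic: the container has roughly `failureThreshold × periodSeconds` to start, plus any `initialDelaySeconds`. The sketch below uses the Kubernetes defaults (period 10, threshold 3, delay 0) for unset fields. The helper names are hypothetical, and the result is an approximation that ignores `timeoutSeconds`.

```python
def startup_budget_seconds(probe: dict) -> int:
    """Approximate time a container has to start before it is restarted."""
    period = probe.get("periodSeconds", 10)
    failures = probe.get("failureThreshold", 3)
    delay = probe.get("initialDelaySeconds", 0)
    return delay + failures * period


def survives_startup(observed_start_seconds: float, probe: dict) -> bool:
    """True if an app with the observed start time fits inside the probe budget."""
    return observed_start_seconds <= startup_budget_seconds(probe)
```

With default settings an app that needs 90 seconds to boot is killed after about 30, while `failureThreshold: 30` with `periodSeconds: 10` gives it up to 300 seconds.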
Error: 'Back-off pulling image'
Error: 'rpc error: code = Unknown desc = Error: No such container'
- Check container runtime on node
- Restart container runtime
- Check runtime logs
Best Practices for Troubleshooting
- Gather information - logs, events, describe output
- Isolate the issue - Pod? Node? Network? Storage?
- Check recent changes - deployments, config updates
- Test incrementally - eliminate variables one by one
- Document findings - help future troubleshooting
- Kubernetes Troubleshooting Guide
- kubectl Cheat Sheet
- Community forums: Kubernetes Slack, Stack Overflow