Quick Debug Command
The debug-k8s command provides an instant cluster health overview:
- All pods across all namespaces
- Last 10 events sorted by timestamp
- Quick way to spot crashes, image pull failures, scheduling issues
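The implementation of debug-k8s is not shown here. As a rough sketch of an equivalent built on plain kubectl (the function name `debug_k8s_sketch` is hypothetical, and the real command may collect more or different data):

```shell
# Hypothetical stand-in for debug-k8s; the real command may differ.
debug_k8s_sketch() {
  echo "== Pods (all namespaces) =="
  kubectl get pods --all-namespaces -o wide

  echo "== Last 10 events =="
  # Sorted oldest to newest, so tail keeps the most recent entries.
  kubectl get events --all-namespaces --sort-by=.lastTimestamp | tail -n 10
}

# Usage: debug_k8s_sketch
```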
Common Issues
Cluster Won’t Start
Symptoms:
- kubectl commands hang
- cluster-start times out
- “connection refused” errors

- Restart cluster:
- Full rebuild:
- Check Docker daemon:
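The three steps above might look like this when done with raw Docker and Kind commands. This is a sketch under assumptions: the cluster name `kind` (Kind's default) and the helper names are placeholders, and the project's own wrapper commands may do more than this.

```shell
# Assumption: a Kind cluster named "kind"; set CLUSTER to match yours.
CLUSTER="${CLUSTER:-kind}"

check_docker() {
  # "docker info" fails when the daemon is not reachable.
  if docker info >/dev/null 2>&1; then
    echo "docker: ok"
  else
    echo "docker: not reachable"
  fi
}

restart_cluster() {
  # Kind nodes are Docker containers; the control plane is "<cluster>-control-plane".
  docker restart "${CLUSTER}-control-plane"
}

rebuild_cluster() {
  # Destroys all cluster state, then creates a fresh cluster.
  kind delete cluster --name "$CLUSTER"
  kind create cluster --name "$CLUSTER"
}
```

Checking Docker first is usually worthwhile, since neither a restart nor a rebuild can succeed while the daemon is down.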
Pod Stuck in Pending
Symptoms:
- Pod shows Pending status
- Never transitions to Running

- Insufficient resources:
  Solution: Reduce resource requests or add nodes
- Image pull failure:
  Solution: Check image name, load into Kind:
- PVC not bound:
  Solution: Check PVC status:
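To tell the causes above apart, a minimal diagnosis sketch using standard kubectl calls (the function name and its arguments are placeholders):

```shell
diagnose_pending() {
  pod="$1"; ns="${2:-default}"
  # The Events section at the end usually names the cause,
  # for example insufficient CPU or memory, or an unbound PVC.
  kubectl describe pod "$pod" -n "$ns"
  # Allocatable capacity per node, to compare against the pod's requests.
  kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
  # A PVC that is not Bound keeps its pod Pending.
  kubectl get pvc -n "$ns"
}

# Usage: diagnose_pending my-pod my-namespace
```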
Pod CrashLoopBackOff
Symptoms:
- Pod status: CrashLoopBackOff
- Restart count increasing

- Application error:
  - Check logs for stack traces
  - Verify configuration (env vars, secrets)
  - Test application locally
- Missing dependencies:
  - Database not ready
  - Secret not created
  Solution: Add init containers or readiness probes
- Liveness probe failing:
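A sketch of how these checks could be run with kubectl. The helper name is hypothetical; the important detail is `--previous`, which reads logs from the container instance that crashed and not from the one currently restarting.

```shell
diagnose_crashloop() {
  pod="$1"; ns="${2:-default}"
  # Logs from the previous (crashed) container instance.
  kubectl logs "$pod" -n "$ns" --previous --tail=50
  # Last State, exit code, and any probe failure events.
  kubectl describe pod "$pod" -n "$ns"
  # Configured liveness probe(s), to compare against what the app serves.
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].livenessProbe}'
}

# Usage: diagnose_crashloop my-pod my-namespace
```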
Image Pull Errors
Symptoms:
- ErrImagePull or ImagePullBackOff
- Pod can’t download container image

- Load local image into Kind:
- Fix image name:
  - Check for typos
  - Verify tag exists
  - Ensure registry is accessible
- Use custom OTel Collector (if applicable):
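For the image loading and image name checks, a sketch with placeholder names (the image `my-app:dev`, the cluster name `kind`, and both helper names are assumptions). The custom OTel Collector step depends on project tooling not shown here, so it is left out.

```shell
load_image_into_kind() {
  image="$1"; cluster="${2:-kind}"
  # Copies a locally built image into the Kind node(s) so no registry pull is needed.
  kind load docker-image "$image" --name "$cluster"
}

show_pod_images() {
  # Prints the image reference(s) the pod is actually trying to pull.
  kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.containers[*].image}'
}

# Usage:
#   load_image_into_kind my-app:dev
#   show_pod_images my-pod my-namespace
```

Images tagged `:latest` default to `imagePullPolicy: Always`, so a loaded image is still pulled from the registry unless the pod uses a specific tag or sets `imagePullPolicy: IfNotPresent`.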
Service Not Reachable
Symptoms:
- Can’t access service via NodePort or ClusterIP
- Connection timeout or refused

- No endpoints (no pods match selector):
- Wrong port:
  - Verify the service’s targetPort matches the container port
  - Check NodePort range (30000-32767 by default)
- Pod not ready:
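All three causes can be narrowed down with a few read-only kubectl calls. A minimal sketch, with a hypothetical helper name:

```shell
check_service() {
  svc="$1"; ns="${2:-default}"
  # An empty ENDPOINTS column means no ready pod matches the selector.
  kubectl get endpoints "$svc" -n "$ns"
  # Shows the selector, port, targetPort, and nodePort.
  kubectl describe service "$svc" -n "$ns"
  # READY column plus labels, to compare against the service selector.
  kubectl get pods -n "$ns" --show-labels
}

# Usage: check_service my-service my-namespace
```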
DNS Resolution Failing
Symptoms:
- “Name or service not known”
- Can’t resolve service names

- Restart CoreDNS:
- Verify DNS service:
- Check pod DNS config:
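A sketch of the three steps, assuming the standard layout where CoreDNS runs as the `coredns` Deployment in `kube-system` behind a Service named `kube-dns`. The helper names are placeholders.

```shell
restart_coredns() {
  kubectl rollout restart deployment/coredns -n kube-system
  # Blocks until the new CoreDNS pods are ready or the timeout expires.
  kubectl rollout status deployment/coredns -n kube-system --timeout=60s
}

check_dns_service() {
  # The Service in front of CoreDNS keeps the historical name "kube-dns".
  kubectl get service kube-dns -n kube-system
  kubectl get pods -n kube-system -l k8s-app=kube-dns
}

show_pod_dns_config() {
  # The nameserver should match the ClusterIP of the kube-dns Service.
  kubectl exec "$1" -n "${2:-default}" -- cat /etc/resolv.conf
}
```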
Persistent Volume Issues
Symptoms:
- PVC stuck in Pending
- “no persistent volumes available”

- For Kind (hostPath):
  - Volumes are automatically provisioned
  - Check storage class:
- Create manual PV (if needed):
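A sketch of both checks. The PV name, size, storage class name `manual`, and host path are placeholders chosen for illustration, as are the helper names.

```shell
check_storage() {
  # A PVC that names a class with no provisioner or matching PV stays Pending.
  kubectl get storageclass
  kubectl describe pvc "$1" -n "${2:-default}"
}

# Prints a hostPath PV manifest; every value below is a placeholder.
manual_pv_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /tmp/manual-pv
EOF
}

# Usage: manual_pv_manifest | kubectl apply -f -
```

For the PVC to bind, its `storageClassName`, access mode, and requested size have to be compatible with the PV.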
Manifest Apply Failures
Symptoms:
- kubectl apply returns an error
- Resources not created/updated

- CRD not installed:
  Solution: Apply CRDs first:
- Field immutable:
  Solution: Delete and recreate:
- Server-side apply conflict:
  Solution: Force conflicts:
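The three solutions could be sketched as follows. The helper names and the paths passed to them are hypothetical; the kubectl flags are standard.

```shell
apply_crds_first() {
  crd_dir="$1"; manifests="$2"
  kubectl apply -f "$crd_dir"
  # Wait until the API server serves the new types before applying resources that use them.
  kubectl wait --for=condition=Established crd --all --timeout=60s
  kubectl apply -f "$manifests"
}

recreate_resource() {
  # Deletes and recreates the resource; expect a brief outage for workloads.
  kubectl replace --force -f "$1"
}

force_server_side_apply() {
  # Takes ownership of conflicting fields from other field managers.
  kubectl apply --server-side --force-conflicts -f "$1"
}

# Usage:
#   apply_crds_first ./crds ./manifests
#   recreate_resource ./manifests/job.yaml
```

Forcing conflicts overwrites values set by another controller or tool, so it is worth checking which manager owns the field before using it routinely.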
Advanced Debugging
Interactive Pod Debugging
Create debug pod in same namespace:
Exec into Running Pod
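Sketches for both interactive cases, a throwaway debug pod and a shell inside a pod that is already running. The pod name `debug`, the `busybox:1.36` image, and the helper names are assumptions; any image with the tools you need works.

```shell
debug_pod() {
  # Temporary pod in the target namespace; --rm deletes it when the shell exits.
  kubectl run debug --rm -it --restart=Never --image=busybox:1.36 -n "${1:-default}" -- sh
}

exec_shell() {
  # Opens a shell in an existing pod; use "-c <container>" for multi-container pods.
  kubectl exec -it "$1" -n "${2:-default}" -- sh
}

# Usage:
#   debug_pod my-namespace
#   exec_shell my-pod my-namespace
```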
Port Forward for Local Access
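A minimal port-forward sketch; the helper name and the example service and ports are placeholders.

```shell
forward_service() {
  svc="$1"; ports="$2"; ns="${3:-default}"
  # ports is LOCAL:REMOTE, for example 8080:80. Runs until interrupted with Ctrl-C.
  kubectl port-forward "svc/$svc" "$ports" -n "$ns"
}

# Usage: forward_service my-service 8080:80 my-namespace
# Then open http://localhost:8080 from the host.
```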
Copy Files To/From Pod
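A sketch using `kubectl cp`, which requires a `tar` binary inside the container. The helper names and paths are placeholders.

```shell
copy_from_pod() {
  ns="$1"; pod="$2"; remote="$3"; local_path="$4"
  kubectl cp "$ns/$pod:$remote" "$local_path"
}

copy_to_pod() {
  ns="$1"; pod="$2"; local_path="$3"; remote="$4"
  kubectl cp "$local_path" "$ns/$pod:$remote"
}

# Usage:
#   copy_from_pod my-namespace my-pod /var/log/app.log ./app.log
#   copy_to_pod my-namespace my-pod ./config.yaml /tmp/config.yaml
```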
Analyze Resource Usage
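A sketch based on `kubectl top`. It assumes metrics-server is installed in the cluster; Kind does not install it by default, and without it these commands return an error. The helper name is hypothetical.

```shell
resource_usage() {
  # Current CPU and memory usage per node.
  kubectl top nodes
  # Pods across all namespaces, heaviest memory consumers first.
  kubectl top pods --all-namespaces --sort-by=memory
}

# Usage: resource_usage
```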
Watch Resources in Real-Time
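A sketch using kubectl's built-in `--watch` flag, which streams changes until interrupted. The helper names are placeholders.

```shell
watch_pods() {
  # Prints a new line each time a pod changes state.
  kubectl get pods --all-namespaces --watch
}

watch_events() {
  # Streams events for one namespace as they occur.
  kubectl get events -n "${1:-default}" --watch
}

# Usage: watch_events my-namespace
```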
Bootstrap-Specific Issues
Bootstrap Hangs
Diagnosis:

- Kill and restart:
- Clean bootstrap:
- Manual cleanup:
Warm Cluster Not Detecting Changes
Symptoms:
- Changed manifests not applied
- “All good” message but resources outdated
Garage Setup Fails
Symptoms:
- Bootstrap fails at “Running Garage setup”
- Loki/Tempo can’t connect to storage