
Kubernetes Core Concepts

A Headless Service is created without a ClusterIP, so clients receive Pod IPs directly via DNS instead of load-balanced traffic.
When to Use: when you need direct Pod-to-Pod communication, especially in StatefulSets such as:
  • Cassandra
  • MongoDB
  • Kafka
Each Pod must be uniquely addressable (pod-0, pod-1, etc.).
apiVersion: v1
kind: Service
metadata:
  name: my-headless-service
spec:
  clusterIP: None  # This makes it headless
  selector:
    app: myapp
  ports:
  - port: 80
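With a StatefulSet backed by this headless Service, each Pod also gets its own stable DNS record. A sketch of what that looks like (the StatefulSet Pod name myapp-0 and the default namespace are assumed):

```shell
# The headless Service name resolves to ALL Pod IPs (no load balancing)
nslookup my-headless-service.default.svc.cluster.local

# Each StatefulSet Pod is individually addressable as
#   <pod-name>.<service-name>.<namespace>.svc.cluster.local
curl http://myapp-0.my-headless-service.default.svc.cluster.local
```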
Use the service’s FQDN (Fully Qualified Domain Name):
<service-name>.<namespace>.svc.cluster.local
Example: a Pod in the dev namespace accessing a service in the prod namespace:
curl http://nginx-service.prod.svc.cluster.local
You can also use the shorter form:
curl http://nginx-service.prod
A Deployment manages the lifecycle of Pods—scaling, rolling updates, and rollbacks.
Key Features:
  • Declarative updates for Pods and ReplicaSets
  • Rolling updates with zero downtime
  • Easy rollback to previous versions
  • Scaling up/down replicas
  • Self-healing (recreates failed Pods)
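A minimal Deployment manifest illustrating these features might look like this (names and image tag are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 3                  # scaling up/down
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: nginx:1.25      # changing this triggers a rolling update
        ports:
        - containerPort: 80
```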
A Pod is like a lunchbox holding your food containers (apps): they always stay together.
In technical terms:
  • A Pod is the smallest deployable unit in Kubernetes
  • It can contain one or more containers
  • Containers in a Pod share network and storage
  • They’re scheduled together on the same node
A StatefulSet manages Pods that require persistent storage or a stable network identity.
Use Cases:
  • Databases (MySQL, PostgreSQL, MongoDB)
  • Distributed systems (Kafka, Cassandra, Elasticsearch)
  • Applications requiring stable hostnames
Key Features:
  • Ordered, graceful deployment and scaling
  • Stable, persistent storage
  • Stable network identifiers (pod-0, pod-1, pod-2)
  • Ordered, automated rolling updates
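A sketch of a StatefulSet showing the stable-identity pieces — serviceName pointing at a headless Service, plus per-Pod storage via volumeClaimTemplates (all names and the image are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless     # headless Service providing stable DNS (db-0.db-headless...)
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: mongo:7         # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data/db
  volumeClaimTemplates:        # each Pod gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```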
A DaemonSet ensures a Pod runs on every node (or on selected nodes).
Common Use Cases:
  • Monitoring agents (Prometheus Node Exporter, Datadog)
  • Log collectors (Fluentd, Filebeat)
  • Storage daemons (Ceph, GlusterFS)
  • Network proxies (kube-proxy)
When you add a new node, the DaemonSet automatically deploys the Pod to it.
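A minimal DaemonSet sketch (names and image tag are hypothetical):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16   # placeholder log-collector image
```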
Operators are controllers that extend the Kubernetes API using CRDs (Custom Resource Definitions).
They automate:
  • Deployment
  • Upgrades
  • Healing
  • Backup/restore
Examples:
  • Prometheus Operator
  • MySQL Operator
  • Kafka Operator
Operators encode operational knowledge into software to manage complex applications.
Admission controllers are Kubernetes plugins that intercept API requests before objects are persisted in etcd.
Two Types:
  1. Validating → Validate requests (allow/deny)
  2. Mutating → Modify requests (inject sidecars, set defaults)
Common Use Cases:
  • Enforce security policies
  • Inject sidecars automatically
  • Deny privileged pods
  • Resource quotas
  • Limit ranges
  • Image scanning enforcement
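A validating webhook is registered with a ValidatingWebhookConfiguration; a minimal sketch (the backing service name and path are assumptions):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy
webhooks:
- name: pod-policy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      name: policy-webhook     # hypothetical in-cluster webhook server
      namespace: default
      path: /validate
```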

Services & Networking

A Service is a stable network endpoint that exposes Pods for communication.
Service Types:
  • ClusterIP (default) - Internal cluster access only
  • NodePort - Exposes service on each node’s IP at a static port
  • LoadBalancer - Creates external load balancer (cloud providers)
  • ExternalName - Maps service to external DNS name
LoadBalancer:
  • Layer 4 traffic distribution (TCP/UDP)
  • Allocates a cloud load balancer (AWS ELB, GCP LB)
  • One LB per service → expensive
  • Simple setup, but costly at scale
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: myapp
The Ingress resource only defines routing rules—it does nothing by itself. The Ingress Controller implements those rules using a reverse proxy such as:
  • NGINX
  • Traefik
  • HAProxy
  • AWS ALB
  • Istio Gateway
Think of Ingress as a configuration file, and Ingress Controller as the software that reads and applies it.
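An Ingress resource defining such rules might look like this sketch (the host, service name, and ingressClassName are placeholders; it assumes an NGINX controller is installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  ingressClassName: nginx        # which controller should implement these rules
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service     # existing ClusterIP Service
            port:
              number: 80
```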
Network Policies control pod-to-pod, pod-to-service, and pod-to-external traffic. They act as firewalls for Pods.
Example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
Network Policies require a CNI plugin that supports them (Calico, Cilium, Weave).

Troubleshooting

A Pod goes into CrashLoopBackOff when its container keeps crashing and restarting.
Steps to Debug:
# Check current logs
kubectl logs <pod-name> -n <namespace>

# Check logs from previous crashed container
kubectl logs <pod-name> -n <namespace> --previous

# Get detailed Pod information
kubectl describe pod <pod-name>
Common Causes:
  • Incorrect image or tag
  • Missing ConfigMap/Secret
  • Application runtime error
  • Failing liveness probe
  • Insufficient resources
  • Wrong command/args in Pod spec
Reason:
  • If mounted as a volume → updates auto-refresh (with a small delay)
  • If used as environment variables → a Pod restart is required
Fix:
kubectl rollout restart deployment <deployment-name>
For automatic updates, always mount ConfigMaps as volumes instead of using environment variables.
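Mounting a ConfigMap as a volume looks like this sketch (app-config is an assumed ConfigMap name):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: config
      mountPath: /etc/config   # files here refresh automatically after ConfigMap updates
  volumes:
  - name: config
    configMap:
      name: app-config         # hypothetical ConfigMap
```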
Yes, absolutely! A Pod can run on its own without a Deployment:
kubectl run my-pod --image=nginx
But you lose:
  • Scaling capabilities
  • Self-healing (automatic restart)
  • Rolling updates
  • Rollback functionality
  • Declarative management
Standalone Pods are useful for testing and debugging, but not recommended for production workloads.

Scheduling & Affinity

Node Affinity controls which nodes a Pod can be scheduled on using labels.
Types:
  • requiredDuringSchedulingIgnoredDuringExecution → hard rule (must match)
  • preferredDuringSchedulingIgnoredDuringExecution → soft rule (preferred but not required)
Use Cases:
  • Schedule GPU workloads on GPU nodes
  • Cost optimization using spot vs on-demand nodes
  • Separate production and development workloads
  • Place workloads in specific availability zones
Example: Run batch jobs on spot nodes, critical apps on on-demand nodes.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-type
          operator: In
          values:
          - gpu
Feature             | Node Selector    | Node Affinity
Complexity          | Simple           | Advanced
Operators           | Exact match only | In, NotIn, Exists, DoesNotExist
Soft Rules          | No               | Yes (preferred)
Flexibility         | Low              | High
Multiple conditions | No               | Yes
Node Selector (Simple):
nodeSelector:
  disktype: ssd
Node Affinity (Advanced):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
          - nvme
Pod Anti-Affinity prevents Pods from running on the same node.
Use Cases:
  1. High Availability
    • Spread replicas across nodes to avoid single-node failure
  2. Performance Isolation
    • Prevent two heavy workloads from competing on the same node
  3. Security
    • Keep sensitive workloads separate from less trusted workloads
Example:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: myapp
This ensures no two Pods with label app: myapp run on the same node.

Deployment Strategies

Recreate Strategy:
  • Terminates all old Pods before creating new ones
  • Downtime occurs
  • Simplest strategy
  • Use when downtime is acceptable
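In a Deployment spec, the strategy is selected like this:

```yaml
spec:
  strategy:
    type: Recreate   # vs. RollingUpdate (the default)
```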
# Rollback to previous revision
kubectl rollout undo deployment/<name>

# Rollback to specific revision
kubectl rollout undo deployment/<name> --to-revision=<rev>

# Check rollout history
kubectl rollout history deployment/<name>

# Check rollout status
kubectl rollout status deployment/<name>
How it works:
  • Kubernetes keeps previous ReplicaSets
  • Tools like ArgoCD, Helm, Spinnaker can handle rollbacks
  • By default, keeps last 10 revisions (configurable via revisionHistoryLimit)
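The revision history setting lives directly in the Deployment spec:

```yaml
spec:
  revisionHistoryLimit: 5   # keep only the last 5 old ReplicaSets available for rollback
```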

AWS & Cloud

  • Use separate public/private subnets with different route tables
  • Configure NACLs (Network Access Control Lists) at subnet level
  • Use Security Groups at instance level
  • Optionally use AWS Network Firewall for advanced filtering
  • Use Transit Gateway for centralized network management
  • Implement VPC Peering or PrivateLink for controlled cross-VPC communication
Feature         | AWS VPC                 | GCP VPC
Scope           | Region-scoped           | Global
Subnets         | AZ-specific             | Regional (span zones)
Cross-region    | Requires VPC Peering    | Built-in
Default routing | Explicit                | Automatic cross-region
Peering         | Explicit setup required | Simpler setup
Three main options for connecting VPCs:
  1. VPC Peering
    • Direct network connection between two VPCs
    • Simple 1-to-1 connection
    • Non-transitive (A↔B, B↔C doesn’t mean A↔C)
  2. Transit Gateway
    • Hub-and-spoke model
    • Connects multiple VPCs and on-premises networks
    • Scalable, centralized management
    • Best for complex network topologies
  3. PrivateLink
    • Expose services privately
    • No VPC peering required
    • Service-level access, not network-level
# Using eksctl (easiest method)
eksctl create cluster --name myCluster --region us-east-1

# Update kubeconfig
aws eks update-kubeconfig --name myCluster --region us-east-1

# Verify cluster
kubectl get nodes
What eksctl creates:
  • EKS Control Plane
  • Worker node groups (EC2 instances or Fargate)
  • VPC with public/private subnets
  • IAM roles for cluster and nodes
  • Security groups

Infrastructure as Code

terraform import <resource_type>.<name> <resource_id>
Example:
# Import existing AWS EC2 instance
terraform import aws_instance.myserver i-1234567890abcdef0

# Import S3 bucket
terraform import aws_s3_bucket.mybucket my-bucket-name
Then run:
terraform plan
You must first write the Terraform resource block before importing: terraform import only updates state; it doesn't generate configuration.
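For the EC2 example above, a matching (hypothetical) resource block to write before importing could be:

```hcl
resource "aws_instance" "myserver" {
  # Attributes should match the real instance; placeholder values shown
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"
}
```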
State Drift occurs when infrastructure is changed manually (outside Terraform).
Detection:
terraform plan  # Shows differences between state and reality
terraform refresh  # Updates state without applying changes
Solutions:
  1. Apply Terraform changes (bring infrastructure back to desired state)
    terraform apply
    
  2. Import manual changes (accept manual changes into state)
    terraform import <resource>
    
  3. Prevention:
    • Use Terraform Cloud/Enterprise sentinel policies
    • Implement proper RBAC
    • Use cloud provider service control policies
    • Regular terraform plan in CI/CD

Backup & Disaster Recovery

Kubernetes Backup:
  • Velero - Most popular K8s backup tool
    • Backs up cluster resources and persistent volumes
    • Supports disaster recovery and cluster migration
Database Backup:
  • AWS Backup - Centralized backup service
  • Database-specific tools (pg_dump, mysqldump)
  • Cloud provider snapshots (EBS, RDS automated backups)
Stateful Workloads:
  • PVC snapshots using CSI drivers
  • Volume snapshots via cloud provider APIs
etcd Backup (for cluster state):
ETCDCTL_API=3 etcdctl snapshot save snapshot.db
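Restoring is the mirror operation, using the same etcdctl v3 API (the restore data directory is an assumed path):

```shell
# Verify the snapshot before relying on it
ETCDCTL_API=3 etcdctl snapshot status snapshot.db

# Restore into a fresh data directory, then point etcd at it
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir /var/lib/etcd-restored
```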

Cluster Management

Best Practices:
  1. Test in staging first
  2. Upgrade control plane → then nodes
  3. Cordon & drain nodes before upgrading:
    kubectl cordon <node-name>
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
    
  4. Upgrade one minor version at a time (1.26 → 1.27 → 1.28)
Managed Kubernetes (EKS/GKE/AKS):
  • Control plane upgrade is managed by cloud provider
  • Manually upgrade node groups
Self-managed (kubeadm):
# Upgrade kubeadm
apt-get update && apt-get install -y kubeadm=1.28.0-00

# Plan upgrade
kubeadm upgrade plan

# Apply upgrade
kubeadm upgrade apply v1.28.0

# Upgrade kubelet and kubectl on each node
apt-get install -y kubelet=1.28.0-00 kubectl=1.28.0-00
systemctl restart kubelet

Cost Optimization

1. Auto Scaling
  • Horizontal Pod Autoscaler (HPA)
  • Cluster Autoscaler
  • Vertical Pod Autoscaler (VPA)
2. Reserved/Spot Instances
  • Reserved Instances for predictable workloads (save 30-70%)
  • Spot Instances for fault-tolerant workloads (save up to 90%)
  • Savings Plans for flexible commitments
3. Rightsizing
  • Analyze resource utilization
  • Right-size Pods and instances
  • Remove resource limits where appropriate
  • Use tools like Kubecost, Goldilocks
4. Serverless
  • AWS Lambda for event-driven workloads
  • Fargate for containerized workloads without managing nodes
  • Pay only for actual usage
5. Monitoring & Cleanup
  • Delete unused resources (old snapshots, unattached volumes)
  • Shut down dev/test environments off-hours
  • Use cloud provider cost explorer and recommendations
  • Implement tagging strategy for cost allocation
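Point 1 (auto scaling) can be sketched with an HPA manifest targeting a hypothetical Deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp                  # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out above ~70% average CPU
```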

Security

Immediate Actions:
  1. Revoke/rotate keys immediately
    • AWS: Deactivate and delete access keys
    • Generate new credentials
  2. Clean Git history
    # Using git-filter-repo (recommended)
    git filter-repo --path credentials.json --invert-paths
    
    # Or BFG Repo-Cleaner
    bfg --delete-files credentials.json
    
  3. Force push
    git push origin --force --all
    git push origin --force --tags
    
  4. Prevention:
    • Add .env, credentials.json to .gitignore
    • Use git-secrets or pre-commit hooks
    • Implement secret scanning in CI/CD (GitHub Advanced Security, GitGuardian)
    • Use secret management (AWS Secrets Manager, HashiCorp Vault)
Assume credentials are compromised immediately. Check CloudTrail/audit logs for unauthorized access.
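Setting up git-secrets for the prevention step looks like this (run inside the repository):

```shell
# Install the git-secrets hooks into this repo
git secrets --install

# Register the built-in AWS credential patterns
git secrets --register-aws

# Scan the working tree and full history for anything already committed
git secrets --scan
git secrets --scan-history
```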

Docker & Containerization

Containerization means packaging applications with all their dependencies into isolated containers that share the host OS kernel.
Benefits:
  • Consistent environments (dev, staging, prod)
  • Faster deployments
  • Resource efficiency (compared to VMs)
  • Isolation and security
  • Portability across platforms
Container vs VM:
  • Containers share OS kernel → lighter weight
  • VMs include full OS → heavier but more isolated
A Dockerfile contains the instructions to build a Docker image.
Example:
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Common Instructions:
  • FROM - Base image
  • WORKDIR - Set working directory
  • COPY - Copy files from host to image
  • RUN - Execute commands during build
  • EXPOSE - Document ports (doesn’t publish)
  • CMD - Default command when container starts
  • ENTRYPOINT - Configures container as executable
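The Dockerfile above can be built and run like this (the image name is arbitrary):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t myapp:latest .

# Run it, publishing the port the app listens on
docker run -d -p 3000:3000 --name myapp myapp:latest
```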
Docker networking is the mechanism that lets containers communicate securely with:
  • Other containers
  • The host system
  • External systems
Network Types:
  • bridge (default) - Private network for containers on same host
  • host - Container uses host’s network directly
  • overlay - Multi-host networking for Swarm
  • macvlan - Assign MAC address to container
  • none - Disable networking
Commands:
# Create network
docker network create my_network

# List networks
docker network ls

# Run container on network
docker run --network=my_network nginx
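Containers on the same user-defined network can reach each other by name via Docker's embedded DNS; a quick check (public images assumed):

```shell
# Start a container named "web" on the network
docker run -d --name web --network=my_network nginx

# Resolve and call it by container name from another container
docker run --rm --network=my_network curlimages/curl curl -s http://web
```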
