Pod Scheduling
Kubernetes provides multiple mechanisms to control where Pods are scheduled in your cluster. This allows you to optimize resource usage, ensure high availability, and meet specific deployment requirements.
Node Selector
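The simplest way to constrain Pods to specific nodes is using nodeSelector. First, label your nodes (for example, kubectl label nodes <node-name> sku=small), then reference that label in the nodeSelector field of the Pod spec.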
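A minimal sketch of a Pod that uses nodeSelector, assuming a node has already been labeled sku=small. The Pod name, container name, and image are hypothetical placeholders, not values from the original page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-demo   # hypothetical name
spec:
  # Every key/value pair listed here must be present as a label on the node.
  nodeSelector:
    sku: small
  containers:
    - name: app             # hypothetical container
      image: nginx
```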
The Pod will only be scheduled on nodes with the sku=small label.
Node Affinity
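Node affinity provides more expressive rules for Pod placement than nodeSelector. It comes in two forms, required and preferred.
Required Node Affinity
requiredDuringSchedulingIgnoredDuringExecution: The Pod must be placed on a node matching the criteria. If no node matches, the Pod remains Pending.
Use case: Ensuring Pods only run on nodes with specific hardware (e.g., GPU nodes).
Preferred Node Affinity
preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to place the Pod on a node matching the criteria, but still schedules it elsewhere if no matching node is available.
Affinity Operators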
- In: Label value must be in the list
- NotIn: Label value must not be in the list
- Exists: Label key must exist (value doesn’t matter)
- DoesNotExist: Label key must not exist
- Gt: Label value must be greater than specified value
- Lt: Label value must be less than specified value
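The following sketch shows how a required rule and the operators above might be combined in a Pod spec. It also includes a preferred rule (preferredDuringSchedulingIgnoredDuringExecution), which the scheduler treats as a weighted soft preference rather than a hard constraint. The accelerator label key, its value, the Pod name, and the image are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo          # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule: the Pod stays Pending unless a node matches.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator    # hypothetical label key
                operator: In
                values: ["nvidia-gpu"]
      # Soft rule: matching nodes score higher (weight is 1-100).
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: sku
                operator: In
                values: ["small"]
  containers:
    - name: app                     # hypothetical container
      image: nginx
```

Expressions inside one matchExpressions list are ANDed together, while separate entries under nodeSelectorTerms are ORed.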
Pod Affinity and Anti-Affinity
Pod affinity rules allow you to schedule Pods based on labels of other Pods already running on nodes.
Pod Anti-Affinity Example
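A minimal sketch of a Deployment whose replicas refuse to share a node with one another. The app: web label, the Deployment name, and the image are hypothetical; the original example from this page is not reproduced here.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Do not schedule onto a node that already runs a Pod labeled app=web.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx
```

Because the rule is required, a fourth replica on a three-node cluster would stay Pending; a preferred rule would relax that.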
Understanding topologyKey
The topologyKey defines the scope of the affinity rule. Common values:
- kubernetes.io/hostname: Pods must/must not run on the same node
- topology.kubernetes.io/zone: Pods must/must not run in the same zone
- topology.kubernetes.io/region: Pods must/must not run in the same region
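To illustrate how the topologyKey changes the scope of a rule, the sketch below uses pod affinity (co-location) at zone level rather than node level. The app: cache label, the Pod name, and the image are assumptions for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-client                 # hypothetical name
spec:
  affinity:
    podAffinity:
      # Prefer any node in a zone that already runs a Pod labeled app=cache.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: cache           # hypothetical label
            topologyKey: topology.kubernetes.io/zone
  containers:
    - name: client
      image: nginx
```

Swapping the topologyKey to kubernetes.io/hostname would narrow the preference to the exact node running the cache Pod.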
Taints and Tolerations
Taints allow nodes to repel certain Pods, while tolerations allow Pods to schedule onto nodes with matching taints.
Adding Taints to Nodes
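Taints are typically added with a command of the form kubectl taint nodes <node-name> <key>=<value>:<effect>. The excerpt below sketches how such a taint appears on the Node object afterwards; the node name node-1 and the taint dedicated=gpu:NoSchedule are hypothetical.

```yaml
# Excerpt of a Node object after running, for example:
#   kubectl taint nodes node-1 dedicated=gpu:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: node-1              # hypothetical node name
spec:
  taints:
    - key: dedicated        # hypothetical taint key
      value: gpu
      effect: NoSchedule
```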
Pod with Toleration
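A minimal sketch of a Pod that tolerates a taint, assuming a node carries the hypothetical taint dedicated=gpu:NoSchedule. The Pod name and image are placeholders as well.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo     # hypothetical name
spec:
  tolerations:
    # Key, value, and effect must match the taint for operator Equal.
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: app
      image: nginx
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod there. Pairing it with nodeSelector or node affinity is the usual way to pin the Pod to those nodes.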
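Taint Effects
- NoSchedule: Pods without a matching toleration are not scheduled onto the node; Pods already running there are left alone
- PreferNoSchedule: The scheduler tries to avoid placing non-tolerating Pods on the node, but this is not guaranteed
- NoExecute: Non-tolerating Pods are not scheduled onto the node, and those already running there are evicted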
Pod Priority and Preemption
Priority classes allow you to define the importance of Pods relative to others.
Creating a Priority Class
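A minimal sketch of a PriorityClass. The name high-priority, the value, and the description are illustrative assumptions.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority       # hypothetical name
value: 1000000              # higher value means higher priority
globalDefault: false        # do not apply to Pods that name no class
description: "For workloads that should be scheduled ahead of others."
```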
Using Priority Class in Pod
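A Pod opts into a class through priorityClassName. The sketch below assumes a PriorityClass named high-priority (a hypothetical name) already exists in the cluster; if it does not, the Pod is rejected at admission.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: priority-demo       # hypothetical name
spec:
  priorityClassName: high-priority
  containers:
    - name: app
      image: nginx
```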
Priority Value Ranges:
- User-defined: -2,147,483,648 to 1,000,000,000 (larger values are reserved for system classes)
- system-cluster-critical: 2,000,000,000 (used by CoreDNS, Calico)
- system-node-critical: 2,000,001,000 (used by etcd, kube-apiserver)
Preemption Policies
- PreemptLowerPriority: Evicts lower-priority Pods to make room (the default)
- Never: Places the Pod ahead of lower-priority Pods in the scheduling queue but doesn’t evict others
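The policy is set on the PriorityClass itself through the preemptionPolicy field. A sketch of a non-preempting class, with a hypothetical name and value:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # hypothetical name
value: 900000
preemptionPolicy: Never               # omit to get PreemptLowerPriority
globalDefault: false
description: "Queued ahead of lower-priority Pods without evicting them."
```

Pods using this class wait for capacity to free up, which suits batch work that is important but not urgent enough to displace running Pods.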
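Removing Labels and Taints
To remove a label or a taint, append a hyphen to it in the same kubectl command: kubectl label nodes <node-name> sku- removes the sku label, and kubectl taint nodes <node-name> <key>:<effect>- removes the matching taint.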
Best Practices
- Use nodeSelector for simple constraints
- Use node affinity for complex rules with multiple conditions
- Use pod anti-affinity to spread replicas across nodes for high availability
- Reserve taints for special-purpose nodes (GPU, high-memory, etc.)
- Set priority classes for critical workloads to ensure they’re scheduled first
- Combine multiple scheduling constraints carefully to avoid Pods that can’t be scheduled