Effective kernel hardening is defense-in-depth: no single mechanism is sufficient on its own. The goal is to make exploitation difficult at every layer—from memory layout to system call access to mandatory access control.
Kernel self-protection
Kernel self-protection focuses on eliminating classes of exploitable bugs and blocking exploitation techniques even when bugs exist. The design goal is that protections should be on by default, require no opt-in by developers, have minimal performance impact, and be testable.
KASLR
Randomizes the kernel’s load address at boot, making it harder for attackers to locate kernel code and data.
KPTI
Kernel Page Table Isolation. Separates kernel and user page tables to mitigate Meltdown-class attacks.
Stack protector
Places canary values before return addresses to detect stack buffer overflows before the function returns.
FORTIFY_SOURCE
Compile-time and runtime checks on buffer operations (memcpy, strcpy, etc.) to catch overflows.
Memory permissions
The kernel enforces strict permissions on its own memory regions:
- CONFIG_STRICT_KERNEL_RWX: Ensures kernel code is not writable and kernel data is not executable.
- CONFIG_STRICT_MODULE_RWX: Same enforcement for loaded kernel modules.
- CONFIG_RODATA_FULL_DEFAULT_ENABLED: On arm64, also applies read-only permissions to the linear-map aliases of read-only kernel memory, so it cannot be modified through a second mapping.
Variables that are written only during initialization can be annotated with __ro_after_init, making them read-only for the rest of the kernel’s lifetime.
Function pointers and sensitive variables are moved to .rodata sections where possible, removing them as writable attack targets.
Kernel Address Space Layout Randomization (KASLR)
KASLR randomizes the physical and virtual base addresses of the kernel text, modules, and heap at each boot:
- Text and module base: CONFIG_RANDOMIZE_BASE moves the kernel’s load address, frustrating attacks that rely on fixed addresses.
- Stack base: Stack base randomization varies the starting address across processes and system calls.
- Dynamic memory: The base addresses of heap regions (kmalloc, vmalloc) are randomized, requiring an information leak to target them reliably.
- Structure layout: CONFIG_RANDSTRUCT randomizes the layout of sensitive kernel structures per build, requiring per-build information leaks.
Stack protection
Stack canaries (CONFIG_STACKPROTECTOR, CONFIG_STACKPROTECTOR_STRONG): A secret value is placed between local variables and the saved return address. If a stack overflow overwrites the canary, the kernel detects the corruption before returning from the function.
CONFIG_STACKPROTECTOR_STRONG applies canaries to any function that has arrays, pointers to stack memory, or any address-taken local variables—significantly broader coverage than the baseline.
Stack depth overflow protection (CONFIG_VMAP_STACK): Maps the kernel stack using virtual memory with a fault-triggering guard page at the bottom. Deep recursion or large stack allocations that overflow the stack boundary cause an immediate fault rather than silently corrupting adjacent memory.
Heap integrity
The kernel’s slab allocator (SLUB) supports sanity checks on its free lists during allocation and deallocation (for example with CONFIG_SLAB_FREELIST_HARDENED), detecting freelist corruption and simple double frees before they can be exploited.
Kernel Address Sanitizer (KASAN) (CONFIG_KASAN): A compile-time instrumentation-based memory error detector for out-of-bounds and use-after-free accesses. Its generic mode is not suitable for production use because of its overhead, but it is invaluable for catching bugs in development and CI.
Memory initialization and poisoning
- CONFIG_INIT_ON_ALLOC_DEFAULT_ON: Zero-initializes heap allocations, preventing information leaks from uninitialized memory.
- CONFIG_INIT_ON_FREE_DEFAULT_ON: Zeros freed heap memory, preventing use-after-free content exposure.
- CONFIG_KSTACK_ERASE: Clears the kernel stack on syscall return, preventing stack content exposure across privilege boundaries.
FORTIFY_SOURCE
CONFIG_FORTIFY_SOURCE adds compile-time and runtime checks to common C library functions (memcpy, memset, strcpy, sprintf, etc.). When buffer sizes are statically known, the compiler inserts bounds checks. Buffer overflows are caught before they can be exploited.
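One way to confirm that the CONFIG_ options named in the sections above are set on a given build is to inspect the kernel configuration text. The sketch below is only an illustration: it assumes the configuration is available as text (commonly /boot/config-$(uname -r) or a decompressed /proc/config.gz, neither of which is guaranteed to exist), and the function names and the option list are hypothetical choices, not an official checklist.

```python
import re

# Illustrative selection of options named in this guide (an assumption,
# not an authoritative list).
HARDENING_OPTIONS = [
    "CONFIG_STRICT_KERNEL_RWX",
    "CONFIG_STRICT_MODULE_RWX",
    "CONFIG_RANDOMIZE_BASE",
    "CONFIG_STACKPROTECTOR_STRONG",
    "CONFIG_VMAP_STACK",
    "CONFIG_INIT_ON_ALLOC_DEFAULT_ON",
    "CONFIG_FORTIFY_SOURCE",
]

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value ('y', 'm', a string, ...) or None if unset."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def missing_options(text, wanted=HARDENING_OPTIONS):
    """Return the wanted options that are not built in ('y')."""
    options = parse_kconfig(text)
    return [name for name in wanted if options.get(name) != "y"]


# Made-up sample input for demonstration.
sample = """\
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_RANDOMIZE_BASE=y
# CONFIG_STACKPROTECTOR_STRONG is not set
CONFIG_VMAP_STACK=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
"""
print(missing_options(sample))
# ['CONFIG_STACKPROTECTOR_STRONG', 'CONFIG_FORTIFY_SOURCE']
```

An option that is absent from the file is treated the same as one that is explicitly unset, which is the conservative reading for an audit.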
Information leak prevention
Leaking kernel addresses to userspace is a common prerequisite for exploits. Several controls limit this.
kptr_restrict
| Value | Effect |
|---|---|
| 0 | Addresses hashed/randomized before printing |
| 1 | Addresses hidden from unprivileged; raw for CAP_SYSLOG |
| 2 | Always print 0 regardless of privilege |
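dmesg_restrict
When set to 1, reading the kernel log (dmesg) requires CAP_SYSLOG, so unprivileged users cannot harvest addresses and other details from kernel messages.
perf_event_paranoid
The perf subsystem can expose detailed CPU performance counters and timing information. Restricting it limits the data available for side-channel attacks.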
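The three controls above are exposed under /proc/sys/kernel, so their current values can be read directly. The following is a minimal sketch of such a check; the baseline values and the helper names are assumptions made for illustration, not settings mandated by the kernel, and the right values depend on the system’s needs.

```python
from pathlib import Path

# Hypothetical baseline for illustration only.
BASELINE = {
    "kernel/kptr_restrict": 2,
    "kernel/dmesg_restrict": 1,
    "kernel/perf_event_paranoid": 2,
}


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl, or None if it cannot be read."""
    try:
        return int(Path(root, name).read_text().strip())
    except (OSError, ValueError):
        return None


def below_baseline(values, baseline=BASELINE):
    """List sysctls whose value is unknown or lower than the baseline."""
    return sorted(
        name
        for name, wanted in baseline.items()
        if values.get(name) is None or values[name] < wanted
    )


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in BASELINE}
    for name in below_baseline(current):
        print(f"{name}: {current[name]} (baseline {BASELINE[name]})")
```

For all three settings a higher number is more restrictive, which is why a simple "lower than" comparison is enough here.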
Linux Security Modules (LSM)
The Linux Security Module framework hooks into kernel operations and allows security policies to be enforced at the kernel level. Capabilities are always active; one “major” LSM (SELinux, AppArmor, Smack, or TOMOYO) can be active at a time, potentially alongside minor modules (Yama, Landlock).
Selecting an LSM
SELinux
SELinux (Security-Enhanced Linux) implements Mandatory Access Control (MAC) using type enforcement policies. Every process and file has a security label, and access is granted only if the policy explicitly permits it.
SELinux operates in three modes:
| Mode | Description |
|---|---|
| Enforcing | Policy is enforced; violations are denied and logged |
| Permissive | Policy violations are logged but not denied (useful for policy development) |
| Disabled | SELinux is not active |
SELinux is configured primarily by setting SELINUX=enforcing (or permissive) in /etc/selinux/config and selecting a policy type (targeted, mls, minimum).
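As an illustration of the configuration file format, the sketch below parses the KEY=value lines of /etc/selinux/config. The function name and the sample text are made up for this example; SELINUX and SELINUXTYPE are the keys that carry the mode and the policy type.

```python
def parse_selinux_config(text):
    """Return the KEY=value settings found in SELinux config text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Made-up sample resembling a typical config file.
sample = """\
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""
config = parse_selinux_config(sample)
print(config["SELINUX"], config["SELINUXTYPE"])  # enforcing targeted
```

The file only sets the mode used at the next boot; it does not reflect a mode changed at runtime.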
AppArmor
AppArmor implements MAC using per-application profiles that define what files, capabilities, and network resources each program may access. Applications without a profile run unconfined.
AppArmor is enabled at boot through kernel command-line parameters, typically apparmor=1 security=apparmor.
Yama (ptrace restriction)
Yama is a minor LSM that restricts ptrace-based process introspection, limiting the ability of one process to attach to another with a debugger. The kernel.yama.ptrace_scope sysctl selects the level:
| Value | Scope |
|---|---|
| 0 | No restrictions (classic ptrace behavior) |
| 1 | Restricted: a process can only ptrace its descendants (default) |
| 2 | Admin-only: only processes with CAP_SYS_PTRACE can use ptrace |
| 3 | No ptrace at all; cannot be changed at runtime |
Landlock
Landlock is an unprivileged sandboxing mechanism. Unlike SELinux and AppArmor, it does not require system-wide policy configuration. Instead, processes opt in and restrict their own access to filesystem paths using the landlock_create_ruleset(2), landlock_add_rule(2), and landlock_restrict_self(2) system calls.
Landlock is designed for use by applications that want to confine themselves without requiring root or LSM policy configuration. It is available from Linux 5.13.
Seccomp: system call filtering
Seccomp (secure computing mode) limits the system calls a process may make. It is the foundation for sandboxing in containers, browsers, and secure applications.
Strict mode
Strict mode (SECCOMP_SET_MODE_STRICT) allows only four system calls: read(), write(), _exit(), and sigreturn(). Any other system call terminates the caller with SIGKILL.
Filter mode (BPF)
Filter mode uses BPF programs to define a per-process syscall policy. Each syscall is evaluated against the BPF filter; the filter can allow, deny, kill, or return an errno.
Namespace isolation
Linux namespaces isolate processes’ views of system resources. Each namespace type wraps a different resource:
| Namespace | Flag | Isolates |
|---|---|---|
| Mount | CLONE_NEWNS | Filesystem mount points |
| UTS | CLONE_NEWUTS | Hostname and NIS domain name |
| IPC | CLONE_NEWIPC | SysV IPC, POSIX message queues |
| PID | CLONE_NEWPID | Process IDs |
| Network | CLONE_NEWNET | Network devices, IPs, routing, firewall |
| User | CLONE_NEWUSER | User and group IDs |
| Cgroup | CLONE_NEWCGROUP | Cgroup root directory |
| Time | CLONE_NEWTIME | Boot and monotonic clock offsets |
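A process’s namespace membership is visible as symbolic links under /proc/<pid>/ns/, whose targets look like net:[4026531840]. The sketch below reads those links to show which namespaces two processes share; the helper names are hypothetical, and comparing only the inode number is a simplification that is adequate for a quick inspection.

```python
import os
import re

# Entry names under /proc/<pid>/ns for the namespace types in the table.
NAMESPACE_ENTRIES = ["mnt", "uts", "ipc", "pid", "net", "user", "cgroup", "time"]
_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'net:[4026531840]' into (type, inode)."""
    match = _LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {type: inode} for every namespace link that can be read."""
    found = {}
    for entry in NAMESPACE_ENTRIES:
        try:
            kind, inode = parse_ns_link(os.readlink(f"/proc/{pid}/ns/{entry}"))
        except (OSError, ValueError):
            continue  # not Linux, older kernel, or insufficient permission
        found[kind] = inode
    return found


def shared_types(a, b):
    """Namespace types for which two {type: inode} maps hold the same inode."""
    return sorted(kind for kind in a if kind in b and a[kind] == b[kind])


if __name__ == "__main__":
    mine = namespaces_of("self")
    init = namespaces_of(1)  # reading PID 1's links usually needs root
    print("own namespaces:", mine)
    print("shared with PID 1:", shared_types(mine, init))
```

On a host without containers, an ordinary process typically shares every namespace with PID 1; inside a container, most of the types differ.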
Restricting user namespaces
Unprivileged user namespaces can be used to escalate privileges through vulnerabilities in namespace-aware code. Restricting them, for example by lowering the user.max_user_namespaces sysctl, reduces attack surface.
Secure boot and module signing
Secure Boot is a UEFI feature that verifies the bootloader and kernel are signed by a trusted key before executing them. When combined with kernel module signing, it provides a complete chain of trust from firmware through bootloader to running kernel code.
UEFI Secure Boot
The UEFI firmware verifies the bootloader (GRUB, systemd-boot) against keys in the Secure Boot database (db). Most distributions ship bootloaders signed by Microsoft or their own CA.
Kernel signing
The bootloader verifies the kernel image signature before loading it. The kernel binary must be signed with a key trusted by Secure Boot.
Hardware mitigations: Spectre and Meltdown
Spectre and Meltdown are CPU microarchitecture vulnerabilities that allow unprivileged code to read memory it should not have access to. The Linux kernel includes mitigations for the publicly known variants.
Meltdown (Variant 3): KPTI
Kernel Page Table Isolation (KPTI) maintains separate page tables for user and kernel mode. When running in userspace, the kernel’s memory is almost entirely unmapped, so a CPU bug cannot read kernel memory through speculative execution.
Spectre Variant 1: bounds check bypass
Mitigated by clamping array indices with array_index_nospec() at dangerous speculation points in the kernel, preventing speculative out-of-bounds array accesses.
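The idea behind array_index_nospec() is to turn the bounds check into arithmetic: an index is ANDed with a mask that is all ones when the index is in range and zero otherwise, so even a mispredicted branch cannot feed an out-of-range index to the load. The model below reproduces that arithmetic for 64-bit values, loosely following the kernel’s generic fallback. It is an illustration of the calculation only: architectures supply their own implementations, Python code gains no speculation protection from it, and the model assumes both arguments are at most 2**63 - 1.

```python
BITS_PER_LONG = 64
ULONG_MAX = (1 << BITS_PER_LONG) - 1


def array_index_mask_nospec(index, size):
    """Return all ones if 0 <= index < size, else 0, without comparing."""
    # The top bit of v is set exactly when index >= size, because
    # size - 1 - index then wraps around to a "negative" value.
    v = (index | (size - 1 - index)) & ULONG_MAX
    top_bit = (~v & ULONG_MAX) >> (BITS_PER_LONG - 1)
    # The kernel uses an arithmetic right shift to smear the sign bit;
    # multiplying by ULONG_MAX has the same effect in this model.
    return ULONG_MAX * top_bit


def array_index_nospec(index, size):
    """Clamp index to 0 when it is out of bounds."""
    return index & array_index_mask_nospec(index, size)


print(array_index_nospec(3, 8))    # 3
print(array_index_nospec(8, 8))    # 0
print(array_index_nospec(100, 8))  # 0
```

An out-of-range index is redirected to element 0 rather than rejected, so the ordinary bounds check is still needed for correctness; the mask only bounds what speculation can reach.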
Spectre Variant 2: branch target injection
Retpoline (CONFIG_RETPOLINE, named CONFIG_MITIGATION_RETPOLINE in recent kernels): Replaces indirect branches with a return-based construct that traps speculation in a harmless loop, preventing attackers from poisoning the branch target buffer to redirect speculation.
IBRS (Indirect Branch Restricted Speculation), delivered through microcode updates, and enhanced IBRS (eIBRS) on newer Intel processors provide hardware-level isolation.
Checking mitigation status
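The kernel reports the state of each known CPU vulnerability as a one-line file under /sys/devices/system/cpu/vulnerabilities/. A minimal sketch for collecting those lines follows; the function names and the coarse classification are assumptions for illustration, and some files use other wordings that fall into the "unknown" bucket here.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status):
    """Coarse bucket for a status line reported by the kernel."""
    if status.startswith("Not affected"):
        return "not affected"
    if status.startswith("Mitigation"):
        return "mitigated"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"  # other wordings exist; inspect these by hand


def mitigation_status(directory=VULN_DIR):
    """Return {vulnerability: status line}; empty if nothing can be read."""
    report = {}
    try:
        entries = sorted(directory.iterdir())
    except OSError:
        return report
    for entry in entries:
        try:
            report[entry.name] = entry.read_text().strip()
        except OSError:
            continue
    return report


if __name__ == "__main__":
    for name, status in mitigation_status().items():
        print(f"{name:28} {classify(status):13} {status}")
```

Anything not classified as "mitigated" or "not affected" deserves a closer look at the full status line.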
Controlling mitigations via kernel parameters
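Mitigations can be relaxed or disabled on the kernel command line, for example with mitigations=off, nopti, or spectre_v2=off. The sketch below scans a command line (as found in /proc/cmdline) for a small, non-exhaustive set of such parameters; the table of parameters and the helper names are illustrative assumptions, and the whitespace-based parsing ignores quoted values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value or None}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Parameter -> values that disable the mitigation. None means the bare
# flag is present without a value. Non-exhaustive, for illustration.
WEAKENING = {
    "mitigations": {"off"},
    "pti": {"off"},
    "spectre_v2": {"off"},
    "nopti": {None},
    "nospectre_v1": {None},
    "nospectre_v2": {None},
}


def weakened_mitigations(cmdline):
    """Return the parameters on the command line that turn mitigations off."""
    params = parse_cmdline(cmdline)
    return sorted(
        key for key, bad in WEAKENING.items()
        if key in params and params[key] in bad
    )


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            print(weakened_mitigations(handle.read()))
    except OSError:
        print("kernel command line not available")
```

An empty result means none of the listed parameters is present; it does not prove that every mitigation is active, which is what the sysfs status files are for.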
Security-relevant sysctl summary
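The sysctls discussed in this guide can be collected into a single drop-in file for /etc/sysctl.d/. The sketch below renders such a file from a dictionary; the chosen values are a hypothetical baseline for illustration and should be adjusted to the system’s requirements, and the function name is made up for this example.

```python
# Hypothetical baseline; the values are assumptions, not recommendations
# taken from the kernel documentation.
SETTINGS = {
    "kernel.kptr_restrict": 2,
    "kernel.dmesg_restrict": 1,
    "kernel.perf_event_paranoid": 2,
    "kernel.yama.ptrace_scope": 1,
}


def render_sysctl_conf(settings):
    """Render settings as 'key = value' lines, sorted by key."""
    lines = [f"{key} = {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


print(render_sysctl_conf(SETTINGS), end="")
```

Writing the output to a file such as /etc/sysctl.d/99-hardening.conf (a hypothetical name) and running sysctl --system applies the settings; both steps require root.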
Kernel lockdown mode
Kernel lockdown (CONFIG_SECURITY_LOCKDOWN_LSM) restricts root’s ability to modify the running kernel. It is automatically enabled when Secure Boot is active on many distributions.
| Mode | Restrictions |
|---|---|
| none | No restrictions |
| integrity | Prevents modifications that could compromise kernel integrity (e.g., writing to /dev/mem, loading unsigned modules) |
| confidentiality | Adds restrictions on reading kernel memory (e.g., via kcore, kmem, perf) |
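When the lockdown LSM is built in, the current mode can be read from /sys/kernel/security/lockdown, which lists all modes and marks the active one in square brackets, for example "none [integrity] confidentiality". A small sketch for extracting the active mode follows; the function name is hypothetical, and reading the file may require securityfs to be mounted.

```python
import re


def active_lockdown_mode(text):
    """Return the bracketed mode, e.g. 'integrity', or None if not found."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


print(active_lockdown_mode("none [integrity] confidentiality"))  # integrity

if __name__ == "__main__":
    try:
        with open("/sys/kernel/security/lockdown") as handle:
            print("current mode:", active_lockdown_mode(handle.read()))
    except OSError:
        print("lockdown state not available on this system")
```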
