The Linux kernel includes multiple layers of security features. Some are compile-time protections built into the kernel binary; others are runtime configurations you control via sysctl, LSM policies, or kernel parameters.
Effective kernel hardening is defense-in-depth: no single mechanism is sufficient on its own. The goal is to make exploitation difficult at every layer—from memory layout to system call access to mandatory access control.

Kernel self-protection

Kernel self-protection focuses on eliminating classes of exploitable bugs and blocking exploitation techniques even when bugs exist. The design goal is that protections should be on by default, require no opt-in by developers, have minimal performance impact, and be testable.

KASLR

Randomizes the kernel’s load address at boot, making it harder for attackers to locate kernel code and data.

KPTI

Kernel Page Table Isolation. Separates kernel and user page tables to mitigate Meltdown-class attacks.

Stack protector

Places canary values before return addresses to detect stack buffer overflows before the function returns.

FORTIFY_SOURCE

Compile-time and runtime checks on buffer operations (memcpy, strcpy, etc.) to catch overflows.

Memory permissions

The kernel enforces strict permissions on its own memory regions:
  • CONFIG_STRICT_KERNEL_RWX: Ensures kernel code is not writable, and kernel data is not executable.
  • CONFIG_STRICT_MODULE_RWX: Same enforcement for loaded kernel modules.
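  • CONFIG_RODATA_FULL_DEFAULT_ENABLED (arm64): Applies the read-only permissions of kernel mappings to their linear-map aliases as well, so read-only data cannot be modified through a second mapping of the same page.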
Variables that only need to be written once at initialization can be marked __ro_after_init, making them read-only for the rest of the kernel’s lifetime. Function pointers and sensitive variables are moved to .rodata sections where possible, removing them as writable attack targets.

Kernel Address Space Layout Randomization (KASLR)

KASLR randomizes the physical and virtual base addresses of the kernel image at each boot. Related options randomize other parts of the kernel’s memory layout:
  • Text and module base: CONFIG_RANDOMIZE_BASE moves the kernel’s load address at each boot, frustrating attacks that rely on fixed addresses.
  • Stack base: CONFIG_RANDOMIZE_KSTACK_OFFSET varies the kernel stack offset on each system call, so targets on or beyond the stack are harder to locate.
  • Dynamic memory: The base addresses of heap regions (kmalloc, vmalloc) are randomized (CONFIG_RANDOMIZE_MEMORY on x86_64), so an attacker needs an information leak to target them reliably.
  • Structure layout: CONFIG_RANDSTRUCT randomizes the layout of sensitive kernel structures per build, so an attacker needs information about the specific build being attacked.
# KASLR is on by default when CONFIG_RANDOMIZE_BASE is compiled in; no parameter is needed to enable it.
# To disable it for debugging, boot with:
nokaslr

Stack protection

Stack canaries (CONFIG_STACKPROTECTOR, CONFIG_STACKPROTECTOR_STRONG): A secret value is placed between local variables and the saved return address. If a stack overflow overwrites the canary, the kernel detects the corruption before returning from the function. CONFIG_STACKPROTECTOR_STRONG applies canaries to any function that has arrays, pointers to stack memory, or any address-taken local variables, which is significantly broader coverage than the baseline.
Stack depth overflow protection (CONFIG_VMAP_STACK): Maps the kernel stack using virtual memory with a fault-triggering guard page at the bottom. Deep recursion or large stack allocations that overflow the stack boundary cause an immediate fault rather than silently corrupting adjacent memory.

Heap integrity

The kernel’s slab allocator (SLUB) supports sanity checks on its free lists during allocation and deallocation (for example, with CONFIG_SLAB_FREELIST_HARDENED), detecting freelist corruption (a common result of use-after-free writes) and double-free bugs before they can be exploited.
Kernel Address Sanitizer (KASAN) (CONFIG_KASAN): A compile-time instrumentation-based memory error detector that catches out-of-bounds and use-after-free accesses. Its generic mode is too slow for production use, but it is invaluable for catching bugs in development and CI.

Memory initialization and poisoning

  • CONFIG_INIT_ON_ALLOC_DEFAULT_ON: Zero-initializes heap allocations, preventing information leaks from uninitialized memory.
  • CONFIG_INIT_ON_FREE_DEFAULT_ON: Zeros freed heap memory, preventing use-after-free content exposure.
  • CONFIG_KSTACK_ERASE: Clears the kernel stack on syscall return, preventing stack content exposure across privilege boundaries.

FORTIFY_SOURCE

CONFIG_FORTIFY_SOURCE adds compile-time and runtime checks to the kernel’s versions of common string and memory functions (memcpy, memset, strcpy, and similar). When the compiler can determine a buffer’s size, an overflow it can prove fails the build, and the remaining calls are bounds-checked at run time, so the overflow is caught before it can be exploited.
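One way to see which of the compile-time protections above are present in a given kernel is to scan its build configuration. The following is a minimal sketch, not an official tool: the helper name check_hardening_options, the option list, and the assumption that the config is readable at /boot/config-<release> (common on many distributions, but not guaranteed) are illustrative choices.

```python
import platform

# Options named in this section; availability varies by architecture and kernel version.
HARDENING_OPTIONS = [
    "CONFIG_RANDOMIZE_BASE",
    "CONFIG_STRICT_KERNEL_RWX",
    "CONFIG_STRICT_MODULE_RWX",
    "CONFIG_STACKPROTECTOR_STRONG",
    "CONFIG_VMAP_STACK",
    "CONFIG_INIT_ON_ALLOC_DEFAULT_ON",
    "CONFIG_INIT_ON_FREE_DEFAULT_ON",
    "CONFIG_FORTIFY_SOURCE",
]


def check_hardening_options(config_text, options=HARDENING_OPTIONS):
    """Return {option: bool}, where True means the option is set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return {opt: opt in enabled for opt in options}


if __name__ == "__main__":
    path = f"/boot/config-{platform.release()}"  # assumed location
    try:
        with open(path) as fh:
            report = check_hardening_options(fh.read())
        for opt, on in report.items():
            print(f"{opt:36} {'enabled' if on else 'not set'}")
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

An option reported as "not set" may simply not exist on the running architecture or kernel version, so treat the output as a starting point for investigation rather than a verdict.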

Information leak prevention

Leaking kernel addresses to userspace is a common prerequisite for exploits. Several controls limit this.

kptr_restrict

# Hide kernel addresses from all users, regardless of privilege
sysctl -w kernel.kptr_restrict=2
Value  Effect
0      Addresses hashed/randomized before printing
1      Addresses hidden from unprivileged users; raw for CAP_SYSLOG
2      Always print 0 regardless of privilege
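The effect of kptr_restrict can be observed in /proc/kallsyms, whose address column reads as zeros when pointers are hidden from the reader. The sketch below checks for that; the function name kallsyms_addresses_hidden is a hypothetical helper, and the 200-line sample size is an arbitrary choice.

```python
def kallsyms_addresses_hidden(kallsyms_text):
    """Return True if every symbol address in the text is zero.

    Each /proc/kallsyms line looks like: "<hex address> <type> <name> [module]".
    """
    addresses = [line.split()[0] for line in kallsyms_text.splitlines() if line.strip()]
    return bool(addresses) and all(int(addr, 16) == 0 for addr in addresses)


if __name__ == "__main__":
    try:
        with open("/proc/kallsyms") as fh:
            # A sample from the top of the file is enough to tell.
            head = "".join(line for _, line in zip(range(200), fh))
        print("addresses hidden from this user:", kallsyms_addresses_hidden(head))
    except (OSError, ValueError) as exc:
        print("cannot inspect /proc/kallsyms:", exc)
```

Running it once as an unprivileged user and once as root shows how the setting treats each.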

dmesg_restrict

# Prevent unprivileged users from reading kernel log
sysctl -w kernel.dmesg_restrict=1
Kernel log messages can reveal memory addresses, hardware details, and error conditions useful to attackers.

perf_event_paranoid

# Disallow kernel profiling by unprivileged users (user-space-only measurements stay available)
sysctl -w kernel.perf_event_paranoid=2
The perf subsystem can expose detailed CPU performance counters and timing information. Restricting it limits the data available for side-channel attacks.

Linux Security Modules (LSM)

The Linux Security Module framework hooks into kernel operations and allows security policies to be enforced at the kernel level. Capabilities are always active; one “major” LSM (SELinux, AppArmor, Smack, or TOMOYO) can be active at a time, potentially alongside minor modules (Yama, Landlock).
# See which LSMs are active
cat /sys/kernel/security/lsm
# capability,yama,selinux
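The comma-separated list in /sys/kernel/security/lsm is easy to inspect from a script. A minimal sketch follows; the helper names and the MAJOR_LSMS set (taken from the "major" modules listed above) are illustrative assumptions.

```python
# "Major" modules as listed in the text above.
MAJOR_LSMS = {"selinux", "apparmor", "smack", "tomoyo"}


def parse_lsm_list(text):
    """Split the comma-separated contents of /sys/kernel/security/lsm."""
    return [name for name in text.strip().split(",") if name]


def major_lsms(text):
    """Return the active modules that belong to MAJOR_LSMS, in kernel order."""
    return [name for name in parse_lsm_list(text) if name in MAJOR_LSMS]


if __name__ == "__main__":
    try:
        with open("/sys/kernel/security/lsm") as fh:
            contents = fh.read()
        print("active LSMs:", parse_lsm_list(contents))
        print("major LSM:  ", major_lsms(contents) or "none")
    except OSError as exc:
        print("cannot read LSM list:", exc)
```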

Selecting an LSM

# At boot (kernel parameter; choose one)
security=selinux
security=apparmor

# At compile time (choose one)
CONFIG_DEFAULT_SECURITY_SELINUX=y
CONFIG_DEFAULT_SECURITY_APPARMOR=y
SELinux (Security-Enhanced Linux) implements Mandatory Access Control (MAC) using type enforcement policies. Every process and file has a security label, and access is granted only if the policy explicitly permits it.
SELinux operates in three modes:
Mode        Description
Enforcing   Policy is enforced; violations are denied and logged
Permissive  Policy violations are logged but not denied (useful for policy development)
Disabled    SELinux is not active
# Check current mode
getenforce
sestatus

# Temporarily switch to permissive mode
setenforce 0

# View process labels
ps -Z

# View file labels
ls -Z /etc/passwd

# Audit log entries for SELinux denials
ausearch -m avc -ts recent
SELinux is configured primarily by setting SELINUX=enforcing (or permissive) in /etc/selinux/config and selecting a policy type (targeted, mls, minimum).
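The labels printed by ps -Z and ls -Z have the form user:role:type, optionally followed by an MLS/MCS level that may itself contain colons. The sketch below splits a label into those fields; SELinuxContext and parse_context are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class SELinuxContext(NamedTuple):
    user: str
    role: str
    setype: str            # the type used by type enforcement
    level: Optional[str]   # MLS/MCS range; None when the policy has no MLS field


def parse_context(label):
    """Split 'user:role:type[:level]' into its fields."""
    # Split at most three times: the level (e.g. "s0-s0:c0.c1023") keeps its colons.
    parts = label.strip().split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {label!r}")
    level = parts[3] if len(parts) == 4 else None
    return SELinuxContext(parts[0], parts[1], parts[2], level)


if __name__ == "__main__":
    ctx = parse_context("system_u:object_r:passwd_file_t:s0")
    print(ctx.setype)
```

Type enforcement rules are written against the type field, which is why it is usually the most informative part of a label when reading denials.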
AppArmor implements MAC using per-application profiles that define what files, capabilities, and network resources each program may access. Applications without a profile run unconfined.
# Check AppArmor status
aa-status

# List profile files on disk (aa-status reports which profiles are loaded)
ls /etc/apparmor.d/

# Load a profile
apparmor_parser -r /etc/apparmor.d/usr.bin.firefox

# Put a profile in complain mode (log only, no denial)
aa-complain /etc/apparmor.d/usr.bin.firefox

# Enforce a profile
aa-enforce /etc/apparmor.d/usr.bin.firefox
Enable AppArmor at boot:
# Kernel parameter
security=apparmor
# or, if it is the default LSM:
apparmor=1
Yama is a minor LSM that restricts ptrace-based process introspection. This limits the ability of one process to attach to another via debuggers.
# Check current ptrace scope
cat /proc/sys/kernel/yama/ptrace_scope
Value  Scope
0      No restrictions (classic ptrace behavior)
1      Restricted: a process can only ptrace its descendants (default)
2      Admin-only: only processes with CAP_SYS_PTRACE can use ptrace
3      No ptrace at all; cannot be changed at runtime
# Restrict ptrace to descendants only
sysctl -w kernel.yama.ptrace_scope=1
Landlock is an unprivileged sandboxing mechanism. Unlike SELinux and AppArmor, it does not require system-wide policy configuration. Instead, a process opts in and restricts its own access to filesystem paths: it builds a ruleset with the landlock_create_ruleset(2) and landlock_add_rule(2) system calls, then enforces it on itself with landlock_restrict_self(2).
Landlock is designed for applications that want to confine themselves without requiring root or LSM policy configuration. It is available from Linux 5.13.
# Check if Landlock is enabled
cat /sys/kernel/security/lsm   # should contain "landlock"
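A process can also ask the kernel directly which Landlock ABI version it supports by calling landlock_create_ruleset(2) with the LANDLOCK_CREATE_RULESET_VERSION flag, which returns a version number instead of a file descriptor. The sketch below does this through ctypes. It assumes syscall number 444, which is correct for x86_64 and arm64 but not for every architecture, and landlock_abi_version is a hypothetical helper name.

```python
import ctypes
import sys

SYS_LANDLOCK_CREATE_RULESET = 444          # assumption: x86_64 / arm64 numbering
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0   # flag value from <linux/landlock.h>


def landlock_abi_version():
    """Return the Landlock ABI version (>= 1), or None if Landlock is unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # Equivalent to: landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION)
    ret = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),
        ctypes.c_size_t(0),
        ctypes.c_uint(LANDLOCK_CREATE_RULESET_VERSION),
    )
    if ret < 0:
        # Typically ENOSYS (not compiled in) or EOPNOTSUPP (disabled at boot).
        return None
    return ret


if __name__ == "__main__":
    version = landlock_abi_version()
    print("Landlock ABI version:", version if version is not None else "unavailable")
```

Applications usually probe the version like this first, so they can fall back gracefully on kernels that lack newer Landlock access rights.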

Seccomp — system call filtering

Seccomp (secure computing mode) limits the system calls a process may make. It is the foundation for sandboxing in containers, browsers, and secure applications.

Strict mode

Strict mode (SECCOMP_SET_MODE_STRICT) allows only four system calls: read(), write(), _exit(), and sigreturn(). Any other call terminates the calling thread with SIGKILL.

Filter mode (BPF)

Filter mode (SECCOMP_SET_MODE_FILTER) uses classic BPF programs to define a per-process syscall policy. Each syscall is evaluated against the filter, which can allow the call, fail it with a chosen errno, or kill the offending thread or process.
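/* Example: allow only read, write, exit, and exit_group */
#include <stddef.h>         /* offsetof */
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/syscall.h>    /* __NR_* syscall numbers */

struct sock_filter filter[] = {
    /* Load syscall number (a production filter should check seccomp_data.arch first) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             offsetof(struct seccomp_data, nr)),
    /* Permit read: on a match fall through to ALLOW, otherwise skip it */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* Permit write */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* Permit exit */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* Permit exit_group */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* Kill the calling thread on any other syscall */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};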
Container runtimes such as Docker and Podman, and service managers such as systemd, use seccomp filters to restrict containers and services to only the syscalls they need.
# Apply a seccomp profile with Docker
docker run --security-opt seccomp=/path/to/profile.json myimage

# Check if a process has seccomp active
grep Seccomp /proc/$(pidof myprocess)/status
# Seccomp:        2   (2 = filter mode)
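The Seccomp field in /proc/<pid>/status can be decoded in a script as well. A minimal sketch, assuming the usual "Key:<tab>value" layout of that file; seccomp_mode is a hypothetical helper name.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode name found in the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        # Exact match, so the separate "Seccomp_filters" line is not picked up.
        if key == "Seccomp":
            return SECCOMP_MODES.get(int(value.strip()), "unknown")
    return None  # field absent, e.g. kernel built without seccomp


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as fh:
            print("seccomp mode of this process:", seccomp_mode(fh.read()))
    except OSError as exc:
        print("cannot read /proc/self/status:", exc)
```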

Namespace isolation

Linux namespaces isolate processes’ views of system resources. Each namespace type wraps a different resource:
Namespace  Flag             Isolates
Mount      CLONE_NEWNS      Filesystem mount points
UTS        CLONE_NEWUTS     Hostname and NIS domain name
IPC        CLONE_NEWIPC     SysV IPC, POSIX message queues
PID        CLONE_NEWPID     Process IDs
Network    CLONE_NEWNET     Network devices, IPs, routing, firewall
User       CLONE_NEWUSER    User and group IDs
Cgroup     CLONE_NEWCGROUP  Cgroup root directory
Time       CLONE_NEWTIME    Boot and monotonic clock offsets
Containers typically combine several of these namespaces to create isolated environments that appear to processes as separate systems.
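Each process exposes its namespace memberships as symlinks under /proc/<pid>/ns/, with targets such as pid:[4026531836]; two processes are in the same namespace when the type and inode number match. The sketch below compares two such sets of links. The helper names are illustrative, and reading another process's links may require additional privileges.

```python
import os
import re

NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Parse a /proc/<pid>/ns/* symlink target such as 'pid:[4026531836]'."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def shared_namespaces(links_a, links_b):
    """Given {name: link_target} for two processes, list the namespaces they share."""
    return sorted(
        name
        for name in links_a
        if name in links_b
        and parse_ns_link(links_a[name]) == parse_ns_link(links_b[name])
    )


def read_ns_links(pid="self"):
    """Read every namespace link of a process into a {name: target} dict."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name)) for name in os.listdir(ns_dir)}


if __name__ == "__main__":
    try:
        # Comparing with PID 1 shows whether this process runs in the host's namespaces.
        print(shared_namespaces(read_ns_links("self"), read_ns_links(1)))
    except OSError as exc:
        print("cannot read namespace links:", exc)
```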

Restricting user namespaces

Unprivileged user namespaces can be used to escalate privileges through vulnerabilities in namespace-aware code. Restricting them reduces attack surface:
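# Disable unprivileged user namespace creation (breaks some container tools)
sysctl -w kernel.unprivileged_userns_clone=0    # Debian/Ubuntu (distribution patch)
# Upstream kernels have no unprivileged-only switch; capping the count at zero blocks creation for every user
sysctl -w user.max_user_namespaces=0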

Secure boot and module signing

Secure Boot is a UEFI feature that verifies the bootloader and kernel are signed by a trusted key before executing them. When combined with kernel module signing, it provides a complete chain of trust from firmware through bootloader to running kernel code.
1. UEFI Secure Boot

The UEFI firmware verifies the bootloader (GRUB, systemd-boot) against keys in the Secure Boot database (db). Most distributions ship bootloaders signed by Microsoft or their own CA.
# Check if Secure Boot is active
mokutil --sb-state
# or read the EFI variable directly (the last byte is 1 when Secure Boot is enabled):
od -An -t u1 /sys/firmware/efi/efivars/SecureBoot-*
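The SecureBoot EFI variable is binary, so it is easier to interpret in a script than by eye. In efivarfs the file starts with a 4-byte attribute field followed by the variable data, which for SecureBoot is a single byte. The sketch below decodes it; secure_boot_enabled is a hypothetical helper name.

```python
import glob


def secure_boot_enabled(raw):
    """Interpret the raw bytes of the SecureBoot efivarfs file.

    Layout: 4 attribute bytes, then the 1-byte variable value (1 = enabled).
    """
    if len(raw) < 5:
        raise ValueError("expected 4 attribute bytes plus 1 data byte")
    return raw[4] == 1


if __name__ == "__main__":
    paths = glob.glob("/sys/firmware/efi/efivars/SecureBoot-*")
    if not paths:
        print("no SecureBoot variable found (legacy BIOS boot or efivarfs not mounted)")
    else:
        try:
            with open(paths[0], "rb") as fh:
                print("Secure Boot enabled:", secure_boot_enabled(fh.read()))
        except (OSError, ValueError) as exc:
            print("cannot read SecureBoot variable:", exc)
```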
2. Kernel signing

The bootloader verifies the kernel image signature before loading it. The kernel binary must be signed with a key trusted by Secure Boot.
3. Module signing

With CONFIG_MODULE_SIG_FORCE=y (or module.sig_enforce=1 on the kernel command line), only modules signed by a key in one of the kernel’s trusted keyrings load successfully. This closes the last gap: even with a verified kernel, an attacker with root cannot load unsigned code through the module loader.
# Check if module signature enforcement is active
cat /sys/module/module/parameters/sig_enforce
If you compile a custom kernel or out-of-tree modules and use Secure Boot, you must enroll your signing key into the MOK (Machine Owner Key) database using mokutil. Otherwise your modules will be rejected.
# Enroll your key into MOK
mokutil --import my_signing_key.der
# Reboot and confirm the enrollment in the MOK manager

Hardware mitigations: Spectre and Meltdown

Spectre and Meltdown are CPU microarchitecture vulnerabilities that allow unprivileged code to read memory it should not have access to. The Linux kernel includes mitigations for the publicly known variants; which ones apply depends on the CPU model and its microcode.

Meltdown (Variant 3) — KPTI

Kernel Page Table Isolation (KPTI) maintains separate page tables for user and kernel mode. When running in userspace, the kernel’s memory is almost entirely unmapped, so a CPU bug cannot read kernel memory through speculative execution.
# Check if KPTI is active (the log message wording varies by architecture)
dmesg | grep -i 'page table.*isolation'
# or:
cat /sys/devices/system/cpu/vulnerabilities/meltdown

Spectre Variant 1 — bounds check bypass

Mitigated by adding array_index_nospec() barriers at dangerous speculation points in the kernel, preventing speculative out-of-bounds array accesses.

Spectre Variant 2 — branch target injection

Retpoline (CONFIG_RETPOLINE, named CONFIG_MITIGATION_RETPOLINE in recent kernels): Replaces indirect branches with a return-based sequence that keeps speculation away from attacker-chosen targets, preventing attackers from poisoning the branch target buffer to redirect speculation.
IBRS and eIBRS: Microcode updates for IBRS (Indirect Branch Restricted Speculation) and eIBRS (enhanced IBRS) provide hardware-level isolation on newer Intel processors.

Checking mitigation status

# View vulnerability and mitigation status for all known issues
grep -r '' /sys/devices/system/cpu/vulnerabilities/

# Example output:
# /sys/devices/system/cpu/vulnerabilities/meltdown:Not affected
# /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
# /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Retpolines; IBPB: conditional; IBRS_FW; STIBP: always-on; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S
# /sys/devices/system/cpu/vulnerabilities/srbds:Not affected
# /sys/devices/system/cpu/vulnerabilities/mmio_stale_data:Not affected
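For monitoring, the status strings can be grouped by their leading keyword: the kernel reports "Not affected", "Vulnerable", or "Mitigation: ..." for each issue. The sketch below does that grouping; the function names and the category labels are illustrative assumptions.

```python
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def classify(status):
    """Map a status string to 'not_affected', 'mitigated', 'vulnerable', or 'unknown'."""
    text = status.strip()
    if text == "Not affected":
        return "not_affected"
    if text.startswith("Mitigation"):
        return "mitigated"
    if text.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def summarize(statuses):
    """Group {issue: status string} into {category: [issues]}."""
    groups = {}
    for issue, status in sorted(statuses.items()):
        groups.setdefault(classify(status), []).append(issue)
    return groups


if __name__ == "__main__":
    try:
        statuses = {}
        for name in os.listdir(VULN_DIR):
            with open(os.path.join(VULN_DIR, name)) as fh:
                statuses[name] = fh.read()
        for category, issues in summarize(statuses).items():
            print(f"{category}: {', '.join(issues)}")
    except OSError as exc:
        print("cannot read vulnerability status:", exc)
```

Anything that lands in the vulnerable or unknown group is worth a closer look at the full status line.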

Controlling mitigations via kernel parameters

Disabling hardware mitigations significantly increases the risk of cross-process and cross-VM information leaks. Only disable them on isolated, single-tenant systems where you understand the risks and have measured a meaningful performance impact.
# Disable all mitigations (maximum performance, maximum risk)
mitigations=off

# Disable specific mitigations
nopti              # disable KPTI (Meltdown mitigation)
nospectre_v2       # disable Spectre v2 mitigations
nospectre_v1       # disable Spectre v1 mitigations
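To audit whether a machine was booted with any of these parameters, the kernel command line in /proc/cmdline can be scanned. A minimal sketch: it covers only the parameters listed above, not every mitigation-related option the kernel accepts, and disabled_mitigations is a hypothetical helper name.

```python
# Only the parameters shown above; the kernel accepts other mitigation switches too.
MITIGATION_DISABLING = {"mitigations=off", "nopti", "nospectre_v1", "nospectre_v2"}


def disabled_mitigations(cmdline):
    """Return the mitigation-disabling parameters present on a kernel command line."""
    return sorted(token for token in cmdline.split() if token in MITIGATION_DISABLING)


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            found = disabled_mitigations(fh.read())
        print("mitigation-disabling parameters:", found or "none")
    except OSError as exc:
        print("cannot read /proc/cmdline:", exc)
```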

Security-relevant sysctl summary

# /etc/sysctl.d/99-security-hardening.conf

# --- Kernel pointer and address exposure ---
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1

# --- Performance counter access ---
kernel.perf_event_paranoid = 2

# --- ASLR ---
kernel.randomize_va_space = 2

# --- ptrace scope (requires Yama LSM) ---
kernel.yama.ptrace_scope = 1

# --- Core dump restrictions ---
fs.suid_dumpable = 0

# --- Network: SYN flood and IP spoofing protection ---
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# --- Disable ICMP redirect acceptance ---
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

# --- Disable source routing ---
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

# --- Log martian packets ---
net.ipv4.conf.all.log_martians = 1
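A file like this only takes effect once it is loaded (for example with sysctl --system), so it can be useful to compare it with the values the running kernel reports under /proc/sys. The sketch below does a simple string comparison. The function names are illustrative, and it assumes plain "key = value" lines whose keys map to paths by replacing dots with slashes, which does not hold for interface names that contain dots.

```python
def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key):
    """Map a dotted sysctl name to its /proc/sys path."""
    return "/proc/sys/" + key.replace(".", "/")


def read_sysctl(key):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(key)) as fh:
            return fh.read().strip()
    except OSError:
        return None


def audit(wanted, read=read_sysctl):
    """Return {key: (wanted, actual)} for every setting that differs."""
    mismatches = {}
    for key, expected in wanted.items():
        actual = read(key)
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    wanted = parse_sysctl_conf("kernel.kptr_restrict = 2\nkernel.dmesg_restrict = 1\n")
    for key, (expected, actual) in audit(wanted).items():
        print(f"{key}: wanted {expected}, running kernel has {actual}")
```

The read parameter exists so the comparison can be exercised without touching /proc/sys, for example by passing a dictionary's get method.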

Kernel lockdown mode

Kernel lockdown (CONFIG_SECURITY_LOCKDOWN_LSM) restricts root’s ability to modify the running kernel. It is automatically enabled when Secure Boot is active on many distributions.
# Check lockdown state
cat /sys/kernel/security/lockdown
# [none] integrity confidentiality
Mode             Restrictions
none             No restrictions
integrity        Prevents modifications that could compromise kernel integrity (e.g., writing to /dev/mem, loading unsigned modules)
confidentiality  Adds restrictions on reading kernel memory (e.g., via kcore, kmem, perf)
# Enable integrity lockdown at boot
lockdown=integrity

# Enable full lockdown
lockdown=confidentiality
The lockdown level can be raised at runtime by writing the mode name to /sys/kernel/security/lockdown, but once set it can only be lowered by rebooting.
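The /sys/kernel/security/lockdown file lists all modes and marks the active one with square brackets. A small sketch for extracting it; active_lockdown_mode is a hypothetical helper name.

```python
import re


def active_lockdown_mode(text):
    """Return the bracketed mode from text such as '[none] integrity confidentiality'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        with open("/sys/kernel/security/lockdown") as fh:
            print("lockdown mode:", active_lockdown_mode(fh.read()))
    except OSError as exc:
        print("cannot read lockdown state:", exc)
```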
