mm/ with architecture-specific page table code in each arch/*/mm/ directory.
Physical memory model
Linux abstracts the diversity of physical memory layouts using one of two memory models selected at build time: FLATMEM and SPARSEMEM. Both track physical page frames using struct page objects arranged in arrays, maintaining a one-to-one mapping between a Page Frame Number (PFN) and its struct page.
- FLATMEM
- SPARSEMEM
FLATMEM is the simplest model, suited for non-NUMA systems with contiguous physical memory. A single global mem_map array covers the entire physical address space; ARCH_PFN_OFFSET accounts for systems whose physical memory starts at an address other than 0. Architecture setup code calls free_area_init() to allocate the array, which becomes usable after memblock_free_all() hands memory to the page allocator.
SPARSEMEM divides the physical address space into sections, so holes and hot-plugged ranges do not waste struct page storage; with SPARSEMEM_VMEMMAP, the per-section struct page arrays are mapped into one virtually contiguous region, keeping pfn_to_page() a simple array access.
ZONE_DEVICE builds on SPARSEMEM_VMEMMAP to provide struct page services for device-owned memory (persistent memory via DAX, GPU memory via HMM, peer-to-peer DMA via p2pdma) without ever marking those pages online.
Memory zones
The kernel partitions physical memory into zones that reflect hardware constraints on which addresses certain operations can use.
| Zone | Purpose |
|---|---|
| ZONE_DMA | Memory accessible by legacy ISA DMA (typically the first 16 MB on x86) |
| ZONE_DMA32 | Memory below 4 GB, required by 32-bit-only DMA devices |
| ZONE_NORMAL | Directly mapped kernel memory; the workhorse zone |
| ZONE_HIGHMEM | Physical memory above the kernel’s direct mapping limit (32-bit only) |
| ZONE_MOVABLE | Pages that can be migrated, enabling memory hot-remove |
| ZONE_DEVICE | Device-managed memory (pmem, GPU) |
Per-zone and per-node state is exported through counters such as NR_FREE_PAGES and NR_INACTIVE_ANON, visible in /proc/vmstat.
Buddy allocator
The buddy allocator (mm/page_alloc.c) is the primary physical page allocator. It manages free memory in power-of-two blocks called orders (order 0 = 1 page, order 1 = 2 pages, …, up to order 10 = 1024 pages, i.e. 4 MB with 4 KB pages, on most configurations).
SLUB allocator
The SLUB allocator (mm/slub.c) sits on top of the buddy allocator and provides efficient allocation of small, fixed-size kernel objects (e.g., struct task_struct, struct inode, network buffers). It replaced the original SLAB allocator as the default.
Virtual memory and page tables
Each process has a private virtual address space described by struct mm_struct. Within that space, contiguous regions are represented as Virtual Memory Areas (struct vm_area_struct, or VMA). VMAs record permissions, backing (anonymous, file-mapped, or device), and the associated file/offset if any.
Page tables translate virtual addresses to physical ones. Linux uses a multi-level page table hierarchy whose depth varies by architecture:
| Architecture | Levels | Typical depth |
|---|---|---|
| x86-64 (4-level) | PGD → PUD → PMD → PTE (P4D folded) | 4 |
| x86-64 (5-level) | PGD → P4D → PUD → PMD → PTE | 5 |
| AArch64 | PGD → PUD → PMD → PTE | 3–4 |
| RISC-V (Sv48) | PGD → PUD → PMD → PTE | 4 |
vmalloc
vmalloc() allocates virtually contiguous but physically discontiguous memory. It is used when large contiguous physical allocations would fail due to fragmentation, but a logically contiguous virtual range is needed (e.g., for module text, large kernel buffers).
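In kernel code the usage pattern might look like the following sketch (not a standalone program; error handling elided):

```c
/* Allocate a 4 MB virtually contiguous buffer backed by
 * physically scattered pages. */
void *buf = vmalloc(4 << 20);
if (buf) {
        /* ... use buf like any contiguous buffer ... */
        vfree(buf);
}
```

The cost is that every access goes through the page tables rather than the direct map, and the vmalloc area is a limited resource on 32-bit systems.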
NUMA support
On Non-Uniform Memory Access (NUMA) systems, memory latency depends on the distance between a CPU and the memory bank being accessed. Linux models the topology as a graph of nodes, each with its own zones and free lists. The NUMA-aware page allocator tries to satisfy allocations from the requesting CPU’s local node first. alloc_pages_node() lets callers specify a target node explicitly:
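A kernel-style sketch of the call (not standalone; nid and order are placeholders):

```c
/* Allocate 2^order pages, preferring NUMA node `nid`. */
struct page *pages = alloc_pages_node(nid, GFP_KERNEL, order);
if (pages) {
        /* ... use the pages ... */
        __free_pages(pages, order);
}
```

If the requested node is out of memory, the allocator falls back to other nodes according to the node's zonelist ordering.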
CONFIG_NUMA_BALANCING enables automatic NUMA balancing: the kernel periodically unmaps process pages, detects which NUMA node the process is actually accessing them from, and migrates pages to the closer node.
Huge pages and Transparent Huge Pages
Standard pages are 4 KB on most architectures. Huge pages (2 MB on x86-64, or 1 GB for gigantic pages) reduce TLB pressure significantly for large working sets by replacing 512 page table entries with a single PMD-level mapping.
- HugeTLBfs
- Transparent Huge Pages (THP)
HugeTLBfs uses static huge pages reserved at boot or via sysctl; processes must explicitly request them with mmap(MAP_HUGETLB) or by mounting hugetlbfs. THP, by contrast, lets the kernel back anonymous memory with huge pages automatically, with no application changes.
Memory reclaim and swapping
When free memory falls below watermarks, the kernel reclaims pages through the page reclaim path (mm/vmscan.c):
1. **kswapd wakes up.** Per-node kswapd kernel threads wake when free pages drop below the low watermark. They scan the LRU lists to find candidate pages to reclaim.
2. **Page aging via LRU lists.** Pages move between inactive and active LRU lists. Frequently accessed pages are promoted to the active list; cold pages age toward the inactive list and become reclaim candidates.
3. **Anonymous pages are swapped.** Anonymous pages (heap, stack, anonymous mmap) with no file backing must be written to the swap area before they can be freed. The swap subsystem (mm/swap_state.c, mm/swapfile.c) manages swap space on a block device or in a file.
OOM killer
When all reclaim efforts fail and the system has no free memory, the Out-Of-Memory (OOM) killer (mm/oom_kill.c) selects a process to kill, freeing its memory.
The OOM killer scores each process based on its memory footprint (oom_score) and adjusts the result with a per-process tunable, /proc/<pid>/oom_score_adj:
