Linux supports dozens of filesystem implementations through the Virtual File System (VFS) layer. The VFS defines a common set of data structures and operations that every filesystem must implement. User-space programs interact exclusively with the VFS through system calls such as open(2), read(2), and write(2), remaining unaware of which underlying filesystem serves their files. The VFS source lives in fs/, with per-filesystem implementations in subdirectories such as fs/ext4/, fs/btrfs/, fs/xfs/, and fs/proc/.

VFS abstraction

The VFS is described in Documentation/filesystems/vfs.rst:
The Virtual File System (also known as the Virtual Filesystem Switch) is the software layer in the kernel that provides the filesystem interface to userspace programs. It also provides an abstraction within the kernel which allows different filesystem implementations to coexist.
Four primary objects implement the VFS abstraction:

struct super_block

Represents a mounted filesystem instance. Holds filesystem-wide metadata (block size, flags, root dentry) and a pointer to super_operations which implement sync_fs, statfs, evict_inode, etc.

struct inode

Represents a filesystem object (file, directory, symlink, device node). Lives on disk for persistent filesystems or in memory for virtual ones. Pointed to by dentries.

struct dentry

A directory entry — the mapping from a filename component to an inode. The dentry cache (dcache) holds recently used dentries in memory for fast path resolution. Dentries are never written to disk.

struct file

The kernel-side representation of an open file descriptor. Created on open(2) and destroyed on the final close(2). Points to a dentry and holds the current file offset and open flags.

Key operations structures

Each VFS object type has an associated operations structure that filesystems populate to implement their behaviour.
struct inode_operations controls how inodes are looked up and manipulated:
struct inode_operations {
    struct dentry *(*lookup)(struct inode *, struct dentry *, unsigned int);
    int            (*create)(struct mnt_idmap *, struct inode *,
                             struct dentry *, umode_t, bool);
    int            (*link)(struct dentry *, struct inode *, struct dentry *);
    int            (*unlink)(struct inode *, struct dentry *);
    int            (*symlink)(struct mnt_idmap *, struct inode *,
                              struct dentry *, const char *);
    int            (*mkdir)(struct mnt_idmap *, struct inode *,
                            struct dentry *, umode_t);
    int            (*rmdir)(struct inode *, struct dentry *);
    int            (*rename)(struct mnt_idmap *, struct inode *,
                             struct dentry *, struct inode *,
                             struct dentry *, unsigned int);
    int            (*getattr)(struct mnt_idmap *, const struct path *,
                              struct kstat *, u32, unsigned int);
    int            (*setattr)(struct mnt_idmap *, struct dentry *,
                              struct iattr *);
};

Filesystem mounting

Mounting connects a filesystem instance to a point in the directory hierarchy. The VFS mount API (fs/namespace.c) was substantially revised in Linux 5.2 to introduce a more flexible fsopen/fsmount/move_mount interface alongside the classic mount(2) syscall.
/* Filesystem registration */
#include <linux/fs.h>

extern int register_filesystem(struct file_system_type *);
extern int unregister_filesystem(struct file_system_type *);

/* A filesystem type descriptor */
struct file_system_type {
    const char *name;           /* e.g. "ext4", "tmpfs" */
    int fs_flags;
    int (*init_fs_context)(struct fs_context *);
    const struct fs_parameter_spec *parameters;
    struct super_block *(*mount)(struct file_system_type *, int,
                                  const char *, void *);
    void (*kill_sb)(struct super_block *);
};
# Mount an ext4 filesystem
mount -t ext4 /dev/sda1 /mnt/data

# Mount with options
mount -t ext4 -o noatime,data=ordered /dev/sda1 /mnt/data

# List all mounted filesystems
cat /proc/mounts
findmnt --tree
mount(2) proceeds in four steps:

1. Filesystem lookup: the kernel looks up the file_system_type by name in the global list of registered filesystems.

2. Superblock creation: mount() calls the filesystem's mount() or init_fs_context() method, which reads on-disk superblock data and populates struct super_block.

3. Root dentry: the filesystem returns a root dentry. The kernel attaches it to the mount point in the current namespace's mount tree.

4. Path resolution: subsequent path lookups traverse dentries starting from the mount root, calling inode_operations.lookup() at each component to descend into the tree.

Major filesystem implementations

ext4

The most widely deployed Linux filesystem. ext4 evolved from ext2/ext3, adding extents (replacing block maps), delayed allocation, journalling (via JBD2), online defragmentation, and large volume/file support.
  • Journal modes: journal (safest), ordered (default), writeback (fastest)
  • Extents: a contiguous range of blocks described by (start_block, length), replacing per-block indirect maps
  • Checksums: metadata checksums protect journal, bitmaps, inodes, and directories
# Create an ext4 filesystem
mkfs.ext4 /dev/sdb1

# Tune the journal mode
tune2fs -o journal_data_ordered /dev/sdb1

# Inspect filesystem metadata
dumpe2fs /dev/sdb1 | head -40

Btrfs

A copy-on-write (CoW) B-tree filesystem with built-in RAID, snapshots, checksums, and online filesystem growth and balance operations.
  • CoW semantics: writes never overwrite existing data; old blocks remain until all references are gone
  • Subvolumes: independently mountable filesystem sub-trees
  • Snapshots: writable or read-only CoW clones of a subvolume, O(1) creation
  • Checksums: all data and metadata are checksummed (CRC32C, xxHash, SHA256, Blake2b)
  • RAID: built-in RAID 0/1/10/5/6 without dm-raid (the RAID 5/6 modes are still flagged unstable upstream)
mkfs.btrfs /dev/sdb1
# Create a read-only snapshot
btrfs subvolume snapshot -r /mnt/data /mnt/data-snap
# Show filesystem usage
btrfs filesystem df /mnt

XFS

A high-performance, scalable filesystem originally developed by SGI. XFS excels at large files and parallel I/O workloads.
  • Allocation groups: independent parallel allocation units for scalable concurrency
  • Extent-based: maps logical ranges to physical extents
  • Journalling: write-ahead log for metadata; supports external log devices
  • Delayed allocation: batches allocation decisions to reduce fragmentation
  • Online checking: xfs_scrub can check (and experimentally repair) a mounted filesystem; xfs_repair requires the filesystem to be unmounted
mkfs.xfs /dev/sdb1
xfs_info /mnt
xfs_admin -L "mydata" /dev/sdb1

Virtual and pseudo filesystems

tmpfs and ramfs

tmpfs and ramfs are in-memory filesystems. ramfs has no size limit and no swap backing. tmpfs enforces a configurable size limit and can page contents to swap.
# Mount a tmpfs of up to 512 MB
mount -t tmpfs -o size=512m tmpfs /tmp

# tmpfs supports NUMA placement
mount -t tmpfs -o size=1g,mpol=interleave tmpfs /mnt/tmpfs
tmpfs powers shm_open(3) and memfd_create(2) for shared memory, and is the backing store for /dev/shm.

proc filesystem

procfs (fs/proc/) exports kernel data structures as a filesystem tree under /proc. It is a pseudo-filesystem — no data is ever written to disk.
/proc/cpuinfo          # CPU topology and features
/proc/meminfo          # memory usage breakdown
/proc/net/tcp          # TCP socket table
/proc/<pid>/maps       # virtual memory layout of a process
/proc/<pid>/status     # process status and resource usage
/proc/<pid>/fd/        # open file descriptors
/proc/sys/             # sysctl tunables (read/write)

sysfs

sysfs (fs/sysfs/) exposes the kernel’s device model under /sys. Every bus, device, driver, and class has a directory. Attributes are files that represent individual properties.
/sys/bus/pci/devices/  # PCI devices by BDF address
/sys/class/net/        # network interfaces
/sys/block/sda/        # block device attributes
/sys/kernel/mm/        # memory management tunables
/sys/fs/cgroup/        # cgroup hierarchy root
sysfs attributes use a one-value-per-file convention. Scripts that need to tune kernel parameters should use sysctl(8) for /proc/sys/ knobs and direct file writes for /sys/ attributes.

FUSE

FUSE (Filesystem in Userspace, fs/fuse/) allows filesystem implementations to run entirely in user space. The kernel FUSE driver forwards VFS operations to a user-space daemon via a /dev/fuse character device.
VFS call (e.g. read)
  └─ fs/fuse/file.c:fuse_file_read_iter()
       └─ encode FUSE_READ request → /dev/fuse
            └─ user-space daemon reads request
            └─ daemon performs I/O
            └─ daemon writes FUSE_READ reply → /dev/fuse
       └─ kernel copies data to user buffer
FUSE is used by:
  • SSHFS — mount remote filesystems over SSH
  • GVfs — GNOME virtual filesystem (FTP, MTP, Google Drive)
  • libfuse-based implementations (EncFS, CephFS FUSE client, etc.)
# Install libfuse and mount SSHFS
sshfs user@host:/remote/path /mnt/remote

# Unmount a FUSE filesystem
fusermount -u /mnt/remote
FUSE filesystems involve a context switch for every VFS operation (kernel → user daemon → kernel). This makes them significantly slower than in-kernel filesystems for latency-sensitive or high-IOPS workloads. Minimise round-trips by enabling FUSE_CAP_ASYNC_READ, FUSE_CAP_WRITEBACK_CACHE, and similar capability flags in the daemon.

Directory Entry Cache

The dentry cache (dcache) (fs/dcache.c) is a central performance component of the VFS. It caches recently resolved path components as struct dentry objects in a hash table, avoiding repeated lookup() calls into the filesystem.
# Inspect dentry cache stats
cat /proc/sys/fs/dentry-state
# fields: nr_dentry nr_unused age_limit want_pages nr_negative dummy

# Drop the dentry and inode caches (use with caution)
echo 2 > /proc/sys/vm/drop_caches
Negative dentries cache the fact that a name does not exist, making repeated stat() calls for missing files cheap. Under memory pressure, the kernel reclaims dentries starting from the least-recently-used end of the LRU.
