Linux supports dozens of filesystem implementations through the Virtual File System (VFS) layer. The VFS defines a common set of data structures and operations that every filesystem must implement. User-space programs interact exclusively with the VFS through system calls such as open(2), read(2), and write(2), remaining unaware of which underlying filesystem serves their files. The VFS source lives in fs/, with per-filesystem implementations in subdirectories such as fs/ext4/, fs/btrfs/, fs/xfs/, and fs/proc/.

VFS abstraction

The VFS is described in Documentation/filesystems/vfs.rst:
The Virtual File System (also known as the Virtual Filesystem Switch) is the software layer in the kernel that provides the filesystem interface to userspace programs. It also provides an abstraction within the kernel which allows different filesystem implementations to coexist.
Four primary objects implement the VFS abstraction:

struct super_block

Represents a mounted filesystem instance. Holds filesystem-wide metadata (block size, flags, root dentry) and a pointer to super_operations which implement sync_fs, statfs, evict_inode, etc.

struct inode

Represents a filesystem object (file, directory, symlink, device node). Lives on disk for persistent filesystems or in memory for virtual ones. Pointed to by dentries.

struct dentry

A directory entry — the mapping from a filename component to an inode. The dentry cache (dcache) holds recently used dentries in memory for fast path resolution. Dentries are never written to disk.

struct file

The kernel-side representation of an open file descriptor. Created on open(2) and destroyed on the final close(2). Points to a dentry and holds the current file offset and open flags.

Key operations structures

Each VFS object type has an associated operations structure that filesystems populate to implement their behaviour.
struct inode_operations controls how inodes are looked up and manipulated:
struct inode_operations {
    struct dentry *(*lookup)(struct inode *, struct dentry *, unsigned int);
    int            (*create)(struct mnt_idmap *, struct inode *,
                             struct dentry *, umode_t, bool);
    int            (*link)(struct dentry *, struct inode *, struct dentry *);
    int            (*unlink)(struct inode *, struct dentry *);
    int            (*symlink)(struct mnt_idmap *, struct inode *,
                              struct dentry *, const char *);
    int            (*mkdir)(struct mnt_idmap *, struct inode *,
                            struct dentry *, umode_t);
    int            (*rmdir)(struct inode *, struct dentry *);
    int            (*rename)(struct mnt_idmap *, struct inode *,
                             struct dentry *, struct inode *,
                             struct dentry *, unsigned int);
    int            (*getattr)(struct mnt_idmap *, const struct path *,
                              struct kstat *, u32, unsigned int);
    int            (*setattr)(struct mnt_idmap *, struct dentry *,
                              struct iattr *);
};

Filesystem mounting

Mounting connects a filesystem instance to a point in the directory hierarchy. The VFS mount API (fs/namespace.c) was substantially revised in Linux 5.2 to introduce a more flexible fsopen/fsmount/move_mount interface alongside the classic mount(2) syscall.
/* Filesystem registration */
#include <linux/fs.h>

extern int register_filesystem(struct file_system_type *);
extern int unregister_filesystem(struct file_system_type *);

/* A filesystem type descriptor */
struct file_system_type {
    const char *name;           /* e.g. "ext4", "tmpfs" */
    int fs_flags;
    int (*init_fs_context)(struct fs_context *);
    const struct fs_parameter_spec *parameters;
    struct super_block *(*mount)(struct file_system_type *, int,
                                  const char *, void *);
    void (*kill_sb)(struct super_block *);
};
# Mount an ext4 filesystem
mount -t ext4 /dev/sda1 /mnt/data

# Mount with options
mount -t ext4 -o noatime,data=ordered /dev/sda1 /mnt/data

# List all mounted filesystems
cat /proc/mounts
findmnt --tree
mount(2) proceeds in four steps:

1. Filesystem lookup: the kernel looks up the file_system_type by name in the global list of registered filesystems.

2. Superblock creation: mount() calls the filesystem's mount() or init_fs_context() method, which reads on-disk superblock data and populates struct super_block.

3. Root dentry: the filesystem returns a root dentry. The kernel attaches it to the mount point in the current namespace's mount tree.

4. Path resolution: subsequent path lookups traverse dentries starting from the mount root, calling inode_operations.lookup() at each component to descend into the tree.

Major filesystem implementations

ext4

The most widely deployed Linux filesystem. ext4 evolved from ext2/ext3, adding extents (replacing block maps), delayed allocation, journalling (via JBD2), online defragmentation, and large volume/file support.
  • Journal modes: journal (safest), ordered (default), writeback (fastest)
  • Extents: a contiguous range of blocks described by (start_block, length), replacing per-block indirect maps
  • Checksums: metadata checksums protect journal, bitmaps, inodes, and directories
# Create an ext4 filesystem
mkfs.ext4 /dev/sdb1

# Tune the journal mode
tune2fs -o journal_data_ordered /dev/sdb1

# Inspect filesystem metadata
dumpe2fs /dev/sdb1 | head -40

Btrfs

A copy-on-write (CoW) B-tree filesystem with built-in RAID, snapshots, checksums, and online filesystem growth and balance operations.
  • CoW semantics: writes never overwrite existing data; old blocks remain until all references are gone
  • Subvolumes: independently mountable filesystem sub-trees
  • Snapshots: writable or read-only CoW clones of a subvolume, O(1) creation
  • Checksums: all data and metadata are checksummed (CRC32C, xxHash, SHA256, Blake2b)
  • RAID: built-in RAID 0/1/10/5/6 without dm-raid (the RAID 5/6 modes are still flagged unstable upstream)
mkfs.btrfs /dev/sdb1
# Create a read-only snapshot
btrfs subvolume snapshot -r /mnt/data /mnt/data-snap
# Show filesystem usage
btrfs filesystem df /mnt

XFS

A high-performance, scalable filesystem originally developed by SGI. XFS excels at large files and parallel I/O workloads.
  • Allocation groups: independent parallel allocation units for scalable concurrency
  • Extent-based: maps logical ranges to physical extents
  • Journalling: write-ahead log for metadata; supports external log devices
  • Delayed allocation: batches allocation decisions to reduce fragmentation
  • Online checking: xfs_scrub can check (and experimentally repair) a mounted filesystem; xfs_repair requires the filesystem to be unmounted
mkfs.xfs /dev/sdb1
xfs_info /mnt
xfs_admin -L "mydata" /dev/sdb1

Virtual and pseudo filesystems

tmpfs and ramfs

tmpfs and ramfs are in-memory filesystems. ramfs has no size limit and no swap backing. tmpfs enforces a configurable size limit and can page contents to swap.
# Mount a tmpfs of up to 512 MB
mount -t tmpfs -o size=512m tmpfs /tmp

# tmpfs supports NUMA placement
mount -t tmpfs -o size=1g,mpol=interleave tmpfs /mnt/tmpfs
tmpfs powers shm_open(3) and memfd_create(2) for shared memory, and is the backing store for /dev/shm.

proc filesystem

procfs (fs/proc/) exports kernel data structures as a filesystem tree under /proc. It is a pseudo-filesystem — no data is ever written to disk.
/proc/cpuinfo          # CPU topology and features
/proc/meminfo          # memory usage breakdown
/proc/net/tcp          # TCP socket table
/proc/<pid>/maps       # virtual memory layout of a process
/proc/<pid>/status     # process status and resource usage
/proc/<pid>/fd/        # open file descriptors
/proc/sys/             # sysctl tunables (read/write)

sysfs

sysfs (fs/sysfs/) exposes the kernel’s device model under /sys. Every bus, device, driver, and class has a directory. Attributes are files that represent individual properties.
/sys/bus/pci/devices/  # PCI devices by BDF address
/sys/class/net/        # network interfaces
/sys/block/sda/        # block device attributes
/sys/kernel/mm/        # memory management tunables
/sys/fs/cgroup/        # cgroup hierarchy root
sysfs attributes use a one-value-per-file convention. Scripts that need to tune kernel parameters should use sysctl(8) for /proc/sys/ knobs and direct file writes for /sys/ attributes.

FUSE

FUSE (Filesystem in Userspace, fs/fuse/) allows filesystem implementations to run entirely in user space. The kernel FUSE driver forwards VFS operations to a user-space daemon via a /dev/fuse character device.
VFS call (e.g. read)
  └─ fs/fuse/file.c:fuse_file_read_iter()
       └─ encode FUSE_READ request → /dev/fuse
            └─ user-space daemon reads request
            └─ daemon performs I/O
            └─ daemon writes FUSE_READ reply → /dev/fuse
       └─ kernel copies data to user buffer
FUSE is used by:
  • SSHFS — mount remote filesystems over SSH
  • GVfs — GNOME virtual filesystem (FTP, MTP, Google Drive)
  • libfuse-based implementations (EncFS, CephFS FUSE client, etc.)
# Install libfuse and mount SSHFS
sshfs user@host:/remote/path /mnt/remote

# Unmount a FUSE filesystem
fusermount -u /mnt/remote
FUSE filesystems involve a context switch for every VFS operation (kernel → user daemon → kernel). This makes them significantly slower than in-kernel filesystems for latency-sensitive or high-IOPS workloads. Minimise round-trips by enabling FUSE_CAP_ASYNC_READ, FUSE_CAP_WRITEBACK_CACHE, and similar capability flags in the daemon.

Directory Entry Cache

The dentry cache (dcache) (fs/dcache.c) is a central performance component of the VFS. It caches recently resolved path components as struct dentry objects in a hash table, avoiding repeated lookup() calls into the filesystem.
# Inspect dentry cache stats
cat /proc/sys/fs/dentry-state
# fields: nr_dentry nr_unused age_limit want_pages nr_negative dummy

# Drop the dentry and inode caches (use with caution)
echo 2 > /proc/sys/vm/drop_caches
Negative dentries cache the fact that a name does not exist, making repeated stat() calls for missing files cheap. Under memory pressure, the kernel reclaims dentries starting from the least-recently-used end of the LRU.
