Process & Thread Scheduling

The SerenityOS kernel implements a priority-based preemptive multitasking scheduler that manages processes and threads. The scheduling subsystem is located in Kernel/Tasks/ and provides fair CPU time distribution across multiple threads.

Core Concepts

Process

A Process (Kernel/Tasks/Process.h) represents a program in execution with its own:

Address space (AddressSpace)
Open file descriptors
Credentials (UID, GID, groups)
Security context (pledge promises, unveil paths)
One or more threads
Process group and session membership

Processes are identified by a unique ProcessID (pid) and have a parent process (ppid).

Key Process State

class Process : public ListedRefCounted<Process, LockType::Spinlock> {
    ProcessID pid;
    ProcessID ppid;
    RefPtr<Credentials> credentials;
    RefPtr<ProcessGroup> process_group;
    RefPtr<TTY> tty;
    u32 promises;           // pledge promises
    u32 execpromises;       // exec pledge promises
    mode_t umask;
    VirtualAddress signal_trampoline;
    u32 thread_count;
};

Thread

A Thread (Kernel/Tasks/Thread.h) is the schedulable unit of execution. Each thread has:

Unique thread ID (ThreadID)
Reference to parent process
Priority level (1-99)
CPU affinity mask
Register state
Kernel and user stacks
Thread-local storage (TLS)

Thread States

enum class State : u8 {
    Invalid = 0,
    Runnable,     // Ready to run
    Running,      // Currently executing
    Dying,        // Exiting
    Dead,         // Fully terminated
    Stopped,      // Stopped by signal (SIGSTOP)
    Blocked,      // Waiting for resource
};

Only threads in the Runnable state are eligible for scheduling.

The Scheduler

The Scheduler (Kernel/Tasks/Scheduler.h) is responsible for:

Selecting the next thread to run
Context switching between threads
Managing runnable thread queues
Timer-based preemption
Idle loop execution

Scheduling Algorithm

SerenityOS uses a priority-based round-robin scheduler with multiple priority queues:

static constexpr size_t priority_queue_count = 32;
Array<ThreadReadyQueue, priority_queue_count> queues;

Thread priorities range from THREAD_PRIORITY_MIN (1) to THREAD_PRIORITY_MAX (99) and are mapped onto the 32 ready queues:

Higher values = higher priority
Default priority: 30 (normal)
Idle threads: priority 1

Priority Queue Implementation

The scheduler maintains a bitmask of non-empty queues for efficient queue selection:

struct ThreadReadyQueues {
    u32 mask {};  // Bitmask of non-empty queues
    static constexpr size_t count = sizeof(mask) * 8;  // One queue per mask bit (32)
    Array<ThreadReadyQueue, count> queues;
};

When selecting the next thread, the scheduler calls bit_scan_forward() on the mask to find the highest-priority non-empty queue in O(1) time, without iterating over all 32 queues.
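The mask-plus-queues idea can be sketched in portable user-space C++. This is a minimal illustration, not the kernel's code: ReadyQueuesSketch, enqueue() and dequeue_next() are hypothetical names, thread IDs are plain integers, and __builtin_ctz (a GCC/Clang builtin) stands in for bit_scan_forward(). It also assumes that bucket 0 is the most urgent one, so the lowest set bit wins. How a thread priority maps to a bucket is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Illustrative stand-in for the kernel's ready queues; thread IDs are plain ints.
struct ReadyQueuesSketch {
    static constexpr std::size_t count = 32;
    std::uint32_t mask { 0 };                  // Bit i is set while queues[i] is non-empty
    std::array<std::deque<int>, count> queues; // One FIFO per priority bucket

    void enqueue(std::size_t bucket, int thread_id)
    {
        assert(bucket < count);
        queues[bucket].push_back(thread_id);
        mask |= (1u << bucket);
    }

    // Returns the next thread ID, or -1 if every queue is empty.
    // Assumption: bucket 0 is the most urgent, so the lowest set bit wins.
    int dequeue_next()
    {
        if (mask == 0)
            return -1;
        auto bucket = static_cast<std::size_t>(__builtin_ctz(mask)); // GCC/Clang builtin
        auto& queue = queues[bucket];
        int thread_id = queue.front();
        queue.pop_front();
        if (queue.empty())
            mask &= ~(1u << bucket); // Keep the mask in sync with the queue contents
        return thread_id;
    }
};
```

Because the mask always mirrors which queues hold threads, picking a queue costs one bit scan regardless of how many queues are empty.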

Time Slicing

Threads are given time slices based on their type:

static u32 time_slice_for(Thread const& thread)
{
    // One time slice unit == 4ms (assuming 250 ticks/second)
    if (thread.is_idle_thread())
        return 1;  // 4ms
    return 2;      // 8ms
}
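
The arithmetic in the comment above can be checked with a short sketch. It assumes the 250 ticks/second rate mentioned in that comment; time_slice_to_ms() and consume_tick() are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption taken from the comment above: the timer fires 250 times per second.
constexpr std::uint32_t ticks_per_second = 250;

// Converts a time slice expressed in timer ticks to milliseconds.
constexpr std::uint32_t time_slice_to_ms(std::uint32_t ticks)
{
    return ticks * 1000 / ticks_per_second;
}

// Hypothetical per-tick bookkeeping: returns true when the running thread
// has used up its slice and should be preempted.
inline bool consume_tick(std::uint32_t& ticks_left)
{
    assert(ticks_left > 0);
    --ticks_left;
    return ticks_left == 0;
}
```

With these assumptions, a slice of 1 tick corresponds to 4 ms and a slice of 2 ticks to 8 ms.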

When a thread’s time slice expires, the scheduler preempts it via the timer interrupt.

Scheduling Operations

pick_next()

Selects the next thread to run:

Checks for runnable threads in priority order
Respects CPU affinity masks
Skips threads already running on other cores
Returns NoRunnableThreadFound if no thread is runnable
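
The selection rules above can be sketched as a simple filter. This is an illustration under stated assumptions, not the kernel's implementation: ThreadSketch and pick_next_sketch() are hypothetical, a linear scan replaces the real priority queues, and a null pointer stands in for NoRunnableThreadFound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a kernel thread; all names here are illustrative.
struct ThreadSketch {
    int tid;
    std::uint32_t priority; // Higher value = higher priority
    std::uint32_t affinity; // Bit n set = may run on CPU n
    bool runnable;          // In the Runnable state
    bool running_elsewhere; // Already executing on another core
};

// Returns the best candidate for `cpu_id`, or nullptr if none qualifies
// (standing in for NoRunnableThreadFound).
inline ThreadSketch const* pick_next_sketch(std::vector<ThreadSketch> const& threads, std::uint32_t cpu_id)
{
    assert(cpu_id < 32);
    std::uint32_t const cpu_bit = 1u << cpu_id;
    ThreadSketch const* best = nullptr;
    for (auto const& thread : threads) {
        if (!thread.runnable || thread.running_elsewhere)
            continue; // Not eligible at all
        if (!(thread.affinity & cpu_bit))
            continue; // Not allowed on this CPU
        if (!best || thread.priority > best->priority)
            best = &thread;
    }
    return best;
}
```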

yield()

Allows a thread to voluntarily give up the CPU:

static ScheduleResult yield();

The current thread is moved to the end of its priority queue and another thread is scheduled.
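
The round-robin effect of yielding can be sketched with a single FIFO. RoundRobinQueueSketch and yield_from() are hypothetical names, and the sketch models only one priority level with integer thread IDs.

```cpp
#include <cassert>
#include <deque>

// Illustrative single-priority run queue; thread IDs are plain ints.
struct RoundRobinQueueSketch {
    std::deque<int> ready; // Front = next to run

    // The current thread gives up the CPU: it goes to the back of the queue
    // and the thread at the front becomes current. Returns the new current thread.
    int yield_from(int current)
    {
        assert(current >= 0);
        ready.push_back(current);
        int next = ready.front();
        ready.pop_front();
        return next;
    }
};
```

If no other thread is queued at that priority, the yielding thread is immediately selected again in this model.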

context_switch()

Performs the actual CPU context switch to a new thread:

Saves current thread’s register state
Switches page directory (address space)
Loads new thread’s register state
Updates TLS and stack pointers
Restores execution

The scheduler lock (g_scheduler_lock) must be held during context switches to prevent race conditions.

Thread Management

Creating Threads

Threads are created via Thread::create():

auto thread = TRY(Thread::create(process));
thread->set_priority(priority);
thread->set_name(name);

User space creates threads via the create_thread syscall with parameters:

struct SC_create_thread_params {
    unsigned int detach_state;      // JOINABLE or DETACHED
    int schedule_priority;          // Thread priority
    unsigned int guard_page_size;   // Stack guard size
    unsigned int stack_size;        // Stack size
    void* stack_location;           // Stack location (or nullptr)
    void* (*entry)(void*);          // Entry point
    void* entry_argument;           // Argument to entry
    void* tls_pointer;              // TLS pointer
};

Thread Blocking

Threads block when waiting for resources using the BlockResult mechanism:

class BlockResult {
public:
    enum Type {
        WokeNormally,           // Unblocked normally
        NotBlocked,             // Wasn't actually blocked
        InterruptedBySignal,    // Signal received
        InterruptedByDeath,     // Process dying
        InterruptedByTimeout,   // Timeout expired
    };
};

Common blocking scenarios:

WaitQueue: Waiting for events (e.g., child process exit)
Mutex: Waiting to acquire a lock
I/O: Waiting for data from files/sockets
Futex: User space synchronization primitives
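
A caller typically has to turn a block result into a success or error path. The sketch below shows one way to do that. BlockResultType mirrors the result types listed above, while block_result_to_errno() and the specific errno values chosen are assumptions made for illustration, not the kernel's actual mapping.

```cpp
#include <cassert>
#include <cerrno>

// Mirrors the result types listed above; the mapping below is an assumption.
enum class BlockResultType {
    WokeNormally,
    NotBlocked,
    InterruptedBySignal,
    InterruptedByDeath,
    InterruptedByTimeout,
};

// Hypothetical helper: translate a block result into an errno-style code
// (0 means the wait succeeded and the caller may proceed).
constexpr int block_result_to_errno(BlockResultType type)
{
    switch (type) {
    case BlockResultType::WokeNormally:
    case BlockResultType::NotBlocked:
        return 0;
    case BlockResultType::InterruptedBySignal:
    case BlockResultType::InterruptedByDeath:
        return EINTR;
    case BlockResultType::InterruptedByTimeout:
        return ETIMEDOUT;
    }
    return EINVAL; // Unknown value
}
```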

Thread Finalization

When a thread exits, it transitions to the Dying state and then to Dead. The finalizer thread (g_finalizer) performs the cleanup:

Thread* g_finalizer;
WaitQueue* g_finalizer_wait_queue;

The finalizer:

Frees thread kernel stacks
Releases thread resources
Notifies joining threads
Removes thread from process
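
The Dying to Dead hand-off can be modelled as a single pass over exiting threads. This is a deliberately simplified sketch: LifeState, ExitingThreadSketch and finalize_dying_threads() are hypothetical, and "freeing" the kernel stack is reduced to clearing a flag.

```cpp
#include <cassert>
#include <vector>

enum class LifeState { Running, Dying, Dead };

// Illustrative record for a thread that may be on its way out.
struct ExitingThreadSketch {
    int tid;
    LifeState state;
    bool kernel_stack_allocated;
};

// Hypothetical finalizer pass: reaps every Dying thread and returns
// how many threads were finalized.
inline int finalize_dying_threads(std::vector<ExitingThreadSketch>& threads)
{
    int finalized = 0;
    for (auto& thread : threads) {
        if (thread.state != LifeState::Dying)
            continue;
        assert(thread.kernel_stack_allocated);
        thread.kernel_stack_allocated = false; // "Free" the kernel stack
        thread.state = LifeState::Dead;        // Cleanup done
        ++finalized;
    }
    return finalized;
}
```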

Process Management

Process Creation

Processes are created via:

fork(): Duplicate current process (copy-on-write)
exec(): Replace process with new program
posix_spawn(): Combined fork+exec optimization
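
From user space, the fork-then-exec pattern looks roughly like the sketch below. It uses only standard POSIX calls (fork, execv, waitpid). run_and_wait() is a hypothetical helper, and nothing here is specific to the kernel internals described on this page.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `path` with `args` in a child process and returns its exit status,
// or -1 if the child could not be created or did not exit normally.
inline int run_and_wait(char const* path, char* const args[])
{
    assert(path && args);
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        // Child: replace the duplicated process image with the new program.
        execv(path, args);
        _exit(127); // Only reached if execv() failed
    }
    // Parent: wait for the child to terminate.
    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit code 127 for a failed exec follows the usual shell convention; it is a choice made for this sketch.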

Process Groups and Sessions

Processes are organized into:

Process Groups (ProcessGroup)

Collection of related processes
Share a process group ID (pgid)
Used for signal delivery to multiple processes

Sessions

Collection of process groups
Associated with controlling terminal
Managed via setsid(), getsid()

Process Security

Pledge

Restricts process capabilities via promises:

enum class Pledge : u32 {
    stdio, rpath, wpath, cpath, dpath,
    inet, id, proc, ptrace, exec,
    unix, recvfd, sendfd, fattr, tty,
    chown, thread, video, accept,
    settime, sigaction, setkeymap,
    prot_exec, map_fixed, getkeymap,
    mount, unshare, no_error
};

Once a process has pledged, any operation outside its promises terminates the process.
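
Since promises are stored as a u32, one natural reading is a bitmask with one bit per promise. The sketch below illustrates that reading. PledgeSketch, promise_bit() and PromiseSetSketch are hypothetical, only a subset of promises is modelled, and the rule that promises can only be narrowed after the first pledge is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// A small subset of the promises listed above, used as bit positions.
enum class PledgeSketch : std::uint32_t {
    stdio,
    rpath,
    wpath,
    inet,
};

constexpr std::uint32_t promise_bit(PledgeSketch promise)
{
    return 1u << static_cast<std::uint32_t>(promise);
}

struct PromiseSetSketch {
    bool has_pledged { false };
    std::uint32_t promises { 0 };

    // Assumed narrowing rule: after the first pledge, promises can only be removed.
    bool pledge(std::uint32_t requested)
    {
        if (has_pledged && (requested & ~promises) != 0)
            return false; // Attempt to regain a dropped promise
        promises = requested;
        has_pledged = true;
        return true;
    }

    // A process that never pledged is unrestricted.
    bool allows(PledgeSketch promise) const
    {
        return !has_pledged || (promises & promise_bit(promise)) != 0;
    }
};
```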

Unveil

Restricts filesystem access to specific paths:

enum class VeilState {
    None,              // No unveil restrictions
    Dropped,           // Unveil active, can add paths
    Locked,            // No more paths can be added
    LockedInherited,   // Inherited locked state
};

Pledge and unveil provide defense-in-depth security by limiting process capabilities after initialization.

CPU Affinity

Threads can be bound to specific CPUs via affinity masks:

#define THREAD_AFFINITY_DEFAULT 0xffffffff  // All CPUs

u32 affinity = 1u << cpu_id;  // Bind to specific CPU
thread->set_affinity(affinity);

The scheduler respects affinity when selecting threads:

auto affinity_mask = 1u << Processor::current_id();
if (!(thread.affinity() & affinity_mask))
    continue;  // Skip thread not allowed on this CPU
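
Building and testing masks is easy to get subtly wrong, because shifting a 32-bit value by 32 or more is undefined behavior in C++. The sketch below adds that bounds check. affinity_for_cpus() and can_run_on() are hypothetical helpers, and affinity_default simply repeats the value of THREAD_AFFINITY_DEFAULT.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

constexpr std::uint32_t affinity_default = 0xffffffffu; // All CPUs, as in THREAD_AFFINITY_DEFAULT

// Builds a mask from a list of CPU IDs. Each ID is checked first, because
// shifting a 32-bit value by 32 or more is undefined behavior.
inline std::uint32_t affinity_for_cpus(std::initializer_list<std::uint32_t> cpu_ids)
{
    std::uint32_t mask = 0;
    for (auto cpu_id : cpu_ids) {
        assert(cpu_id < 32);
        mask |= 1u << cpu_id;
    }
    return mask;
}

// True if a thread with this affinity mask may run on `cpu_id`.
inline bool can_run_on(std::uint32_t affinity, std::uint32_t cpu_id)
{
    return cpu_id < 32 && (affinity & (1u << cpu_id)) != 0;
}
```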

Performance Tracking

The scheduler tracks CPU time usage:

struct TotalTimeScheduled {
    u64 total { 0 };        // Total time scheduled
    u64 total_kernel { 0 }; // Time in kernel mode
};

Per-thread statistics include:

Time in user mode
Time in kernel mode
Context switches
Page faults
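
User-mode time can be derived from the two counters, since kernel time is a subset of total time. The sketch below shows that bookkeeping. TotalTimeScheduledSketch has the same two fields as the struct above, while add(), total_user() and kernel_percent() are hypothetical helpers and the time unit is left abstract.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the struct above; units are left abstract (e.g. nanoseconds).
struct TotalTimeScheduledSketch {
    std::uint64_t total { 0 };
    std::uint64_t total_kernel { 0 };

    // Hypothetical accounting hook called when a thread is switched out.
    void add(std::uint64_t elapsed, bool was_in_kernel)
    {
        total += elapsed;
        if (was_in_kernel)
            total_kernel += elapsed;
    }

    // Time spent in user mode is whatever was not spent in the kernel.
    std::uint64_t total_user() const
    {
        assert(total_kernel <= total);
        return total - total_kernel;
    }

    // Kernel share in percent, or 0 if nothing has been scheduled yet.
    std::uint64_t kernel_percent() const
    {
        return total == 0 ? 0 : total_kernel * 100 / total;
    }
};
```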

Work Queues

WorkQueue (Kernel/Tasks/WorkQueue.h) provides deferred work execution:

WorkQueue::global().queue_work([&] {
    // Execute work asynchronously
});

Work queues run at lower priority and don’t block critical paths.
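
The deferred-execution pattern can be sketched in user space as a FIFO of callables that is drained later. WorkQueueSketch and run_pending() are hypothetical, and the sketch is single-threaded: where a real work queue would use a dedicated worker thread, here the caller drains the queue explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Illustrative deferred-work queue; not thread-safe.
class WorkQueueSketch {
public:
    // Stores the callable for later; nothing runs at this point.
    void queue_work(std::function<void()> work)
    {
        assert(work);
        m_items.push_back(std::move(work));
    }

    // Runs everything queued so far, in FIFO order; returns the number of items run.
    std::size_t run_pending()
    {
        std::size_t ran = 0;
        while (!m_items.empty()) {
            auto work = std::move(m_items.front());
            m_items.pop_front();
            work();
            ++ran;
        }
        return ran;
    }

private:
    std::deque<std::function<void()>> m_items;
};
```

Because the work runs after queue_work() has returned, anything captured by reference must outlive the deferred call. Capturing by value is the safer default.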

Key Operations

Scheduling a Thread

// Make thread runnable
Scheduler::enqueue_runnable_thread(*thread);

// Yield to scheduler
Scheduler::yield();

Setting Thread Priority

thread->set_priority(THREAD_PRIORITY_HIGH);

Blocking and Unblocking

// Block on wait queue
auto result = wait_queue.wait_on(timeout);
if (result.was_interrupted())
    return EINTR;

Related Files

Kernel/Tasks/Scheduler.{h,cpp} - Core scheduler implementation
Kernel/Tasks/Thread.{h,cpp} - Thread abstraction
Kernel/Tasks/Process.{h,cpp} - Process management
Kernel/Tasks/ProcessGroup.{h,cpp} - Process group management
Kernel/Tasks/WaitQueue.{h,cpp} - Thread blocking mechanism
Kernel/Tasks/WorkQueue.{h,cpp} - Deferred work execution
