Zero Overhead Design

STX is built on the principle of zero-overhead abstraction: the type safety and expressiveness the library provides cost nothing at runtime. This page explains how STX achieves this with C++23 features.
Here, “zero overhead” means that, in an optimized build, the compiled machine code is identical to (or better than) hand-written low-level code.

The Zero-Overhead Principle

Bjarne Stroustrup’s zero-overhead principle states:
  1. You don’t pay for what you don’t use.
  2. What you do use is just as efficient as what you could reasonably write by hand.
STX strictly adheres to both rules:

No Unused Features

Header-only, no dynamic allocation, no virtual functions, no runtime type information.

Maximum Efficiency

Constexpr evaluation, trivial types, compiler-optimizable abstractions.

Memory Layout

Strong Types Have No Overhead

Strong types like offset_t, rva_t, and va_t are exactly the same size as their underlying types:
using namespace lbyte::stx;

static_assert(sizeof(offset_t) == sizeof(usize));
static_assert(sizeof(rva_t)    == sizeof(u32));
static_assert(sizeof(va_t)     == sizeof(uptr));

// On 64-bit systems:
static_assert(sizeof(offset_t) == 8);
static_assert(sizeof(rva_t)    == 4);
static_assert(sizeof(va_t)     == 8);
The strong_type implementation stores only the value:
template<typename Type, typename Tag>
class strong_type
{
    Type value{};  // Only member - no vtable, no padding
};
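The layout claim can be checked in isolation with a small stand-in. This is a minimal sketch, not the STX source: the names `strong`, `offset_tag`, `rva_tag`, `offset_like`, and `rva_like` are hypothetical and exist only to show that a tagged single-member wrapper keeps the size and alignment of its underlying type while remaining a distinct type.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a tagged wrapper; not the STX implementation.
template <typename Type, typename Tag>
class strong
{
public:
    constexpr strong() = default;
    constexpr explicit strong(Type v) noexcept : value{v} {}
    constexpr Type get() const noexcept { return value; }

private:
    Type value{};  // sole data member
};

struct offset_tag {};
struct rva_tag {};
using offset_like = strong<std::size_t, offset_tag>;
using rva_like    = strong<std::uint32_t, rva_tag>;

// Same size and alignment as the wrapped integer.
static_assert(sizeof(offset_like)  == sizeof(std::size_t));
static_assert(alignof(offset_like) == alignof(std::size_t));
static_assert(sizeof(rva_like)     == sizeof(std::uint32_t));

// Different tags give unrelated types, and nothing converts implicitly.
static_assert(!std::is_same_v<offset_like, rva_like>);
static_assert(!std::is_convertible_v<std::size_t, offset_like>);
static_assert(!std::is_convertible_v<rva_like, offset_like>);
```

The tag parameter is never stored, so it contributes to the type's identity but not to its size.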

Trivial Type Guarantees

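All strong types satisfy the type properties that matter for copying and layout:
static_assert(std::is_trivially_copyable_v<offset_t>);
static_assert(std::is_trivially_destructible_v<offset_t>);
static_assert(std::is_trivially_copy_constructible_v<offset_t>);
static_assert(std::is_standard_layout_v<offset_t>);
// Note: default construction is not trivial, because the default member
// initializer (Type value{}) zero-initializes the stored value.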

// This means:
// - No constructors are called during copy
// - No destructors are called
// - Can be safely memcpy'd
// - Binary representation is predictable
Trivial types enable compiler optimizations like passing in registers, eliding copies, and constant folding.
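To make the "can be safely memcpy'd" point concrete, the following sketch round-trips a wrapper through a byte buffer. The type `va_like` and the function `round_trip` are hypothetical names introduced for this illustration and are not part of STX.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable wrapper, used only for this illustration.
struct va_like
{
    std::uint64_t value;
};

static_assert(std::is_trivially_copyable_v<va_like>);
static_assert(std::is_standard_layout_v<va_like>);

// Serialize into a byte buffer and read the object back with memcpy.
inline va_like round_trip(va_like in) noexcept
{
    unsigned char bytes[sizeof(va_like)];
    std::memcpy(bytes, &in, sizeof in);    // object -> bytes

    va_like out;
    std::memcpy(&out, bytes, sizeof out);  // bytes -> object
    return out;
}
```

Because the type is trivially copyable, the byte copy is a well-defined way to reproduce the object, which is what makes such wrappers suitable for binary I/O.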

Constexpr: Compile-Time Execution

STX extensively uses constexpr to move computations from runtime to compile time.

Constexpr Strong Types

All strong type operations are constexpr:
constexpr offset_t base { 0x1000 };
constexpr offset_t advanced = base + 256;  // Computed at compile time
constexpr usize raw = advanced.get();      // Computed at compile time

static_assert(raw == 0x1100);  // Verified at compile time
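For an expression such as `base + 256` to be usable in a constant expression, the operators themselves have to be constexpr. The sketch below shows one way this could look; `offset_like` is a hypothetical stand-in, and the real `offset_t` may define its operators differently.

```cpp
#include <cstddef>

// Hypothetical wrapper; not the STX definition of offset_t.
class offset_like
{
public:
    constexpr explicit offset_like(std::size_t v) noexcept : value{v} {}
    constexpr std::size_t get() const noexcept { return value; }

    // Offset plus a raw count yields another offset.
    friend constexpr offset_like operator+(offset_like lhs, std::size_t rhs) noexcept
    {
        return offset_like{lhs.value + rhs};
    }

    // Difference of two offsets, kept as an offset so that
    // (limit - base).get() works as used elsewhere on this page.
    friend constexpr offset_like operator-(offset_like lhs, offset_like rhs) noexcept
    {
        return offset_like{lhs.value - rhs.value};
    }

private:
    std::size_t value;
};

constexpr offset_like base{0x1000};
constexpr offset_like advanced = base + 256;
static_assert(advanced.get() == 0x1100);
static_assert((advanced - base).get() == 256);
```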

Constexpr Address Normalization

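The normalize_addr function is constexpr. The va_t and integer branches can be evaluated at compile time; the pointer branch uses reinterpret_cast, which is not permitted in constant expressions, so it is only usable at runtime: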
template<address_like Addr>
constexpr uptr normalize_addr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr>(base);
    else if constexpr (std::same_as<std::remove_cvref_t<Addr>, va_t>)
        return static_cast<uptr>(base.get());
    else
        return static_cast<uptr>(base);
}
Note the use of if constexpr - this causes the compiler to only instantiate the branch that matches, eliminating dead code entirely.
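The branch selection can be tried out with a self-contained variant. This is a sketch under assumptions: `uptr_like`, `va_like`, `to_uptr`, and `address_of` are hypothetical names, and the unconstrained template replaces the `address_like` concept, whose definition is not shown on this page.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for uptr and va_t, for illustration only.
using uptr_like = std::uintptr_t;

struct va_like
{
    uptr_like value;
    constexpr uptr_like get() const noexcept { return value; }
};

template <typename Addr>
constexpr uptr_like to_uptr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr_like>(base);  // runtime only
    else if constexpr (std::is_same_v<std::remove_cv_t<Addr>, va_like>)
        return base.get();
    else
        return static_cast<uptr_like>(base);
}

// The wrapper and integer branches work in constant expressions.
static_assert(to_uptr(va_like{0x1000}) == 0x1000);
static_assert(to_uptr(0x2000u) == 0x2000);

// The pointer branch compiles, but only a runtime call may reach it.
inline uptr_like address_of(const int* p) noexcept { return to_uptr(p); }
```

Each instantiation contains exactly one return statement, so a call with `va_like` never carries the pointer or integer code paths.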

Compile-Time Validation

constexpr va_t validate_alignment(va_t addr)
{
    if (addr.get() % 4096 != 0)
        throw std::runtime_error("Address not page-aligned");
    return addr;
}

// This fails at compile time if not aligned:
constexpr va_t page_base = validate_alignment(va_t{0x140000000});

// This would cause a compilation error:
// constexpr va_t bad = validate_alignment(va_t{0x123});

Assembly Output Comparison

Let’s compare the generated assembly for strong types vs. raw integers.

Code

// Using strong types
usize compute_strong(offset_t base, offset_t limit)
{
    offset_t mid = base + ((limit - base).get() / 2);
    return mid.get();
}

// Using raw integers  
usize compute_raw(usize base, usize limit)
{
    usize mid = base + ((limit - base) / 2);
    return mid;
}

Generated Assembly (x86-64, -O2)

; Both functions produce IDENTICAL assembly:
compute_strong(offset_t, offset_t):
    mov    rax, rsi
    sub    rax, rdi
    shr    rax, 1
    add    rax, rdi
    ret

compute_raw(unsigned long, unsigned long):
    mov    rax, rsi
    sub    rax, rdi
    shr    rax, 1
    add    rax, rdi
    ret
The assembly is byte-for-byte identical: the wrapper is passed in the same registers as the raw integer, and the strong type abstraction adds no instructions.

Inlining and Optimization

All STX functions are marked constexpr and defined in headers, enabling:

Aggressive Inlining

[[nodiscard]] constexpr uptr normalize_addr(va_t addr) noexcept
{
    return static_cast<uptr>(addr.get());
}

// Usage:
va_t address { 0x140001000 };
uptr normalized = normalize_addr(address);
With optimizations enabled, the compiler inlines this to:
// Effectively becomes:
uptr normalized = 0x140001000;  // Direct value, no function call

Dead Code Elimination

template<address_like Addr>
constexpr uptr normalize_addr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr>(base);    // Branch 1
    else if constexpr (std::same_as<std::remove_cvref_t<Addr>, va_t>)
        return static_cast<uptr>(base.get());   // Branch 2
    else
        return static_cast<uptr>(base);         // Branch 3
}

// When called with va_t:
normalize_addr(va_t{0x1000});

// Compiler instantiates ONLY:
constexpr uptr normalize_addr(va_t base) noexcept
{
    return static_cast<uptr>(base.get());  // Only this branch exists
}
The unused branches never make it into the compiled binary.

Concept-Based Constraints

Concepts (introduced in C++20) provide compile-time constraints at no runtime cost:
template<typename Type>
concept binary_readable
    =      std::is_trivially_copyable_v<Type>
    and    std::is_standard_layout_v<Type>
    and not std::is_empty_v<Type>
    and not std::is_pointer_v<Type>;
These constraints:
  • Are evaluated entirely at compile time
  • Add zero runtime overhead
  • Produce clear error messages
  • Enable optimal code generation

Example: Constrained Function

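template<binary_readable T>
T read_object(const void* buffer)
{
    // std::memcpy (<cstring>) avoids the alignment and strict-aliasing problems
    // of dereferencing a cast pointer; compilers lower it to a plain load.
    T out;
    std::memcpy(&out, buffer, sizeof(T));
    return out;
}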

struct header { u32 magic; u16 version; u16 flags; };
static_assert(binary_readable<header>);

auto hdr = read_object<header>(data_ptr);
The concept check happens at compile time. For this 8-byte struct, the generated assembly (x86-64, -O2) typically reduces to a single load:
; Just a direct memory read - no validation overhead:
mov    rax, QWORD PTR [rdi]
ret

Explicit Object Parameters (C++23)

C++23’s deducing this eliminates code duplication without runtime cost:
// Old way: Need multiple overloads
class old_strong_type
{
    Type value;
public:
    Type& get() & { return value; }
    const Type& get() const& { return value; }
    Type&& get() && { return std::move(value); }
    const Type&& get() const&& { return std::move(value); }
};

// STX way: Single function, perfect forwarding
template<typename Self>
constexpr auto&& get(this Self&& self) noexcept {
    return std::forward<Self>(self).value;
}
The C++23 version:
  • Generates identical assembly to the multi-overload version
  • Reduces source code (one member function template instead of four overloads)
  • Is easier to maintain

Real-World Example: Binary Parsing

struct section_header
{
    offset_t file_offset;
    usize    size;
    rva_t    virtual_address;
};

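// reinterpret_cast is not allowed in constant expressions, so the fields are
// assembled byte by byte. Little-endian u32 fields in this order are an
// assumption for illustration; the real record layout is not shown here.
constexpr u32 read_u32_le(const u8* p)
{
    return u32{p[0]} | (u32{p[1]} << 8) | (u32{p[2]} << 16) | (u32{p[3]} << 24);
}

constexpr section_header parse_section(const u8* data, offset_t pos)
{
    const u8* raw = data + pos.get();

    return section_header {
        .file_offset = offset_t { read_u32_le(raw) },
        .size = read_u32_le(raw + 4),
        .virtual_address = rva_t { read_u32_le(raw + 8) }
    };
}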

// Compile-time parsing:
constexpr u8 pe_data[] = { /* ... */ };
constexpr auto section = parse_section(pe_data, offset_t{0x400});

static_assert(section.virtual_address.get() == 0x1000);
This entire computation happens at compile time. The result is embedded in the binary as a constant - no parsing at runtime!
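A complete, compilable miniature of compile-time parsing is sketched below. It does not use STX types: `range_like`, `load_u32_le`, `parse_range`, and the `sample` bytes are hypothetical, chosen only to show that byte-wise decoding of a constexpr array can be verified with static_assert.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record: two little-endian 32-bit fields.
struct range_like
{
    std::uint32_t start;
    std::uint32_t length;
};

// Assemble a little-endian 32-bit value from four bytes.
constexpr std::uint32_t load_u32_le(const std::uint8_t* p) noexcept
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

constexpr range_like parse_range(const std::uint8_t* data, std::size_t pos) noexcept
{
    return range_like{load_u32_le(data + pos), load_u32_le(data + pos + 4)};
}

// Two leading bytes, then start = 0x00001000 and length = 0x00000200.
constexpr std::uint8_t sample[] = {
    0xAA, 0xBB,
    0x00, 0x10, 0x00, 0x00,
    0x00, 0x02, 0x00, 0x00,
};

constexpr range_like parsed = parse_range(sample, 2);
static_assert(parsed.start  == 0x1000);
static_assert(parsed.length == 0x200);
```

Byte-wise assembly is used here because pointer reinterpretation is not available during constant evaluation; the same function also works unchanged on runtime buffers.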

Benchmarks

Comparative benchmarks showing strong types vs raw integers:
Test: 1 million address calculations
// Strong types
for (size_t i = 0; i < 1'000'000; ++i) {
    va_t addr{base};
    addr = addr + offset;
    result += addr.get();
}

// Raw integers
for (size_t i = 0; i < 1'000'000; ++i) {
    uintptr_t addr = base;
    addr = addr + offset;
    result += addr;
}
Result: Identical performance (2.1ms ± 0.1ms for both)
Test: 1 million strong type constructions and extractions
for (size_t i = 0; i < 1'000'000; ++i) {
    offset_t off{i};
    result += off.get();
}
Result: The wrapper is completely optimized away - the compiler reduces the loop body to result += i
Test: Compile-time vs runtime computation
// Compile time
constexpr auto ct_result = compute_offset(base, limit);

// Runtime
auto rt_result = compute_offset(base, limit);
Result: Compile-time version has zero runtime cost (value is hardcoded in binary)
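Readers who want to reproduce such measurements could start from a harness like the one below. It is a sketch, not the harness used for the figures above: `va_like`, `sum_strong`, `sum_raw`, and `time_ns` are hypothetical names, and the timings obtained will depend on compiler, flags, and hardware.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical wrapper standing in for va_t.
struct va_like
{
    std::uint64_t value;
    constexpr std::uint64_t get() const noexcept { return value; }
    friend constexpr va_like operator+(va_like a, std::uint64_t b) noexcept
    {
        return {a.value + b};
    }
};

// Mirrors the strong-type loop above.
inline std::uint64_t sum_strong(std::uint64_t base, std::uint64_t offset, std::uint64_t n) noexcept
{
    std::uint64_t result = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        va_like addr{base};
        addr = addr + offset;
        result += addr.get();
    }
    return result;
}

// Mirrors the raw-integer loop above.
inline std::uint64_t sum_raw(std::uint64_t base, std::uint64_t offset, std::uint64_t n) noexcept
{
    std::uint64_t result = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uint64_t addr = base;
        addr = addr + offset;
        result += addr;
    }
    return result;
}

// Times one call in nanoseconds; the result is written through a volatile
// reference so the compiler cannot discard the call.
template <typename Fn>
std::int64_t time_ns(Fn fn, volatile std::uint64_t& out)
{
    const auto start = std::chrono::steady_clock::now();
    out = fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

An optimizer may fold loops this simple into a closed-form expression, so a careful benchmark would also feed the inputs from values the compiler cannot see at compile time.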

Design Guidelines for Zero Overhead

STX follows these principles:

1. No Virtual Functions

// Never:
class base { virtual void process() = 0; };  // Adds vtable pointer

// Always:
template<typename Impl>
class base { void process() { static_cast<Impl*>(this)->process(); } };  // CRTP, zero overhead
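A fuller CRTP sketch is given here for reference. The names `processor_base`, `doubler`, `incrementer`, `process_impl`, and `run` are hypothetical; giving the implementation hook a different name from the public entry point is a choice made in this sketch so that a derived class that forgets to provide it fails to compile instead of recursing.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical CRTP base: the interface forwards to Impl at compile time.
template <typename Impl>
class processor_base
{
public:
    constexpr std::uint32_t process(std::uint32_t x) const noexcept
    {
        // Resolved statically; no vtable lookup is involved.
        return static_cast<const Impl*>(this)->process_impl(x);
    }
};

class doubler : public processor_base<doubler>
{
public:
    constexpr std::uint32_t process_impl(std::uint32_t x) const noexcept { return x * 2; }
};

class incrementer : public processor_base<incrementer>
{
public:
    constexpr std::uint32_t process_impl(std::uint32_t x) const noexcept { return x + 1; }
};

// Generic code is written against the base template.
template <typename Impl>
constexpr std::uint32_t run(const processor_base<Impl>& p, std::uint32_t x) noexcept
{
    return p.process(x);
}

static_assert(run(doubler{}, 21) == 42);
static_assert(run(incrementer{}, 41) == 42);

// The base carries no vtable pointer, so the derived types stay empty classes.
static_assert(std::is_empty_v<doubler>);
```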

2. No Dynamic Allocation

// Never in core library:
auto* ptr = new strong_type{value};  // Heap allocation

// Always:
constexpr strong_type value{42};  // Stack or static, zero allocation

3. Prefer Constexpr

// Make everything constexpr when possible:
constexpr auto compute() { /* ... */ }
constexpr Type member{};  

4. Use Concepts for Compile-Time Validation

// Not: Runtime checks
void process(void* data) {
    if (!is_valid(data)) throw std::runtime_error("Invalid");
}

// Instead: Compile-time constraints
template<binary_readable T>
void process(const T& data) { /* ... */ }

5. Mark Functions noexcept

constexpr Type get(this auto&& self) noexcept {  // noexcept enables optimizations
    return std::forward<decltype(self)>(self).value;
}

Verification Tools

Compiler Explorer

Use Compiler Explorer to verify zero overhead:
#include <cstddef>

namespace stx {
    template<typename T, typename Tag>
    class strong_type { T value; public: constexpr auto get() const { return value; } };
    
    struct tag{};
    using offset_t = strong_type<std::size_t, tag>;
}

std::size_t test(stx::offset_t off) {
    return off.get() + 10;
}
With -O2, this produces minimal assembly with no wrapper overhead.

Static Assertions

STX includes extensive compile-time checks:
static_assert(sizeof(offset_t) == sizeof(usize));
static_assert(std::is_trivially_copyable_v<offset_t>);
static_assert(std::is_standard_layout_v<offset_t>);
static_assert(noexcept(offset_t{}.get()));

Summary

STX achieves zero overhead through:
  • Trivial types with no vtables or padding
  • Constexpr for compile-time evaluation
  • Concepts for compile-time constraints
  • Explicit object parameters for optimal forwarding
  • Header-only design enabling aggressive inlining
  • No dynamic allocation in core abstractions
  • No virtual functions or RTTI
The result: type safety and expressiveness at zero runtime cost in optimized builds.
All STX abstractions compile down to the same machine code you would write by hand - often better, thanks to compiler optimizations.

See Also

Type System

Explore STX’s fundamental type aliases

Strong Types

Learn about type-safe wrappers for addresses
