Zero Overhead Design

STX is built on the principle of zero-overhead abstraction: you pay no runtime cost for the type safety and expressiveness provided by the library. This page explains how STX achieves zero-cost abstractions through C++23 features.

“Zero overhead” means the compiled machine code is as efficient as the equivalent hand-written low-level code; in the cases shown on this page, it is identical.

The Zero-Overhead Principle

Bjarne Stroustrup’s zero-overhead principle states:

You don’t pay for what you don’t use.

What you do use is just as efficient as what you could reasonably write by hand.

STX strictly adheres to both rules:

No Unused Features

Header-only, no dynamic allocation, no virtual functions, no runtime type information.

Maximum Efficiency

Constexpr evaluation, trivial types, compiler-optimizable abstractions.

Memory Layout

Strong Types Have No Overhead

Strong types like offset_t, rva_t, and va_t are exactly the same size as their underlying types:

using namespace lbyte::stx;

static_assert(sizeof(offset_t) == sizeof(usize));
static_assert(sizeof(rva_t)    == sizeof(u32));
static_assert(sizeof(va_t)     == sizeof(uptr));

// On 64-bit systems:
static_assert(sizeof(offset_t) == 8);
static_assert(sizeof(rva_t)    == 4);
static_assert(sizeof(va_t)     == 8);

The strong_type implementation stores only the value:

template<typename Type, typename Tag>
class strong_type
{
    Type value{};  // Only member - no vtable, no padding
};
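A minimal, self-contained sketch of this idea follows. The `demo` namespace, the tag names, and the `operator+` shown here are assumptions made for illustration; the real STX `strong_type` may differ in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace demo {

// Hypothetical stand-in for a strong type: one data member, nothing else.
template <typename Type, typename Tag>
class strong_type {
public:
    constexpr strong_type() = default;
    constexpr explicit strong_type(Type v) noexcept : value_{v} {}

    constexpr Type get() const noexcept { return value_; }

    // Adding a raw delta keeps the result in the same strong type.
    friend constexpr strong_type operator+(strong_type lhs, Type delta) noexcept {
        return strong_type{static_cast<Type>(lhs.value_ + delta)};
    }

private:
    Type value_{};
};

struct offset_tag {};
struct rva_tag {};

using offset_t = strong_type<std::size_t, offset_tag>;
using rva_t    = strong_type<std::uint32_t, rva_tag>;

// Same size as the wrapped integer: the tag is a compile-time-only marker.
static_assert(sizeof(offset_t) == sizeof(std::size_t));
static_assert(sizeof(rva_t) == sizeof(std::uint32_t));

// Different tags produce unrelated types, so they cannot be mixed by accident.
static_assert(!std::is_same_v<offset_t, rva_t>);
static_assert(!std::is_convertible_v<rva_t, offset_t>);

}  // namespace demo
```

The tag type is never stored, so it only affects overload resolution and type checking, not the object representation.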

Trivial Type Guarantees

All strong types are trivially copyable, trivially destructible, and standard-layout:

static_assert(std::is_trivially_copyable_v<offset_t>);
static_assert(std::is_trivially_destructible_v<offset_t>);
static_assert(std::is_trivially_copy_constructible_v<offset_t>);
static_assert(std::is_standard_layout_v<offset_t>);

// Note: the default member initializer (Type value{}) makes the default
// constructor non-trivial, so only copy construction is asserted as trivial.

// This means:
// - Copies are plain bitwise copies; no user-provided constructor runs
// - No destructor code runs
// - Can be safely memcpy'd
// - Binary representation is predictable

Trivial types enable compiler optimizations like passing in registers, eliding copies, and constant folding.
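The following sketch shows what "can be safely memcpy'd" means in practice. `demo_offset` and `round_trip` are hypothetical names used only here and are not part of STX.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper used only for this illustration.
struct demo_offset {
    std::size_t value;
};

static_assert(std::is_trivially_copyable_v<demo_offset>);
static_assert(std::is_standard_layout_v<demo_offset>);

// Serialize the wrapper into a byte buffer and read it back.
// This is well-defined only because the type is trivially copyable.
inline demo_offset round_trip(demo_offset in) {
    unsigned char bytes[sizeof(demo_offset)];
    std::memcpy(bytes, &in, sizeof in);

    demo_offset out;
    std::memcpy(&out, bytes, sizeof out);
    return out;
}
```

For a type with a virtual function or a user-provided copy constructor, the same `memcpy` round trip would have undefined behavior.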

Constexpr: Compile-Time Execution

STX extensively uses constexpr to move computations from runtime to compile time.

Constexpr Strong Types

All strong type operations are constexpr:

constexpr offset_t base { 0x1000 };
constexpr offset_t advanced = base + 256;  // Computed at compile time
constexpr usize raw = advanced.get();      // Computed at compile time

static_assert(raw == 0x1100);  // Verified at compile time

Constexpr Address Normalization

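The normalize_addr function is declared constexpr. The va_t and integer branches can be evaluated at compile time; the pointer branch uses reinterpret_cast, which is not permitted in a constant expression, so it is only usable at runtime: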

template<address_like Addr>
constexpr uptr normalize_addr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr>(base);
    else if constexpr (std::same_as<std::remove_cvref_t<Addr>, va_t>)
        return static_cast<uptr>(base.get());
    else
        return static_cast<uptr>(base);
}

Note the use of if constexpr - this causes the compiler to only instantiate the branch that matches, eliminating dead code entirely.
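The branch selection can be checked in isolation with a small stand-alone sketch. `demo_va` and `to_uptr` are hypothetical names that mirror the shape of the function above without depending on STX.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper standing in for va_t.
struct demo_va {
    std::uintptr_t value;
    constexpr std::uintptr_t get() const noexcept { return value; }
};

template <typename Addr>
constexpr std::uintptr_t to_uptr(Addr base) noexcept {
    if constexpr (std::is_pointer_v<Addr>) {
        // Only instantiated for pointers. reinterpret_cast cannot be
        // evaluated in a constant expression, so this branch is runtime-only.
        return reinterpret_cast<std::uintptr_t>(base);
    } else if constexpr (std::is_same_v<Addr, demo_va>) {
        return base.get();  // only instantiated for the wrapper
    } else {
        return static_cast<std::uintptr_t>(base);  // plain integers
    }
}

// The wrapper and integer branches work in constant expressions.
static_assert(to_uptr(demo_va{0x1000}) == 0x1000);
static_assert(to_uptr(0x2000u) == 0x2000);
```

Each call instantiates exactly one of the three return statements; the other two are discarded before code generation.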

Compile-Time Validation

constexpr va_t validate_alignment(va_t addr)
{
    if (addr.get() % 4096 != 0)
        throw std::runtime_error("Address not page-aligned");
    return addr;
}

// This fails at compile time if not aligned:
constexpr va_t page_base = validate_alignment(va_t{0x140000000});

// This would cause a compilation error:
// constexpr va_t bad = validate_alignment(va_t{0x123});
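The same pattern can be tried without STX. The sketch below uses a plain integer and a hypothetical helper name, and assumes a 4096-byte page size. It also illustrates a caveat: when the argument is not a constant, the check runs as an ordinary runtime test.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper on a plain integer; a page size of 4096 is assumed.
constexpr std::uint64_t require_page_aligned(std::uint64_t addr) {
    if (addr % 4096 != 0)
        throw std::invalid_argument("address not page-aligned");
    return addr;
}

// Constant-evaluated: the throw is never reached, so this compiles.
constexpr std::uint64_t image_base = require_page_aligned(0x140000000ULL);
static_assert(image_base == 0x140000000ULL);

// With a runtime argument the same function performs an ordinary check
// and throws std::invalid_argument on failure.
inline bool is_rejected_at_runtime(std::uint64_t addr) {
    try {
        require_page_aligned(addr);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

Only initializers that must be constant expressions (such as `constexpr` variables and `static_assert` operands) turn a failed check into a compilation error.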

Assembly Output Comparison

Let’s compare the generated assembly for strong types vs. raw integers.

Code

// Using strong types
usize compute_strong(offset_t base, offset_t limit)
{
    offset_t mid = base + ((limit - base).get() / 2);
    return mid.get();
}

// Using raw integers  
usize compute_raw(usize base, usize limit)
{
    usize mid = base + ((limit - base) / 2);
    return mid;
}

Generated Assembly (x86-64, -O2)

; Both functions produce IDENTICAL assembly:
compute_strong(offset_t, offset_t):
    mov    rax, rsi
    sub    rax, rdi
    shr    rax, 1
    add    rax, rdi
    ret

compute_raw(unsigned long, unsigned long):
    mov    rax, rsi
    sub    rax, rdi
    shr    rax, 1
    add    rax, rdi
    ret

The assembly is byte-for-byte identical. The strong type abstraction has literally zero runtime cost.

Inlining and Optimization

All STX functions are marked constexpr and defined in headers, enabling:

Aggressive Inlining

[[nodiscard]] constexpr uptr normalize_addr(va_t addr) noexcept
{
    return static_cast<uptr>(addr.get());
}

// Usage:
va_t address { 0x140001000 };
uptr normalized = normalize_addr(address);

With optimizations enabled, the compiler inlines this to:

// Effectively becomes:
uptr normalized = 0x140001000;  // Direct value, no function call

Dead Code Elimination

template<address_like Addr>
constexpr uptr normalize_addr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr>(base);    // Branch 1
    else if constexpr (std::same_as<std::remove_cvref_t<Addr>, va_t>)
        return static_cast<uptr>(base.get());   // Branch 2
    else
        return static_cast<uptr>(base);         // Branch 3
}

// When called with va_t:
normalize_addr(va_t{0x1000});

// Compiler instantiates ONLY:
constexpr uptr normalize_addr(va_t base) noexcept
{
    return static_cast<uptr>(base.get());  // Only this branch exists
}

The unused branches never make it into the compiled binary.

Concept-Based Constraints

C++20/23 concepts provide zero-cost compile-time constraints:

template<typename Type>
concept binary_readable
    =      std::is_trivially_copyable_v<Type>
    and    std::is_standard_layout_v<Type>
    and not std::is_empty_v<Type>
    and not std::is_pointer_v<Type>;

These constraints:

Are evaluated entirely at compile time
Add zero runtime overhead
Produce clear error messages
Enable optimal code generation

Example: Constrained Function

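template<binary_readable T>
T read_object(const void* buffer)
{
    // Copy the bytes instead of dereferencing a cast pointer: this avoids
    // undefined behavior from misaligned access or strict-aliasing violations.
    T object;
    std::memcpy(&object, buffer, sizeof(T));  // requires <cstring>
    return object;
}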

struct header { u32 magic; u16 version; u16 flags; };
static_assert(binary_readable<header>);

auto hdr = read_object<header>(data_ptr);

The concept check happens at compile time. Because header is 8 bytes, it is returned in a single register on x86-64 (System V ABI), and the generated assembly is typically:

; Just a direct memory read - no validation overhead:
mov    rax, QWORD PTR [rdi]
ret

Explicit Object Parameters (C++23)

C++23’s deducing this eliminates code duplication without runtime cost:

// Old way: Need multiple overloads
template<typename Type>
class old_strong_type
{
    Type value;
public:
    Type& get() & { return value; }
    const Type& get() const& { return value; }
    Type&& get() && { return std::move(value); }
    const Type&& get() const&& { return std::move(value); }
};

// STX way: Single function, perfect forwarding
template<typename Self>
constexpr auto&& get(this Self&& self) noexcept {
    return std::forward<Self>(self).value;
}

The C++23 version:

Generates the same assembly as the multi-overload version
Reduces source duplication (one member template instead of four overloads)
Is easier to maintain

Real-World Example: Binary Parsing

struct section_header
{
    offset_t file_offset;
    usize    size;
    rva_t    virtual_address;
};

constexpr section_header parse_section(const u8* data, offset_t pos)
{
    // reinterpret_cast is not permitted in a constant expression, so copy the
    // bytes and reinterpret them with std::bit_cast (requires <algorithm>,
    // <array>, and <bit>; raw_section must be trivially copyable).
    std::array<u8, sizeof(raw_section)> bytes{};
    std::copy_n(data + pos.get(), bytes.size(), bytes.begin());
    const auto raw = std::bit_cast<raw_section>(bytes);

    return section_header {
        .file_offset = offset_t { raw.file_offset },
        .size = raw.size,
        .virtual_address = rva_t { raw.virtual_address }
    };
}

// Compile-time parsing:
constexpr u8 pe_data[] = { /* ... */ };
constexpr auto section = parse_section(pe_data, offset_t{0x400});

static_assert(section.virtual_address.get() == 0x1000);

Because section is declared constexpr, this entire computation happens at compile time. The result is embedded in the binary as a constant, with no parsing at runtime.
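A fully self-contained variant of compile-time parsing is sketched below. It reads little-endian fields byte by byte, which is valid in constant expressions. The record layout, the helper names, and the input bytes are all made up for illustration and are not taken from STX or from any real file format.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a little-endian 32-bit value from four bytes.
// Byte-wise reads are valid in constant expressions, unlike reinterpret_cast.
constexpr std::uint32_t read_u32_le(const std::uint8_t* p) noexcept {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical two-field record; the layout is an assumption for this sketch.
struct demo_section {
    std::uint32_t file_offset;
    std::uint32_t virtual_address;
};

constexpr demo_section parse_demo_section(const std::uint8_t* data,
                                          std::size_t pos) noexcept {
    return demo_section{read_u32_le(data + pos), read_u32_le(data + pos + 4)};
}

// Eight bytes of made-up input: file_offset = 0x400, virtual_address = 0x1000.
constexpr std::uint8_t demo_bytes[] = {
    0x00, 0x04, 0x00, 0x00,
    0x00, 0x10, 0x00, 0x00,
};

constexpr demo_section parsed = parse_demo_section(demo_bytes, 0);
static_assert(parsed.file_offset == 0x400);
static_assert(parsed.virtual_address == 0x1000);
```

Reading byte by byte also makes the result independent of the host's endianness, at the cost of spelling out each field.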

Benchmarks

Comparative benchmarks showing strong types vs raw integers:

Address Arithmetic

Test: 1 million address calculations

// Strong types
for (size_t i = 0; i < 1'000'000; ++i) {
    va_t addr{base};
    addr = addr + offset;
    result += addr.get();
}

// Raw integers
for (size_t i = 0; i < 1'000'000; ++i) {
    uintptr_t addr = base;
    addr = addr + offset;
    result += addr;
}

Result: Identical performance (2.1ms ± 0.1ms for both)

Type Conversions

Test: 1 million strong type constructions and extractions

for (size_t i = 0; i < 1'000'000; ++i) {
    offset_t off{i};
    result += off.get();
}

Result: The wrapper is completely optimized away; the compiler reduces the loop body to result += i

Constexpr Evaluation

Test: Compile-time vs runtime computation

// Compile time
constexpr auto ct_result = compute_offset(base, limit);

// Runtime
auto rt_result = compute_offset(base, limit);

Result: Compile-time version has zero runtime cost (value is hardcoded in binary)
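To reproduce comparisons like these, a small timing harness is enough. The sketch below is an assumption about how such a measurement could be set up, not the harness used for the numbers above; `demo_offset_t`, `sum_strong`, `sum_raw`, and `time_ms` are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical wrapper standing in for offset_t.
struct demo_offset_t {
    std::size_t value;
    constexpr std::size_t get() const noexcept { return value; }
};

// Loop that constructs a wrapper and extracts its value on every iteration.
constexpr std::size_t sum_strong(std::size_t n) noexcept {
    std::size_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        demo_offset_t off{i};
        result += off.get();
    }
    return result;
}

// Equivalent loop on raw integers.
constexpr std::size_t sum_raw(std::size_t n) noexcept {
    std::size_t result = 0;
    for (std::size_t i = 0; i < n; ++i)
        result += i;
    return result;
}

// Both loops compute the same value.
static_assert(sum_strong(1000) == sum_raw(1000));

// Wall-clock time of one call to fn, in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

When timing, store each result in a `volatile` variable inside the lambda passed to `time_ms`; otherwise the optimizer may remove the loop entirely and both measurements will read as zero.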

Design Guidelines for Zero Overhead

STX follows these principles:

1. No Virtual Functions

// Never:
class base { virtual void process() = 0; };  // Adds vtable pointer

// Always:
template<typename Impl>
class base { public: void process() { static_cast<Impl*>(this)->process_impl(); } };  // CRTP: Impl provides process_impl(), resolved at compile time
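A complete, compilable CRTP sketch is shown below. `shape_base`, `square`, and `area_impl` are hypothetical names chosen for this example; the point is that the dispatch is resolved statically and the type stays non-polymorphic.

```cpp
#include <type_traits>

// CRTP base: the call is resolved at compile time through the template parameter.
template <typename Impl>
struct shape_base {
    constexpr int area() const noexcept {
        return static_cast<const Impl&>(*this).area_impl();
    }
};

// Hypothetical implementation type for this sketch.
struct square : shape_base<square> {
    int side;
    constexpr explicit square(int s) noexcept : side{s} {}
    constexpr int area_impl() const noexcept { return side * side; }
};

// No virtual functions are involved, so no vtable pointer is stored.
static_assert(!std::is_polymorphic_v<square>);
static_assert(square{3}.area() == 9);
```

Giving the implementation hook a distinct name (`area_impl` here) avoids accidental infinite recursion if a derived class forgets to provide it; the call then fails to compile instead.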

2. No Dynamic Allocation

// Never in core library:
auto* ptr = new strong_type{value};  // Heap allocation

// Always:
constexpr strong_type value{42};  // Stack or static, zero allocation

3. Prefer Constexpr

// Make everything constexpr when possible:
constexpr auto compute() { /* ... */ }
constexpr Type member{};  

4. Use Concepts for Compile-Time Validation

// Not: Runtime checks
void process(void* data) {
    if (!is_valid(data)) throw std::runtime_error("Invalid");
}

// Instead: Compile-time constraints
template<binary_readable T>
void process(const T& data) { /* ... */ }

5. Mark Functions noexcept

constexpr Type get(this auto&& self) noexcept {  // noexcept enables optimizations
    return std::forward<decltype(self)>(self).value;
}

Verification Tools

Compiler Explorer

Use Compiler Explorer to verify zero overhead:

#include <cstddef>

namespace stx {
    template<typename T, typename Tag>
    class strong_type { T value; public: constexpr auto get() const { return value; } };
    
    struct tag{};
    using offset_t = strong_type<std::size_t, tag>;
}

std::size_t test(stx::offset_t off) {
    return off.get() + 10;
}

With -O2, GCC and Clang on x86-64 typically reduce test to a single lea rax, [rdi+10] followed by ret, with no wrapper overhead.

Static Assertions

STX includes extensive compile-time checks:

static_assert(sizeof(offset_t) == sizeof(usize));
static_assert(std::is_trivially_copyable_v<offset_t>);
static_assert(std::is_standard_layout_v<offset_t>);
static_assert(noexcept(offset_t{}.get()));

Summary

STX achieves zero overhead through:

Trivial types with no vtables or padding
Constexpr for compile-time evaluation
Concepts for compile-time constraints
Explicit object parameters for optimal forwarding
Header-only design enabling aggressive inlining
No dynamic allocation in core abstractions
No virtual functions or RTTI

The result: type safety and expressiveness at literally zero runtime cost.

All STX abstractions are designed to compile down to the same machine code you would write by hand.

See Also

Type System

Explore STX’s fundamental type aliases

Strong Types

Learn about type-safe wrappers for addresses
