Zero Overhead Design
STX is built on the principle of zero-overhead abstraction: you pay no runtime cost for the type safety and expressiveness the library provides. This page explains how STX achieves this using C++23 features.
“Zero overhead” means the compiled machine code is identical to (or better than) hand-written low-level code.
The Zero-Overhead Principle
Bjarne Stroustrup’s zero-overhead principle states:
You don’t pay for what you don’t use.
What you do use is just as efficient as what you could reasonably write by hand.
STX strictly adheres to both rules:
No Unused Features: header-only, no dynamic allocation, no virtual functions, no runtime type information.
Maximum Efficiency: constexpr evaluation, trivial types, compiler-optimizable abstractions.
Memory Layout
Strong Types Have No Overhead
Strong types like offset_t, rva_t, and va_t are exactly the same size as their underlying types:
using namespace lbyte::stx;
static_assert(sizeof(offset_t) == sizeof(usize));
static_assert(sizeof(rva_t) == sizeof(u32));
static_assert(sizeof(va_t) == sizeof(uptr));
// On 64-bit systems:
static_assert(sizeof(offset_t) == 8);
static_assert(sizeof(rva_t) == 4);
static_assert(sizeof(va_t) == 8);
The strong_type implementation stores only the value:
template <typename Type, typename Tag>
class strong_type
{
    Type value{}; // Only member: no vtable, no padding
};
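To make the layout claim concrete, here is a minimal sketch of a tagged wrapper. It is not the real STX implementation: the `sketch` namespace, the tag structs, and the single `operator+` are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical minimal strong type; not the real STX implementation.
template <typename Type, typename Tag>
class strong_type
{
    Type value{}; // the only data member

public:
    constexpr strong_type() noexcept = default;
    constexpr explicit strong_type(Type v) noexcept : value{v} {}

    // Read access to the wrapped value.
    constexpr Type get() const noexcept { return value; }

    // Advancing by a raw amount keeps the strong type.
    friend constexpr strong_type operator+(strong_type lhs, Type rhs) noexcept
    {
        return strong_type{static_cast<Type>(lhs.value + rhs)};
    }
};

struct offset_tag {};
struct rva_tag {};

using offset_t = strong_type<std::size_t, offset_tag>;
using rva_t    = strong_type<std::uint32_t, rva_tag>;

// The wrapper adds no bytes on top of the underlying integer.
static_assert(sizeof(offset_t) == sizeof(std::size_t));
static_assert(sizeof(rva_t) == sizeof(std::uint32_t));

// Distinct tags produce distinct types that do not convert into each other.
static_assert(!std::is_convertible_v<rva_t, offset_t>);

} // namespace sketch
```

The tag parameter is never stored, so it affects only the type identity and not the object size.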
Trivial Type Guarantees
All strong types are trivially copyable, trivially destructible, and standard-layout:
static_assert(std::is_trivially_copyable_v<offset_t>);
static_assert(std::is_trivially_destructible_v<offset_t>);
// Copy construction is trivial. Default construction is not, because the
// default member initializer (value{}) zero-initializes the member.
static_assert(std::is_trivially_copy_constructible_v<offset_t>);
static_assert(std::is_standard_layout_v<offset_t>);
// This means:
// - Copies are plain bitwise copies (no user-written copy constructor runs)
// - No destructor code runs
// - Can be safely memcpy'd
// - Binary representation is predictable
Trivial types enable compiler optimizations like passing in registers, eliding copies, and constant folding.
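The sketch below shows which traits a single-member wrapper satisfies and how that permits a byte-wise round trip. The two structs and `round_trip` are hypothetical names used only for illustration; they are not part of STX.

```cpp
#include <cstring>
#include <type_traits>

namespace sketch {

// Hypothetical wrappers, used only to illustrate the traits.
struct with_initializer { unsigned long long value{}; }; // default member initializer
struct without_initializer { unsigned long long value; };

// Both are trivially copyable and standard-layout, so memcpy is well defined.
static_assert(std::is_trivially_copyable_v<with_initializer>);
static_assert(std::is_trivially_copyable_v<without_initializer>);
static_assert(std::is_standard_layout_v<with_initializer>);

// Only the version without an initializer has a trivial default constructor.
static_assert(!std::is_trivially_default_constructible_v<with_initializer>);
static_assert(std::is_trivially_default_constructible_v<without_initializer>);

// Round-trips a value through raw bytes, as a binary reader or writer would.
inline unsigned long long round_trip(unsigned long long v)
{
    with_initializer src{v};
    unsigned char bytes[sizeof(src)];
    std::memcpy(bytes, &src, sizeof(src));
    with_initializer dst;
    std::memcpy(&dst, bytes, sizeof(dst));
    return dst.value;
}

} // namespace sketch
```

Trivial copyability is the property that matters for memcpy and register passing. A zero-initializing default constructor does not affect either.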
Constexpr: Compile-Time Execution
STX extensively uses constexpr to move computations from runtime to compile time.
Constexpr Strong Types
All strong type operations are constexpr:
constexpr offset_t base{0x1000};
constexpr offset_t advanced = base + 256; // Computed at compile time
constexpr usize raw = advanced.get();     // Computed at compile time
static_assert(raw == 0x1100);             // Verified at compile time
Constexpr Address Normalization
The normalize_addr function is declared constexpr. The va_t and integer branches can be evaluated at compile time. The pointer branch uses reinterpret_cast, which is not permitted in a constant expression, so pointer arguments are always converted at runtime:
template <address_like Addr>
constexpr uptr normalize_addr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr>(base);
    else if constexpr (std::same_as<std::remove_cvref_t<Addr>, va_t>)
        return static_cast<uptr>(base.get());
    else
        return static_cast<uptr>(base);
}
Note the use of if constexpr: the compiler instantiates only the branch that matches the argument type, so the other branches are discarded at compile time.
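The following self-contained sketch shows the branch selection in isolation. The `va` struct and `to_uptr` are hypothetical stand-ins for `va_t` and `normalize_addr`, and a plain template parameter replaces the `address_like` concept, which is not defined on this page.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for a virtual-address strong type.
struct va
{
    std::uintptr_t raw;
    constexpr std::uintptr_t get() const noexcept { return raw; }
};

template <typename Addr>
constexpr std::uintptr_t to_uptr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<std::uintptr_t>(base); // usable at runtime only
    else if constexpr (std::is_same_v<Addr, va>)
        return base.get();
    else
        return static_cast<std::uintptr_t>(base);
}

// Integer and strong-type arguments are usable in constant expressions.
static_assert(to_uptr(0x1000u) == 0x1000u);
static_assert(to_uptr(va{0x2000u}) == 0x2000u);

// A pointer argument compiles, but only as a runtime call.
inline std::uintptr_t address_of(const int* p) noexcept { return to_uptr(p); }

} // namespace sketch
```

Each instantiation contains exactly one return statement, so no runtime type test is involved.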
Compile-Time Validation
constexpr va_t validate_alignment(va_t addr)
{
    if (addr.get() % 4096 != 0)
        throw std::runtime_error("Address not page-aligned");
    return addr;
}
// This fails at compile time if not aligned:
constexpr va_t page_base = validate_alignment(va_t{0x140000000});
// This would cause a compilation error:
// constexpr va_t bad = validate_alignment(va_t{0x123});
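The same pattern works on a raw integer, which makes it easy to try in isolation. This is a sketch under assumed names (`require_page_aligned`, `page_size`, `image_base`), not STX code.

```cpp
#include <cstdint>
#include <stdexcept>

namespace sketch {

constexpr std::uint64_t page_size = 4096;

// Returns addr unchanged if it is page-aligned, otherwise throws.
// In a constant expression, reaching the throw is a compile error instead.
constexpr std::uint64_t require_page_aligned(std::uint64_t addr)
{
    if (addr % page_size != 0)
        throw std::invalid_argument("address not page-aligned");
    return addr;
}

// Checked by the compiler: no runtime test is emitted for this constant.
constexpr std::uint64_t image_base = require_page_aligned(0x140000000ULL);
static_assert(image_base == 0x140000000ULL);

// constexpr auto bad = require_page_aligned(0x123); // would not compile

} // namespace sketch
```

When the argument is only known at runtime, the same function still works and reports misalignment by throwing.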
Assembly Output Comparison
Let’s compare the generated assembly for strong types vs. raw integers.
Code
// Using strong types
usize compute_strong(offset_t base, offset_t limit)
{
    offset_t mid = base + ((limit - base).get() / 2);
    return mid.get();
}
// Using raw integers
usize compute_raw(usize base, usize limit)
{
    usize mid = base + ((limit - base) / 2);
    return mid;
}
Generated Assembly (x86-64, -O2)
; Both functions produce IDENTICAL assembly:
compute_strong(offset_t, offset_t):
    mov rax, rsi
    sub rax, rdi
    shr rax, 1
    add rax, rdi
    ret
compute_raw(unsigned long, unsigned long):
    mov rax, rsi
    sub rax, rdi
    shr rax, 1
    add rax, rdi
    ret
The assembly is byte-for-byte identical. In this example, the strong type abstraction has zero runtime cost.
Inlining and Optimization
All STX functions are marked constexpr and defined in headers, enabling:
Aggressive Inlining
[[nodiscard]] constexpr uptr normalize_addr(va_t addr) noexcept
{
    return static_cast<uptr>(addr.get());
}
// Usage:
va_t address{0x140001000};
uptr normalized = normalize_addr(address);
With optimizations enabled, the compiler inlines this to:
// Effectively becomes:
uptr normalized = 0x140001000; // Direct value, no function call
Dead Code Elimination
template <address_like Addr>
constexpr uptr normalize_addr(Addr base) noexcept
{
    if constexpr (std::is_pointer_v<Addr>)
        return reinterpret_cast<uptr>(base); // Branch 1
    else if constexpr (std::same_as<std::remove_cvref_t<Addr>, va_t>)
        return static_cast<uptr>(base.get()); // Branch 2
    else
        return static_cast<uptr>(base); // Branch 3
}
// When called with va_t:
normalize_addr(va_t{0x1000});
// Compiler instantiates ONLY:
constexpr uptr normalize_addr(va_t base) noexcept
{
    return static_cast<uptr>(base.get()); // Only this branch exists
}
The unused branches never make it into the compiled binary.
Concept-Based Constraints
C++20/23 concepts provide zero-cost compile-time constraints:
template <typename Type>
concept binary_readable
    = std::is_trivially_copyable_v<Type>
    and std::is_standard_layout_v<Type>
    and not std::is_empty_v<Type>
    and not std::is_pointer_v<Type>;
These constraints:
Are evaluated entirely at compile time
Add zero runtime overhead
Produce clear error messages
Enable optimal code generation
Example: Constrained Function
template <binary_readable T>
T read_object(const void* buffer)
{
    // Assumes buffer is suitably aligned for T and points at a valid T
    return *static_cast<const T*>(buffer);
}
struct header { u32 magic; u16 version; u16 flags; };
static_assert(binary_readable<header>);
auto hdr = read_object<header>(data_ptr);
The concept check happens at compile time. On x86-64 (System V ABI, -O2) the 8-byte header is returned in a single register, so the generated code is typically:
; Just a direct memory read - no validation overhead:
mov rax, QWORD PTR [rdi]
ret
Explicit Object Parameters (C++23)
C++23’s deducing this eliminates code duplication without runtime cost:
// Old way: Need multiple overloads
class old_strong_type
{
    Type value;
public:
    Type& get() & { return value; }
    const Type& get() const& { return value; }
    Type&& get() && { return std::move(value); }
    const Type&& get() const&& { return std::move(value); }
};
// STX way: Single function, perfect forwarding
template <typename Self>
constexpr auto&& get(this Self&& self) noexcept {
    return std::forward<Self>(self).value;
}
The C++23 version:
Generates identical assembly to the multi-overload version
Reduces source duplication (one member template replaces four overloads)
Is easier to maintain
Real-World Example: Binary Parsing
struct section_header
{
    offset_t file_offset;
    usize size;
    rva_t virtual_address;
};
constexpr section_header parse_section(const u8* data, offset_t pos)
{
    // reinterpret_cast is not allowed in constant expressions, so copy the bytes
    // and convert them with std::bit_cast (needs <algorithm>, <array>, <bit>).
    std::array<u8, sizeof(raw_section)> bytes{};
    std::copy_n(data + pos.get(), bytes.size(), bytes.begin());
    const auto raw = std::bit_cast<raw_section>(bytes);
    return section_header{
        .file_offset = offset_t{raw.file_offset},
        .size = raw.size,
        .virtual_address = rva_t{raw.virtual_address}
    };
}
// Compile-time parsing:
constexpr u8 pe_data[] = { /* ... */ };
constexpr auto section = parse_section(pe_data, offset_t{0x400});
static_assert(section.virtual_address.get() == 0x1000);
Because parse_section avoids reinterpret_cast, the entire computation can run at compile time when the input bytes are constants. The result is embedded in the binary as a constant, with no parsing at runtime.
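A fully self-contained variant of compile-time parsing is sketched below. It decodes fields byte by byte, which works in constant expressions and does not depend on host endianness. The three-field `section` layout, `read_le32`, and the sample bytes are assumptions made for illustration; they are not the real PE section format or the STX API.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical on-disk layout: three little-endian u32 fields, 12 bytes total.
struct section
{
    std::uint32_t file_offset;
    std::uint32_t size;
    std::uint32_t virtual_address;
};

// Assembles a little-endian u32 from four bytes using only shifts and ors,
// which are allowed in constant expressions.
constexpr std::uint32_t read_le32(const std::uint8_t* p) noexcept
{
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

constexpr section parse_section(const std::uint8_t* data, std::size_t pos) noexcept
{
    return section{read_le32(data + pos),
                   read_le32(data + pos + 4),
                   read_le32(data + pos + 8)};
}

// Sample bytes: file_offset = 0x400, size = 0x200, virtual_address = 0x1000.
constexpr std::uint8_t sample[] = {
    0x00, 0x04, 0x00, 0x00,
    0x00, 0x02, 0x00, 0x00,
    0x00, 0x10, 0x00, 0x00,
};

constexpr section parsed = parse_section(sample, 0);
static_assert(parsed.file_offset == 0x400);
static_assert(parsed.virtual_address == 0x1000);

} // namespace sketch
```

The same `parse_section` can be called at runtime on a buffer read from disk; only constant inputs are folded at compile time.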
Benchmarks
Comparative benchmarks showing strong types vs raw integers:
Test: 1 million address calculations
// Strong types
for (size_t i = 0; i < 1'000'000; ++i) {
    va_t addr{base};
    addr = addr + offset;
    result += addr.get();
}
// Raw integers
for (size_t i = 0; i < 1'000'000; ++i) {
    uintptr_t addr = base;
    addr = addr + offset;
    result += addr;
}
Result: Identical performance (2.1ms ± 0.1ms for both)
Test: 1 million strong type constructions and extractions
for (size_t i = 0; i < 1'000'000; ++i) {
    offset_t off{i};
    result += off.get();
}
Result: Completely optimized away - compiler detects this is just result += i
Test: Compile-time vs runtime computation
// Compile time
constexpr auto ct_result = compute_offset(base, limit);
// Runtime
auto rt_result = compute_offset(base, limit);
Result: Compile-time version has zero runtime cost (value is hardcoded in binary)
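To reproduce a comparison like the first test, a small harness along these lines may help. It is a sketch only: the `va` struct stands in for `va_t`, and `sum_strong`, `sum_raw`, and `time_ms` are hypothetical names. It makes no claim about the timings you will see.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical strong type standing in for va_t.
struct va
{
    std::uintptr_t raw;
    constexpr std::uintptr_t get() const noexcept { return raw; }
    friend constexpr va operator+(va a, std::uintptr_t off) noexcept { return va{a.raw + off}; }
};

// Loop body written with the strong type.
constexpr std::uintptr_t sum_strong(std::uintptr_t base, std::uintptr_t offset, std::size_t n) noexcept
{
    std::uintptr_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        va addr{base};
        addr = addr + offset;
        result += addr.get();
    }
    return result;
}

// Same loop body written with raw integers.
constexpr std::uintptr_t sum_raw(std::uintptr_t base, std::uintptr_t offset, std::size_t n) noexcept
{
    std::uintptr_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uintptr_t addr = base;
        addr = addr + offset;
        result += addr;
    }
    return result;
}

// Wall-clock time of one call, in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

} // namespace sketch
```

An optimizer may collapse both loops into a closed-form expression, so feed the inputs from a runtime source (for example, program arguments) and consume the result. A dedicated benchmarking library is a better choice for numbers you intend to publish.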
Design Guidelines for Zero Overhead
STX follows these principles:
1. No Virtual Functions
// Never:
class base { virtual void process() = 0; }; // Adds vtable pointer
// Always:
template <typename Impl>
class base { void process() { static_cast<Impl*>(this)->process(); } }; // CRTP, zero overhead
2. No Dynamic Allocation
// Never in core library:
auto* ptr = new strong_type{value}; // Heap allocation
// Always:
constexpr strong_type value{42}; // Stack or static, zero allocation
3. Prefer Constexpr
// Make everything constexpr when possible:
constexpr auto compute() { /* ... */ }
constexpr Type member{};
4. Use Concepts for Compile-Time Validation
// Not: Runtime checks
void process(void* data) {
    if (!is_valid(data)) throw std::runtime_error("Invalid");
}
// Instead: Compile-time constraints
template <binary_readable T>
void process(const T& data) { /* ... */ }
5. Mark Functions noexcept
constexpr Type get(this auto&& self) noexcept { // noexcept enables optimizations
    return std::forward<decltype(self)>(self).value;
}
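Guideline 1 can be checked directly by comparing object sizes. The sketch below contrasts a virtual interface with a CRTP base. All names (`dynamic_base`, `static_base`, `doubler`) are hypothetical, and the size assertions reflect typical mainstream ABIs, not a guarantee of the language.

```cpp
#include <cstdint>

namespace sketch {

// Dynamic dispatch: every object carries a hidden vtable pointer.
struct dynamic_base
{
    virtual ~dynamic_base() = default;
    virtual std::uint32_t process() const = 0;
};

// Static dispatch via CRTP: the call is resolved at compile time.
template <typename Impl>
struct static_base
{
    constexpr std::uint32_t run() const noexcept
    {
        return static_cast<const Impl*>(this)->process();
    }
};

struct doubler : static_base<doubler>
{
    std::uint32_t input;
    constexpr explicit doubler(std::uint32_t v) noexcept : input{v} {}
    constexpr std::uint32_t process() const noexcept { return input * 2; }
};

// No vtable pointer: the object is just its data member (empty base optimization).
static_assert(sizeof(doubler) == sizeof(std::uint32_t));
static_assert(sizeof(dynamic_base) >= sizeof(void*));

// The whole call chain is usable in a constant expression.
static_assert(doubler{21}.run() == 42);

} // namespace sketch
```

Giving the base entry point (`run`) a different name from the derived hook (`process`) avoids accidental recursion when a derived class forgets to define the hook.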
Compiler Explorer
Use Compiler Explorer to verify zero overhead:
#include <cstddef> // std::size_t
namespace stx {
template <typename T, typename Tag>
class strong_type
{
    T value;
public:
    constexpr auto get() const { return value; }
};
struct tag {};
using offset_t = strong_type<std::size_t, tag>;
}
std::size_t test(stx::offset_t off) {
    return off.get() + 10;
}
With -O2, this produces minimal assembly with no wrapper overhead.
Static Assertions
STX includes extensive compile-time checks:
static_assert(sizeof(offset_t) == sizeof(usize));
static_assert(std::is_trivially_copyable_v<offset_t>);
static_assert(std::is_standard_layout_v<offset_t>);
static_assert(noexcept(offset_t{}.get()));
Summary
STX achieves zero overhead through:
Trivial types with no vtables or padding
Constexpr for compile-time evaluation
Concepts for compile-time constraints
Explicit object parameters for optimal forwarding
Header-only design enabling aggressive inlining
No dynamic allocation in core abstractions
No virtual functions or RTTI
The result: type safety and expressiveness at zero runtime cost.
All STX abstractions compile down to the same machine code you would write by hand - often better, thanks to compiler optimizations.
See Also
Type System: Explore STX’s fundamental type aliases
Strong Types: Learn about type-safe wrappers for addresses