
Overview

ReXGlue translates PowerPC instructions to native C++ code that executes on x86-64 and ARM64 architectures. This page explains the instruction translation strategy, register mapping, and execution model.

PPCContext Structure

The PPCContext (source:include/rex/ppc/context.h:170) represents the complete PowerPC processor state:
struct alignas(0x40) PPCContext {
  // Kernel state pointer
  rex::system::KernelState* kernel_state;
  
  // General Purpose Registers (GPRs)
  PPCRegister r0, r1, r2, r3, ..., r31;
  
  // Link Register and Count Register
  uint64_t lr;
  PPCRegister ctr;
  
  // Fixed-Point Exception Register
  PPCXERRegister xer;  // {so, ov, ca}
  
  // Condition Register fields
  PPCCRRegister cr0, cr1, ..., cr7;  // {lt, gt, eq, so/un}
  
  // Floating-Point Status and Control Register
  PPCFPSCRRegister fpscr;
  
  // Floating-Point Registers (FPRs)
  PPCRegister f0, f1, ..., f31;  // 64-bit doubles
  
  // Vector Registers (VMX/AltiVec)
  PPCVRegister v0, v1, ..., v127;  // 128-bit vectors
  
  // Vector Status and Control Register
  uint8_t vscr_sat;  // Saturation flag
};
The context is aligned to 64 bytes (alignas(0x40)) for optimal cache performance. It’s passed by reference to every recompiled function.
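The alignment guarantee and calling pattern can be sketched as follows; PPCContextStub and the function body are illustrative stand-ins, not the real types:

```cpp
#include <cstdint>

// Illustrative stand-in for PPCContext: the 64-byte alignment is part of the type.
struct alignas(0x40) PPCContextStub {
  uint64_t lr;
  // ... remaining register state ...
};
static_assert(alignof(PPCContextStub) == 64, "context must stay cache-line aligned");

// Hypothetical recompiled-function shape: context by reference, plus the
// guest memory base pointer used by the load/store macros.
void function_82E00100(PPCContextStub& ctx, uint8_t* base) {
  (void)base;
  ctx.lr = 0x82E00104;  // e.g. a bl would record its return address here
}
```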

Register Types

ReXGlue defines several register types (source:include/rex/ppc/types.h):

General Purpose Register

union Register {
  int8_t s8;
  uint8_t u8;
  int16_t s16;
  uint16_t u16;
  int32_t s32;
  uint32_t u32;
  int64_t s64;
  uint64_t u64;
  float f32;
  double f64;
};
PowerPC GPRs are 64-bit, but most instructions only use the lower 32 bits. The union allows type-punning for efficient access.
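A small example of the aliasing in action (reading a union member other than the last one written is well-defined in C and a widely supported extension in C++, which GCC and Clang guarantee):

```cpp
#include <cstdint>

union Register {
  int8_t s8;   uint8_t u8;
  int16_t s16; uint16_t u16;
  int32_t s32; uint32_t u32;
  int64_t s64; uint64_t u64;
  float f32;   double f64;
};

// On a little-endian host, u32 aliases the low 32 bits of u64 -- exactly
// what 32-bit PPC instructions operating on 64-bit GPRs need.
inline uint32_t low32(uint64_t v) {
  Register r;
  r.u64 = v;
  return r.u32;
}
```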

XER Register (Fixed-Point Exception)

struct XERRegister {
  uint8_t so;  // Summary Overflow
  uint8_t ov;  // Overflow
  uint8_t ca;  // Carry
};
Used by arithmetic instructions (addc, subfe, etc.).

Condition Register Field

struct CRRegister {
  uint8_t lt;  // Less Than
  uint8_t gt;  // Greater Than
  uint8_t eq;  // Equal
  union {
    uint8_t so;  // Summary Overflow (integer)
    uint8_t un;  // Unordered (float - NaN)
  };
  
  template <typename T>
  inline void compare(T left, T right, const XERRegister& xer) {
    lt = left < right;
    gt = left > right;
    eq = left == right;
    so = xer.so;
  }
};
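Using the structs above, a cmpw of two signed words fills one CR field like so:

```cpp
#include <cstdint>

struct XERRegister { uint8_t so, ov, ca; };

struct CRRegister {
  uint8_t lt, gt, eq;
  union { uint8_t so; uint8_t un; };

  template <typename T>
  void compare(T left, T right, const XERRegister& xer) {
    lt = left < right;
    gt = left > right;
    eq = left == right;
    so = xer.so;  // SO is copied from XER, not computed
  }
};

// cmpw cr0, r3, r4 with r3 = -5, r4 = 3 sets lt=1, gt=0, eq=0.
```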

Vector Register

union alignas(0x10) VRegister {
  int8_t s8[16];
  uint8_t u8[16];
  int16_t s16[8];
  uint16_t u16[8];
  int32_t s32[4];
  uint32_t u32[4];
  int64_t s64[2];
  uint64_t u64[2];
  float f32[4];
  double f64[2];
};

Memory Access

PowerPC uses big-endian byte order, while x86-64 and most ARM64 systems use little-endian. ReXGlue handles byte swapping transparently.

Load Macros

// Load with byte swap (source:include/rex/ppc/memory.h:54)
#define PPC_LOAD_U8(x)  (*(volatile uint8_t*)(base + (uint32_t)(x) + PPC_PHYS_HOST_OFFSET(x)))
#define PPC_LOAD_U16(x) __builtin_bswap16(*(volatile uint16_t*)(base + (uint32_t)(x) + ...))
#define PPC_LOAD_U32(x) __builtin_bswap32(*(volatile uint32_t*)(base + (uint32_t)(x) + ...))
#define PPC_LOAD_U64(x) __builtin_bswap64(*(volatile uint64_t*)(base + (uint32_t)(x) + ...))

Store Macros

#define PPC_STORE_U8(x, y)  (*(volatile uint8_t*)(base + (uint32_t)(x) + ...) = (y))
#define PPC_STORE_U16(x, y) (*(volatile uint16_t*)(base + (uint32_t)(x) + ...) = __builtin_bswap16(y))
#define PPC_STORE_U32(x, y) (*(volatile uint32_t*)(base + (uint32_t)(x) + ...) = __builtin_bswap32(y))
#define PPC_STORE_U64(x, y) (*(volatile uint64_t*)(base + (uint32_t)(x) + ...) = __builtin_bswap64(y))
Physical Heap Offset Workaround: On Windows, the allocation granularity is 64 KB, so the 0x1000-byte file offset for the 0xE0000000 physical heap gets masked away. PPC_PHYS_HOST_OFFSET() compensates by adding 0x1000 to addresses ≥ 0xE0000000 (source:include/rex/ppc/memory.h:42).
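Stripped of the offset bookkeeping, the store/load pair reduces to a bswap round trip. A minimal sketch using the GCC/Clang builtins the macros rely on:

```cpp
#include <cstdint>
#include <cstring>

// Write a guest word big-endian into host memory.
inline void store_be32(uint8_t* p, uint32_t v) {
  uint32_t swapped = __builtin_bswap32(v);
  std::memcpy(p, &swapped, sizeof swapped);
}

// Read a big-endian guest word back into host order.
inline uint32_t load_be32(const uint8_t* p) {
  uint32_t raw;
  std::memcpy(&raw, p, sizeof raw);
  return __builtin_bswap32(raw);
}
```

After store_be32(mem, 0x12345678) on a little-endian host, mem[0] holds 0x12: most significant byte first, as guest code expects.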

MMIO (Memory-Mapped I/O)

Addresses in the range 0x7F000000 - 0x7FFFFFFF are MMIO (GPU registers, audio, etc.). These go through the MMIOHandler:
#define PPC_MM_LOAD_U32(addr) \
  (PPC_IS_MMIO_ADDR(addr) \
    ? ({ uint32_t _v; \
         rex::runtime::MMIOHandler::global_handler()->CheckLoad(addr, &_v); \
         _v; }) \
    : __builtin_bswap32(*(volatile uint32_t*)(base + (addr) + ...)))
The recompiler uses MMIO macros when it detects MMIO base addresses in registers.

Instruction Categories

Integer Arithmetic

Addition:
// add r3, r4, r5
r3.u32 = r4.u32 + r5.u32;

// addi r3, r4, 0x10
r3.u32 = r4.u32 + 0x10;

// addic r3, r4, 0x10  (sets CA)
r3.u32 = r4.u32 + 0x10;
ctx.xer.ca = r3.u32 < r4.u32;  // unsigned wraparound means the add carried
Comparison:
// cmpw cr0, r3, r4
cr0.compare(r3.s32, r4.s32, ctx.xer);

// cmpwi cr0, r3, 0
cr0.compare(r3.s32, 0, ctx.xer);

Floating-Point

Arithmetic:
// fadd f1, f2, f3
f1.f64 = f2.f64 + f3.f64;

// fmul f1, f2, f3
f1.f64 = f2.f64 * f3.f64;

// fmadd f1, f2, f3, f4  (f1 = f2*f3 + f4)
f1.f64 = std::fma(f2.f64, f3.f64, f4.f64);
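Using std::fma matters for accuracy, not just speed: the fused form rounds once, so it can recover bits that a separate multiply-then-add loses. A small illustration:

```cpp
#include <cmath>

// x*x is not exactly representable as a double here, so the separate
// multiply rounds; fma computes the exact residual of that rounding.
inline double fma_residual() {
  double x = 100000001.0;
  double p = x * x;                      // rounded product
  return std::fma(x, x, -p);             // nonzero: bits the rounding lost
}
```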
Rounding Mode: The FPSCRRegister manages x86-64 MXCSR or ARM64 FPCR rounding modes (source:include/rex/ppc/types.h:468):
struct FPSCRRegister {
  uint32_t csr;  // Host control/status register
  
  static constexpr size_t HostToGuest[] = {
    kRoundNearest,      // 0 -> 0
    kRoundDown,         // 1 -> 3
    kRoundUp,           // 2 -> 2
    kRoundTowardZero    // 3 -> 1
  };
  
  void storeFromGuest(uint32_t value) {
    csr &= ~RoundMaskVal;
    csr |= Platform::GuestToHost[value & kRoundMask];
    setcsr(csr);
  }
};
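The portable C++ analogue of poking MXCSR/FPCR is the &lt;cfenv&gt; API; a sketch of switching the host rounding mode the way storeFromGuest does natively:

```cpp
#include <cfenv>
#include <cmath>

// Round a value with the host temporarily set to round-toward-zero.
inline double round_toward_zero(double v) {
  const int old = std::fegetround();
  std::fesetround(FE_TOWARDZERO);
  volatile double x = v;           // keep the compiler from constant-folding
  double r = std::nearbyint(x);    // nearbyint honors the current rounding mode
  std::fesetround(old);
  return r;
}
```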

Vector/SIMD (AltiVec/VMX)

ReXGlue uses SIMDe for cross-platform SIMD:
#include <simde/x86/sse4.1.h>

// vadduwm v0, v1, v2  (add 4x uint32)
v0.v128 = simde_mm_add_epi32(v1.v128, v2.v128);

// vmaxsw v0, v1, v2  (max 4x int32)
v0.v128 = simde_mm_max_epi32(v1.v128, v2.v128);

// lvx v0, 0, r3  (load vector from memory)
v0.v128 = simde_mm_loadu_si128((simde__m128i*)(base + r3.u32));
Custom Vector Helpers: Some AltiVec instructions require custom implementations (source:include/rex/ppc/memory.h:343):
// Vector Convert To Unsigned Fixed-Point Word Saturate
inline simde__m128i simde_mm_vctuxs(simde__m128 src1) {
  // Clamp to [0, UINT_MAX]
  simde__m128 clamped = simde_mm_max_ps(src1, simde_mm_setzero_ps());
  clamped = simde_mm_min_ps(clamped, simde_mm_set1_ps(4294967295.0f));
  // Convert with saturation logic...
}
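A scalar model of the per-lane saturation may make the intent clearer (illustrative only, not the project's implementation):

```cpp
#include <cstdint>
#include <cmath>

// vctuxs lane semantics: NaN and negatives clamp to 0, overflow clamps to
// UINT32_MAX, everything in range truncates toward zero.
inline uint32_t sat_cvt_f32_u32(float f) {
  if (std::isnan(f) || f <= 0.0f) return 0;
  if (f >= 4294967296.0f) return UINT32_MAX;  // 2^32 and above saturate
  return static_cast<uint32_t>(f);
}
```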

Branches and Calls

Unconditional:
// b 0x82E00100
PPC_CALL_FUNC(function_82E00100);

// bl 0x82E00100  (branch and link)
ctx.lr = /* return address */;
PPC_CALL_FUNC(function_82E00100);
Conditional:
// beq cr0, 0x82E00100
if (cr0.eq) {
  PPC_CALL_FUNC(function_82E00100);
}

// bne cr0, 0x82E00100
if (!cr0.eq) {
  PPC_CALL_FUNC(function_82E00100);
}
Indirect:
// bctr  (branch to count register)
PPC_CALL_INDIRECT_FUNC(ctx.ctr.u32);

// bctrl  (branch to count register and link)
ctx.lr = /* return address */;
PPC_CALL_INDIRECT_FUNC(ctx.ctr.u32);

Load/Store

Byte-swapping loads:
// lwz r3, 0x10(r4)  (load word and zero)
r3.u32 = PPC_LOAD_U32(r4.u32 + 0x10);

// lhz r3, 0x10(r4)  (load halfword and zero)
r3.u32 = PPC_LOAD_U16(r4.u32 + 0x10);

// lbz r3, 0x10(r4)  (load byte and zero)
r3.u32 = PPC_LOAD_U8(r4.u32 + 0x10);
Sign-extending loads:
// lha r3, 0x10(r4)  (load halfword algebraic)
r3.s32 = (int32_t)(int16_t)PPC_LOAD_U16(r4.u32 + 0x10);
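The double cast in lha is what turns a zero-extended load into a sign-extended value:

```cpp
#include <cstdint>

// A halfword 0xFFF0 is -16 as a signed 16-bit value; casting through
// int16_t before widening propagates the sign bit into the upper 16 bits.
inline int32_t sign_extend_16(uint16_t raw) {
  return static_cast<int32_t>(static_cast<int16_t>(raw));
}
```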
Stores:
// stw r3, 0x10(r4)
PPC_STORE_U32(r4.u32 + 0x10, r3.u32);

// sth r3, 0x10(r4)
PPC_STORE_U16(r4.u32 + 0x10, r3.u16);

Special Instructions

Timebase

// mftb r3  (move from time base)
r3.u64 = PPC_QUERY_TIMEBASE();

#define PPC_QUERY_TIMEBASE() rex::chrono::Clock::QueryGuestTickCount()

Synchronization

// lwarx r3, 0, r4  (load word and reserve)
ctx.reserved.u32 = r4.u32;  // Save reservation address
r3.u32 = PPC_LOAD_U32(r4.u32);

// stwcx. r3, 0, r4  (store word conditional)
if (ctx.reserved.u32 == r4.u32) {
  PPC_STORE_U32(r4.u32, r3.u32);
  cr0.eq = 1;  // Success
} else {
  cr0.eq = 0;  // Failure
}
ctx.reserved.u32 = 0;  // Clear reservation
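The reservation/store-conditional pair is the guest's way of spelling a compare-and-swap loop; the host-side analogue of a lwarx/stwcx. increment loop is a sketch like:

```cpp
#include <atomic>
#include <cstdint>

// Guest pattern:               Host analogue:
//   loop: lwarx  r3, 0, r4       load + compare_exchange retry loop
//         addi   r3, r3, 1
//         stwcx. r3, 0, r4
//         bne-   loop
inline uint32_t atomic_increment(std::atomic<uint32_t>& word) {
  uint32_t old = word.load();
  while (!word.compare_exchange_weak(old, old + 1)) {
    // stwcx. "failed": another thread touched the word; retry like bne- would
  }
  return old + 1;
}
```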

Traps

Trap instructions generate exceptions (source:include/rex/ppc/context.h:502):
// twi 31, r0, 20  (unconditional trap - debug print)
inline void ppc_trap(PPCContext& ctx, uint8_t* base, uint16_t trap_type) {
  switch (trap_type) {
    case 20:
    case 26: {  // Debug print
      auto str = PPC_LOAD_STRING(ctx.r3.u32, ctx.r4.u16);
      REXCPU_DEBUG("(service trap) {}", str);
      break;
    }
    case 0:
    case 22:  // Debug break
      REXCPU_WARN("tw/td trap hit (type {})", trap_type);
      break;
  }
}

Interrupt Handling

ReXGlue emulates PowerPC interrupt disable/enable via a global lock (source:include/rex/ppc/context.h:464):
// mfmsr r3  (move from machine state register)
r3.u64 = PPC_CHECK_GLOBAL_LOCK();  // Returns 0x8000 if unlocked

// mtmsr r13  (move to MSR from r13 - disable interrupts)
PPC_ENTER_GLOBAL_LOCK();

// mtmsr r3  (move to MSR from non-r13 - enable interrupts)
PPC_LEAVE_GLOBAL_LOCK();
The global lock uses a std::recursive_mutex and an atomic nesting counter.
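A minimal sketch of such a lock (names hypothetical; the real one lives in the runtime):

```cpp
#include <atomic>
#include <mutex>

// Recursive so nested mtmsr "disable interrupts" sequences on one thread
// don't deadlock; the counter tracks nesting depth so lock state can be queried.
struct GlobalInterruptLock {
  std::recursive_mutex mu;
  std::atomic<int> depth{0};

  void enter() { mu.lock(); depth.fetch_add(1, std::memory_order_relaxed); }
  void leave() { depth.fetch_sub(1, std::memory_order_relaxed); mu.unlock(); }
  bool held() const { return depth.load(std::memory_order_relaxed) > 0; }
};
```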

setjmp/longjmp

PowerPC setjmp/longjmp is tricky because the guest jmp_buf format is incompatible with the host. ReXGlue uses a mapping table (source:include/rex/ppc/context.h:429):
// Thread-local map: guest_jmp_buf_addr -> host jmp_buf
inline std::unordered_map<uint32_t, jmp_buf>& get_jmp_buf_map() {
  static thread_local std::unordered_map<uint32_t, jmp_buf> map;
  return map;
}

// setjmp(guest_buf_addr)
#define ppc_setjmp(guest_buf_addr) \
  (setjmp(::rex::get_jmp_buf_map()[(guest_buf_addr)]))

// longjmp(guest_buf_addr, val)
[[noreturn]] inline void ppc_longjmp(uint32_t guest_buf_addr, int val) {
  auto& map = get_jmp_buf_map();
  auto it = map.find(guest_buf_addr);
  if (it != map.end()) {
    longjmp(it->second, val);
  }
  std::abort();  // setjmp was never called
}
ppc_setjmp must be a macro (not a function) so it captures the caller’s stack frame. Otherwise, longjmp would return to a dead frame.
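A round trip through such a map, shrunk to its essentials (the guest address and names are illustrative):

```cpp
#include <csetjmp>
#include <cstdint>
#include <unordered_map>

// guest jmp_buf address -> host jmp_buf, as in the thread-local map above
static std::unordered_map<uint32_t, std::jmp_buf> g_bufs;

inline int setjmp_roundtrip() {
  volatile int value = 0;                   // volatile: survives the longjmp
  if (setjmp(g_bufs[0x82001000u]) == 0) {   // first return: rc == 0
    value = 7;                              // "guest" work before longjmp
    std::longjmp(g_bufs[0x82001000u], 1);   // jump back into the live frame
  }
  return value;                             // reached via longjmp: 7
}
```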

Platform Differences

x86-64 (SSE/AVX)

  • Native support for 128-bit SIMD via SSE4.1/AVX
  • __rdtsc() for timebase
  • _mm_getcsr() / _mm_setcsr() for FPSCR

ARM64 (NEON)

  • SIMDe translates x86 intrinsics to NEON
  • mrs instruction for timebase
  • fpcr register for rounding mode

Performance Tips

  1. Minimize context access: Recompiled functions use local variables when possible
  2. MMIO detection: The recompiler tracks MMIO base addresses to avoid runtime checks
  3. SIMD alignment: Vector loads/stores assume 16-byte alignment when safe
  4. Inline functions: Short recompiled functions are marked inline for optimization
