GPU Architecture

Overview

Oboromi emulates the Nintendo Switch 2’s GPU using a translation-based approach. The system decodes NVIDIA SM86 (Ampere architecture) shader instructions and translates them to SPIR-V for execution on Vulkan-compatible GPUs.

SM86 (compute capability 8.6) is the streaming multiprocessor architecture of NVIDIA's GA10x Ampere GPUs. The Switch 2 likely uses a custom Tegra GPU based on this architecture.

Architecture Pipeline

GPU State Management

State Structure

The State struct manages GPU resources and Vulkan context:

// core/src/gpu/mod.rs:36
pub struct State {
    pub shared_memory: *mut u8,
    pub global_memory: *mut u8,
    pub pc: u64,
    pub vk: VkState,
}

Vulkan Integration

// core/src/gpu/mod.rs:7
pub struct VkState {
    pub entry: ash::Entry,
    pub instance: ash::Instance,
}

impl VkState {
    pub fn init(&mut self) -> ash::prelude::VkResult<()> {
        self.entry = unsafe { ash::Entry::load().unwrap() };
        self.instance = unsafe {
            self.entry.create_instance(&vk::InstanceCreateInfo {
                p_application_info: &vk::ApplicationInfo {
                    api_version: vk::make_api_version(0, 1, 0, 0),
                    ..Default::default()
                },
                ..Default::default()
            }, None)?
        };
        Ok(())
    }
}
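
The api_version field above is a packed 32-bit integer. As a rough illustration of what vk::make_api_version(0, 1, 0, 0) evaluates to, the sketch below reimplements the packing defined by the Vulkan specification (variant in bits 31-29, major in 28-22, minor in 21-12, patch in 11-0) in plain Rust. The function names are hypothetical and are not part of ash or Oboromi.

```rust
/// Packs a Vulkan API version using the bit layout of the specification's
/// VK_MAKE_API_VERSION macro. Hypothetical helper, for illustration only.
pub const fn pack_api_version(variant: u32, major: u32, minor: u32, patch: u32) -> u32 {
    (variant << 29) | (major << 22) | (minor << 12) | patch
}

/// Extracts the 7-bit major version from a packed value.
pub const fn api_version_major(version: u32) -> u32 {
    (version >> 22) & 0x7f
}

/// Extracts the 10-bit minor version from a packed value.
pub const fn api_version_minor(version: u32) -> u32 {
    (version >> 12) & 0x3ff
}
```

Under this layout, Vulkan 1.0.0 packs to 0x0040_0000, which is the value the init function requests.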

SM86 Instruction Decoder

Decoder Architecture

The SM86 decoder maintains a virtual register file and emits SPIR-V instructions:

// core/src/gpu/sm86.rs:185
pub struct Decoder<'a> {
    pub ir: &'a mut spirv::Emitter,
    type_void: u32,
    type_ptr_u32: u32,
    
    // Type declarations for various bit widths and vector sizes
    type_u8: [u32; 5],
    type_u16: [u32; 5],
    type_u32: [u32; 5],
    type_u64: [u32; 5],
    type_s8: [u32; 5],
    type_s16: [u32; 5],
    type_s32: [u32; 5],
    type_s64: [u32; 5],
    type_f16: [u32; 5],
    type_f32: [u32; 5],
    type_f64: [u32; 5],
    type_bool: [u32; 5],
    
    // Abstract state machine
    regs: [u32; MAX_REG_COUNT],
}

Decoder Initialization

// core/src/gpu/sm86.rs:207
impl<'a> Decoder<'a> {
    pub fn init(&mut self) {
        self.type_void = self.ir.emit_type_void();
        
        // Declare scalar types
        self.type_u8[1] = self.ir.emit_type_int(8, 0);
        self.type_u16[1] = self.ir.emit_type_int(16, 0);
        self.type_u32[1] = self.ir.emit_type_int(32, 0);
        self.type_u64[1] = self.ir.emit_type_int(64, 0);
        self.type_f32[1] = self.ir.emit_type_float(32);
        // ... (additional type setup)

        // Define generic pointers (storage class 7 = Function)
        self.type_ptr_u32 = self.ir.emit_type_pointer(7, self.type_u32[1]);

        // Define registers as function-scope variables
        for r in self.regs.iter_mut() {
            *r = self.ir.emit_variable(self.type_ptr_u32, 7);
        }
    }
}

Register File

The decoder models 254 general-purpose registers (R0-R253, sized by MAX_REG_COUNT) plus a special RZ register (encoded as R255) that always reads as zero and discards writes.

// core/src/gpu/sm86.rs:241
fn load_reg(&mut self, reg: usize) -> u32 {
    if reg == 255 {
        // RZ (Zero Register)
        return self.ir.emit_constant_typed(self.type_u32[1], 0u32);
    }
    assert!(reg < self.regs.len(), "Register index out of bounds");
    let ptr = self.regs[reg];
    self.ir.emit_load(self.type_u32[1], ptr)
}

fn store_reg(&mut self, reg: usize, val: u32) {
    if reg == 255 {
        // Write to RZ is ignored
        return;
    }
    let ptr = self.regs[reg];
    self.ir.emit_store(ptr, val);
}
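
The decoder above emits SPIR-V loads and stores instead of holding register values itself. To make the RZ rule easier to see in isolation, here is a minimal sketch of the same behaviour on a plain array. RegFile, REG_COUNT and RZ are hypothetical names used only for this illustration.

```rust
/// Number of addressable general-purpose registers (mirrors MAX_REG_COUNT).
const REG_COUNT: usize = 254;
/// Encoding of the zero register RZ.
const RZ: usize = 255;

/// Hypothetical interpreter-style register file, for illustration only.
pub struct RegFile {
    regs: [u32; REG_COUNT],
}

impl RegFile {
    pub fn new() -> Self {
        RegFile { regs: [0; REG_COUNT] }
    }

    /// Reads a register; RZ always yields zero.
    pub fn read(&self, reg: usize) -> u32 {
        if reg == RZ {
            return 0;
        }
        assert!(reg < REG_COUNT, "register index out of bounds");
        self.regs[reg]
    }

    /// Writes a register; writes to RZ are silently dropped.
    pub fn write(&mut self, reg: usize, val: u32) {
        if reg == RZ {
            return;
        }
        assert!(reg < REG_COUNT, "register index out of bounds");
        self.regs[reg] = val;
    }
}
```

Unlike store_reg above, this sketch also bounds-checks writes explicitly, so an invalid index fails with the same message on both paths.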

Instruction Formats

Instruction Encoding

SM86 instructions are 128-bit (16 bytes) encoded values:

// Instructions take a u128 parameter representing the binary encoding
pub fn al2p(&mut self, inst: u128) {
    let pg = (((inst >> 12) & 0x7) << 0);           // Predicate guard
    let rd = (((inst >> 16) & 0xff) << 0) as usize; // Destination register
    let ra = (((inst >> 24) & 0xff) << 0) as usize; // Source register A
    let ra_offset = (((inst >> 40) & 0x7ff) << 0);  // Immediate offset
    // ... decode additional fields
}
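
The shift-and-mask pattern above can be factored into a small helper. The sketch below is one possible way to do that, using the bit positions shown for AL2P. The names bits, Al2pFields, decode_al2p_fields and encode_al2p_fields are hypothetical, and the encoder fills in only these four fields, so its output is a test pattern and not a complete AL2P instruction.

```rust
/// Extracts `width` bits starting at bit `lo` from a 128-bit instruction word.
pub fn bits(inst: u128, lo: u32, width: u32) -> u128 {
    assert!(width >= 1 && width < 128 && lo + width <= 128);
    (inst >> lo) & ((1u128 << width) - 1)
}

/// The four AL2P fields decoded in the excerpt above.
#[derive(Debug, PartialEq, Eq)]
pub struct Al2pFields {
    pub pg: u8,         // predicate guard, bits 12..=14
    pub rd: usize,      // destination register, bits 16..=23
    pub ra: usize,      // source register A, bits 24..=31
    pub ra_offset: u32, // immediate offset, bits 40..=50
}

pub fn decode_al2p_fields(inst: u128) -> Al2pFields {
    Al2pFields {
        pg: bits(inst, 12, 3) as u8,
        rd: bits(inst, 16, 8) as usize,
        ra: bits(inst, 24, 8) as usize,
        ra_offset: bits(inst, 40, 11) as u32,
    }
}

/// Packs the four fields back into a word; every other bit is left at zero.
pub fn encode_al2p_fields(f: &Al2pFields) -> u128 {
    (((f.pg as u128) & 0x7) << 12)
        | (((f.rd as u128) & 0xff) << 16)
        | (((f.ra as u128) & 0xff) << 24)
        | (((f.ra_offset as u128) & 0x7ff) << 40)
}
```

A round trip through encode_al2p_fields and decode_al2p_fields is a cheap way to build inputs for decoder tests without hand-writing 128-bit literals.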

Example: AL2P (Add to Pointer)

The decoder implements AL2P by adding an 11-bit immediate offset to the value of source register Ra and storing the 32-bit result in destination register Rd:

// core/src/gpu/sm86.rs:265
pub fn al2p(&mut self, inst: u128) {
    let rd = (((inst >> 16) & 0xff) << 0) as usize;
    let ra = (((inst >> 24) & 0xff) << 0) as usize;
    let ra_offset = (((inst >> 40) & 0x7ff) << 0) as usize;
    let bop = (((inst >> 74) & 0x3) << 0) as usize;
    
    // Valid indices are 0..MAX_REG_COUNT, plus 255 for RZ
    assert!(ra < MAX_REG_COUNT || ra == 255);
    assert!(bop == BitSize::B32 as usize);
    
    let base = self.load_reg(ra);
    let offset = self.ir.emit_constant_typed(self.type_u32[1], ra_offset as u32);
    let dst_val = self.ir.emit_iadd(self.type_u32[1], base, offset);
    self.store_reg(rd, dst_val);
}

Supported Instructions

The decoder defines 254+ SM86 instructions including:

Memory Instructions

ALD - Attribute Load
AST - Attribute Store
ATOM - Atomic Operation
ATOMG - Global Atomic
ATOMS - Shared Atomic

Arithmetic Instructions

FADD, FADD32I - Floating-point addition
FMUL, FMUL32I - Floating-point multiplication
FFMA, FFMA32I - Fused multiply-add
DADD, DMUL, DFMA - Double-precision operations

Control Flow

BRA - Branch
BRX - Branch indexed
CALL - Function call
EXIT - Shader exit
BREAK - Loop break

Conversion

F2F - Float-to-float conversion
F2I, F2IP - Float-to-integer
I2F, I2FP - Integer-to-float

Texture Operations

TEX - Texture fetch
TLD, TLD4 - Texture load
SUTP - Surface store

Most instruction implementations currently contain todo!() placeholders. The full instruction set is being implemented incrementally.

SPIR-V Translation

SPIR-V Emitter

The spirv::Emitter provides a safe Rust API for building SPIR-V modules:

// core/src/gpu/spirv.rs:172
pub struct Emitter {
    words: Vec<u32>,
    next_id: u32,
    bound_idx: usize,
}
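
The three fields suggest a word buffer, a counter for fresh result IDs, and the index of the header's ID-bound word so it can be patched once all IDs are known. The sketch below shows how such an emitter could work under that assumption. MiniEmitter and its method bodies are hypothetical and are not Oboromi's implementation; the opcodes (19 for OpTypeVoid, 21 for OpTypeInt) and the header layout come from the SPIR-V specification.

```rust
/// Minimal stand-in for an emitter with the same three fields.
/// Hypothetical: behaviour is an assumption, not Oboromi's API.
pub struct MiniEmitter {
    words: Vec<u32>,
    next_id: u32,
    bound_idx: usize,
}

impl MiniEmitter {
    pub fn new() -> Self {
        MiniEmitter { words: Vec::new(), next_id: 1, bound_idx: 0 }
    }

    /// Writes the five-word SPIR-V header and remembers where the bound lives.
    pub fn emit_header(&mut self) {
        self.words.push(0x0723_0203); // magic number
        self.words.push(0x0001_0000); // version 1.0
        self.words.push(0); // generator magic (0 is allowed)
        self.bound_idx = self.words.len();
        self.words.push(0); // ID bound, patched in finalize()
        self.words.push(0); // reserved schema word
    }

    /// Appends one instruction: first word is (word count << 16) | opcode.
    fn emit(&mut self, opcode: u16, operands: &[u32]) {
        let word_count = (operands.len() + 1) as u32;
        self.words.push((word_count << 16) | (opcode as u32));
        self.words.extend_from_slice(operands);
    }

    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// OpTypeVoid (opcode 19) takes only a result ID.
    pub fn emit_type_void(&mut self) -> u32 {
        let id = self.alloc_id();
        self.emit(19, &[id]);
        id
    }

    /// OpTypeInt (opcode 21): result ID, bit width, signedness.
    pub fn emit_type_int(&mut self, width: u32, signed: u32) -> u32 {
        let id = self.alloc_id();
        self.emit(21, &[id, width, signed]);
        id
    }

    /// Patches the bound so every allocated ID is strictly below it.
    pub fn finalize(&mut self) {
        self.words[self.bound_idx] = self.next_id;
    }

    pub fn words(&self) -> &[u32] {
        &self.words
    }
}
```

Deferring the bound to finalize avoids a second pass over the module: the header can be written first while the final ID count is still unknown.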

Module Structure

A complete SPIR-V module lays out its sections in the following order:

Header - Magic number, version, ID bound
Capabilities - Required SPIR-V features
Extensions - Optional extensions
Memory Model - Addressing and memory semantics
Entry Points - Shader entry functions
Execution Modes - Workgroup size, etc.
Debug Info - Names and source locations
Annotations - Decorations (bindings, locations)
Type Declarations - All types used in shader
Constants - Constant values
Global Variables - Uniforms, inputs, outputs
Function Definitions - Shader code

Example: Building a Function

let mut emitter = spirv::Emitter::new();
emitter.emit_header();
emitter.emit_capability(spirv::capability::SHADER);
emitter.emit_memory_model(0, 1); // Logical, GLSL450

// Define types
let void_ty = emitter.emit_type_void();
let fn_ty = emitter.emit_type_function(void_ty, &[]);

// Create entry point function
let main_fn = emitter.emit_function(void_ty, 0, fn_ty);
let entry_label = emitter.emit_label();

// ... shader logic ...

emitter.emit_return();
emitter.emit_function_end();

emitter.finalize();

Supported SPIR-V Operations

The emitter supports 100+ SPIR-V instructions:

Arithmetic: iadd, fadd, imul, fmul, fdiv, etc.
Logical: logical_and, logical_or, select
Comparison: iequal, ford_less_than, sgreater_than
Bitwise: shift_left_logical, bitwise_and, bit_reverse
Memory: load, store, access_chain
Control: branch, branch_conditional, phi
Image: image_sample, image_read, image_write
Atomic: atomic_iadd, atomic_exchange, atomic_compare_exchange

Texture and Surface Formats

The GPU module defines extensive format enumerations:

Surface Formats

// core/src/gpu/sm86.rs:56
enum SurfaceFormat {
    RGBA32_FLOAT = 0x00c0,
    RGBA32_SINT = 0x00c1,
    RGBA16_UNORM = 0x00c6,
    RGBA8_UNORM = 0x00d5,
    BGRA8_UNORM = 0x00cf,
    R32_FLOAT = 0x00e5,
    // ... 50+ formats
}

Image Formats (for SUTP)

// core/src/gpu/sm86.rs:135
enum ImageFormat {
    RGBA32_FLOAT = 0x02,
    RGBA16_FLOAT = 0x0c,
    RGBA8_UNORM = 0x18,
    RG32_FLOAT = 0x0d,
    R32_FLOAT = 0x29,
    // ... specialized formats
}

Shader Constants

Hardware limits and decoder settings are defined as constants:

// core/src/gpu/sm86.rs:7
static MAX_REG_COUNT: usize = 254;
static MAX_UNIFORM_REG_COUNT: usize = 63;
static MAX_CONST_BANK: usize = 17;
static ALLOW_F16_PARTIAL_WRITES: usize = 1;

Vulkan Backend

The Vulkan backend (via the ash crate) handles:

Instance Creation: Vulkan 1.0+ initialization
Device Selection: Picking suitable GPU
Pipeline Creation: Compiling SPIR-V shaders
Command Submission: Recording and executing GPU work

Future Pipeline

SM86 Shader → Decoder → SPIR-V IR → Vulkan Pipeline → GPU Execution
                                   ↓
                          Shader Specialization
                          (constant folding, etc.)

Performance Optimizations

Planned Optimizations

Shader Caching: Cache translated SPIR-V to avoid re-translation
Specialization Constants: Use SPIR-V spec constants for dynamic values
Dead Code Elimination: Remove unused registers and instructions
Register Allocation: Optimize SPIR-V register usage
Instruction Combining: Merge common patterns (e.g., MAD → FMA)
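
As a rough idea of what the shader caching item could look like, the sketch below keys translated SPIR-V by a hash of the guest shader bytes and runs the translator only on a miss. ShaderCache and get_or_translate are hypothetical names; nothing here reflects a design Oboromi has committed to.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory cache from guest shader hash to translated SPIR-V.
pub struct ShaderCache {
    entries: HashMap<u64, Vec<u32>>,
    /// Number of times the translator had to run.
    pub misses: usize,
}

impl ShaderCache {
    pub fn new() -> Self {
        ShaderCache { entries: HashMap::new(), misses: 0 }
    }

    /// Hashes the raw guest shader bytes into a cache key.
    fn key(guest_code: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        guest_code.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns cached SPIR-V words, calling `translate` only on a miss.
    pub fn get_or_translate<F>(&mut self, guest_code: &[u8], translate: F) -> &[u32]
    where
        F: FnOnce(&[u8]) -> Vec<u32>,
    {
        let key = Self::key(guest_code);
        if !self.entries.contains_key(&key) {
            self.misses += 1;
            self.entries.insert(key, translate(guest_code));
        }
        &self.entries[&key]
    }
}
```

This sketch trusts a 64-bit hash, so two different shaders could in principle collide. A real cache would likely compare the full binary or use a stronger digest, and DefaultHasher output is not guaranteed stable across Rust releases, so it would not suit an on-disk cache.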

Debugging Support

SPIR-V Validation

// core/src/gpu/spirv.rs:1089
pub fn validate(&self) -> Result<(), &'static str> {
    if self.words.len() < 5 {
        return Err("Module too short for valid header");
    }
    if self.words[0] != 0x07230203 {
        return Err("Invalid SPIR-V magic number");
    }
    // Walk instructions and verify structure
    // ...
}
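
The elided "walk instructions" step can rely on one property of the SPIR-V binary format: the upper 16 bits of each instruction's first word hold that instruction's total word count. The sketch below shows one way to walk the stream on that basis. check_structure is a hypothetical free function, and Oboromi's validate may check more or less than this.

```rust
/// Checks the header, then walks the instruction stream of a SPIR-V module.
/// Returns the number of instructions found. Hypothetical helper.
pub fn check_structure(words: &[u32]) -> Result<usize, &'static str> {
    if words.len() < 5 {
        return Err("Module too short for valid header");
    }
    if words[0] != 0x0723_0203 {
        return Err("Invalid SPIR-V magic number");
    }

    let mut idx = 5; // the first instruction follows the five header words
    let mut count = 0;
    while idx < words.len() {
        // High 16 bits: word count of this instruction, including this word.
        let word_count = (words[idx] >> 16) as usize;
        if word_count == 0 {
            return Err("Instruction with zero word count");
        }
        if idx + word_count > words.len() {
            return Err("Instruction runs past end of module");
        }
        idx += word_count;
        count += 1;
    }
    Ok(count)
}
```

Rejecting a zero word count matters for more than correctness: without that check, the loop would never advance on a corrupted module.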

Binary Export

// core/src/gpu/spirv.rs:1122
pub fn to_bytes(&self) -> Vec<u8> {
    let mut out = Vec::with_capacity(self.words.len() * 4);
    for &w in &self.words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

Exported SPIR-V can be validated with spirv-val and disassembled with spirv-dis.
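
When reading an exported module back (for example in a test that compares against a file on disk), the inverse of to_bytes is needed. The sketch below reassembles little-endian bytes into words and uses the magic number to detect a byte-swapped file. words_from_le_bytes is a hypothetical helper, not part of Oboromi.

```rust
/// Reassembles little-endian bytes into SPIR-V words (inverse of `to_bytes`).
/// Hypothetical helper, for illustration only.
pub fn words_from_le_bytes(bytes: &[u8]) -> Result<Vec<u32>, &'static str> {
    if bytes.len() % 4 != 0 {
        return Err("Length is not a multiple of four bytes");
    }
    let words: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();

    // The magic number doubles as an endianness marker.
    match words.first() {
        Some(&0x0723_0203) => Ok(words),
        Some(&0x0302_2307) => Err("Module is big-endian"),
        _ => Err("Missing SPIR-V magic number"),
    }
}
```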

Testing

GPU tests verify decoder and emitter functionality:

// core/src/gpu/test.rs
#[test]
fn test_sm86_decoder() {
    let mut ir = spirv::Emitter::new();
    let mut decoder = sm86::Decoder { ir: &mut ir, /* ... */ };
    decoder.init();
    
    // Test instruction decoding
    let inst: u128 = /* ... encoded instruction ... */;
    decoder.al2p(inst);
    
    ir.finalize();
    assert!(ir.validate().is_ok());
}

Future Enhancements

Complete Instruction Set: Implement all 254+ SM86 instructions
Geometry Shaders: Support for geometry and tessellation stages
Compute Shaders: Full compute pipeline with shared memory
Ray Tracing: RTX operations if Switch 2 supports RT cores
Performance Counters: GPU profiling and metrics

Related Documentation

Architecture Overview - System architecture
Memory Architecture - GPU memory management
