Overview
Oboromi emulates the Nintendo Switch 2’s GPU using a translation-based approach. The system decodes NVIDIA SM86 (Ampere architecture) shader instructions and translates them to SPIR-V for execution on Vulkan-compatible GPUs.
SM86 (compute capability 8.6) is the shader model of NVIDIA’s GA10x Ampere GPUs. The Switch 2 likely uses a custom Tegra GPU based on this architecture.
Architecture Pipeline
GPU State Management
State Structure
The State struct holds the GPU memory pointers, the program counter, and the Vulkan context:
// core/src/gpu/mod.rs:36
pub struct State {
pub shared_memory: *mut u8,
pub global_memory: *mut u8,
pub pc: u64,
pub vk: VkState,
}
Vulkan Integration
// core/src/gpu/mod.rs:7
pub struct VkState {
pub entry: ash::Entry,
pub instance: ash::Instance,
}
impl VkState {
pub fn init(&mut self) -> ash::prelude::VkResult<()> {
self.entry = unsafe { ash::Entry::load().unwrap() };
self.instance = unsafe {
self.entry.create_instance(&vk::InstanceCreateInfo {
p_application_info: &vk::ApplicationInfo {
api_version: vk::make_api_version(0, 1, 0, 0),
..Default::default()
},
..Default::default()
}, None)?
};
Ok(())
}
}
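The `api_version` argument above is a packed 32-bit value. As a rough illustration of what `vk::make_api_version(0, 1, 0, 0)` produces, the sketch below reimplements the packing documented for Vulkan's `VK_MAKE_API_VERSION` macro in plain Rust. The function names are local to this example and are not part of Oboromi or ash.

```rust
/// Packs a Vulkan API version the way VK_MAKE_API_VERSION does:
/// variant in bits 29..=31, major in 22..=28, minor in 12..=21, patch in 0..=11.
/// Local illustration only; ash ships its own `vk::make_api_version`.
pub const fn make_api_version(variant: u32, major: u32, minor: u32, patch: u32) -> u32 {
    (variant << 29) | (major << 22) | (minor << 12) | patch
}

/// Extracts the 7-bit major version field.
pub const fn api_version_major(version: u32) -> u32 {
    (version >> 22) & 0x7f
}

/// Extracts the 10-bit minor version field.
pub const fn api_version_minor(version: u32) -> u32 {
    (version >> 12) & 0x3ff
}
```

Under this packing, requesting Vulkan 1.0 means passing the value `1 << 22`.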
SM86 Instruction Decoder
Decoder Architecture
The SM86 decoder maintains a virtual register file and emits SPIR-V instructions:
// core/src/gpu/sm86.rs:185
pub struct Decoder<'a> {
pub ir: &'a mut spirv::Emitter,
type_void: u32,
type_ptr_u32: u32,
// Type declarations for various bit widths and vector sizes
type_u8: [u32; 5],
type_u16: [u32; 5],
type_u32: [u32; 5],
type_u64: [u32; 5],
type_s8: [u32; 5],
type_s16: [u32; 5],
type_s32: [u32; 5],
type_s64: [u32; 5],
type_f16: [u32; 5],
type_f32: [u32; 5],
type_f64: [u32; 5],
type_bool: [u32; 5],
// Abstract state machine
regs: [u32; MAX_REG_COUNT],
}
Decoder Initialization
// core/src/gpu/sm86.rs:207
impl<'a> Decoder<'a> {
pub fn init(&mut self) {
self.type_void = self.ir.emit_type_void();
// Declare scalar types
self.type_u8[1] = self.ir.emit_type_int(8, 0);
self.type_u16[1] = self.ir.emit_type_int(16, 0);
self.type_u32[1] = self.ir.emit_type_int(32, 0);
self.type_u64[1] = self.ir.emit_type_int(64, 0);
self.type_f32[1] = self.ir.emit_type_float(32);
// ... (additional type setup)
// Define generic pointers (storage class 7 = Function)
self.type_ptr_u32 = self.ir.emit_type_pointer(7, self.type_u32[1]);
// Define registers as function-scope variables
for r in self.regs.iter_mut() {
*r = self.ir.emit_variable(self.type_ptr_u32, 7);
}
}
}
Register File
SM86 supports 254 general-purpose registers (R0-R253) plus a special RZ register (R255) that always reads as zero and discards writes.
// core/src/gpu/sm86.rs:241
fn load_reg(&mut self, reg: usize) -> u32 {
if reg == 255 {
// RZ (Zero Register)
return self.ir.emit_constant_typed(self.type_u32[1], 0u32);
}
assert!(reg < self.regs.len(), "Register index out of bounds");
let ptr = self.regs[reg];
self.ir.emit_load(self.type_u32[1], ptr)
}
fn store_reg(&mut self, reg: usize, val: u32) {
if reg == 255 {
// Write to RZ is ignored
return;
}
let ptr = self.regs[reg];
self.ir.emit_store(ptr, val);
}
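The RZ rule can be illustrated without SPIR-V, using a plain interpreter-style register file. This is a minimal sketch rather than Oboromi's code: `RegFile`, `read`, and `write` are hypothetical names, and the two constants mirror the values quoted in this document.

```rust
/// Number of general-purpose registers, as quoted in this document.
const MAX_REG_COUNT: usize = 254;
/// Index of the zero register RZ.
const RZ: usize = 255;

/// Hypothetical interpreter-style register file (illustration only).
pub struct RegFile {
    regs: [u32; MAX_REG_COUNT],
}

impl RegFile {
    pub fn new() -> Self {
        RegFile { regs: [0; MAX_REG_COUNT] }
    }

    /// Reads a register. RZ always yields zero; an index that names
    /// neither a general-purpose register nor RZ yields None.
    pub fn read(&self, reg: usize) -> Option<u32> {
        if reg == RZ {
            return Some(0);
        }
        self.regs.get(reg).copied()
    }

    /// Writes a register. Writes to RZ are accepted and discarded.
    /// Returns false if the index names neither a GPR nor RZ.
    pub fn write(&mut self, reg: usize, val: u32) -> bool {
        if reg == RZ {
            return true;
        }
        match self.regs.get_mut(reg) {
            Some(slot) => {
                *slot = val;
                true
            }
            None => false,
        }
    }
}
```

Returning `Option` and `bool` instead of asserting is a design choice made for this sketch; the excerpt above panics on an out-of-range index.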
Instruction Encoding
SM86 instructions are 128-bit (16 bytes) encoded values:
// Instructions take a u128 parameter representing the binary encoding
pub fn al2p(&mut self, inst: u128) {
let pg = (((inst >> 12) & 0x7) << 0); // Predicate guard
let rd = (((inst >> 16) & 0xff) << 0) as usize; // Destination register
let ra = (((inst >> 24) & 0xff) << 0) as usize; // Source register A
let ra_offset = (((inst >> 40) & 0x7ff) << 0); // Immediate offset
// ... decode additional fields
}
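The shift-and-mask pattern in the excerpt can be factored into one helper. The sketch below assumes only the bit positions shown above; `bits`, `Al2pFields`, and `decode_al2p_fields` are hypothetical names that do not appear in the quoted source.

```rust
/// Extracts `width` bits of `inst` starting at bit `lo` (bit 0 is the
/// least significant bit). Hypothetical helper, not part of the decoder.
pub fn bits(inst: u128, lo: u32, width: u32) -> u128 {
    assert!(width >= 1 && width < 128 && lo + width <= 128);
    (inst >> lo) & ((1u128 << width) - 1)
}

/// The operand fields read by the `al2p` excerpt, at the same positions.
#[derive(Debug, PartialEq)]
pub struct Al2pFields {
    pub pg: u8,         // bits 12..=14, predicate guard
    pub rd: usize,      // bits 16..=23, destination register
    pub ra: usize,      // bits 24..=31, source register A
    pub ra_offset: u32, // bits 40..=50, immediate offset
}

/// Decodes the four fields with the helper instead of inline shifts.
pub fn decode_al2p_fields(inst: u128) -> Al2pFields {
    Al2pFields {
        pg: bits(inst, 12, 3) as u8,
        rd: bits(inst, 16, 8) as usize,
        ra: bits(inst, 24, 8) as usize,
        ra_offset: bits(inst, 40, 11) as u32,
    }
}
```

Each `bits(inst, lo, width)` call is equivalent to `(inst >> lo) & mask` with a mask of `width` ones, so `bits(inst, 40, 11)` matches `(inst >> 40) & 0x7ff`.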
Example: AL2P (Add to Pointer)
In this implementation, al2p loads source register Ra, adds the 11-bit immediate offset, and stores the 32-bit sum in Rd:
// core/src/gpu/sm86.rs:265
pub fn al2p(&mut self, inst: u128) {
let rd = (((inst >> 16) & 0xff) << 0) as usize;
let ra = (((inst >> 24) & 0xff) << 0) as usize;
let ra_offset = (((inst >> 40) & 0x7ff) << 0) as usize;
let bop = (((inst >> 74) & 0x3) << 0) as usize;
assert!(ra < MAX_REG_COUNT || ra == 255); // valid GPR index (0..MAX_REG_COUNT) or RZ
assert!(bop == BitSize::B32 as usize);
let base = self.load_reg(ra);
let offset = self.ir.emit_constant_typed(self.type_u32[1], ra_offset as u32);
let dst_val = self.ir.emit_iadd(self.type_u32[1], base, offset);
self.store_reg(rd, dst_val);
}
Supported Instructions
The decoder defines 254+ SM86 instructions including:
Memory Instructions
ALD - Attribute Load
AST - Attribute Store
ATOM - Atomic Operation
ATOMG - Global Atomic
ATOMS - Shared Atomic
Arithmetic Instructions
FADD, FADD32I - Floating-point addition
FMUL, FMUL32I - Floating-point multiplication
FFMA, FFMA32I - Fused multiply-add
DADD, DMUL, DFMA - Double-precision operations
Control Flow
BRA - Branch
BRX - Branch indexed
CALL - Function call
EXIT - Shader exit
BREAK - Loop break
Conversion
F2F - Float-to-float conversion
F2I, F2IP - Float-to-integer
I2F, I2FP - Integer-to-float
Texture Operations
TEX - Texture fetch
TLD, TLD4 - Texture load
SUTP - Surface store
Most instruction implementations currently contain todo!() placeholders. The full instruction set is being implemented incrementally.
SPIR-V Translation
SPIR-V Emitter
The spirv::Emitter provides a safe Rust API for building SPIR-V modules:
// core/src/gpu/spirv.rs:172
pub struct Emitter {
words: Vec<u32>,
next_id: u32,
bound_idx: usize,
}
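How the three fields might cooperate can be sketched with a miniature emitter. This is an assumption-laden illustration, not the real `spirv::Emitter`: `MiniEmitter` and its private helpers are hypothetical. Only the SPIR-V facts come from the specification: the five-word header, the `(word count << 16) | opcode` layout of an instruction's first word, and the opcodes 19 (OpTypeVoid) and 21 (OpTypeInt).

```rust
/// Hypothetical miniature emitter mirroring the three fields above.
pub struct MiniEmitter {
    words: Vec<u32>,
    next_id: u32,
    bound_idx: usize,
}

impl MiniEmitter {
    pub fn new() -> Self {
        // SPIR-V result IDs start at 1; 0 is never a valid ID.
        MiniEmitter { words: Vec::new(), next_id: 1, bound_idx: 0 }
    }

    /// Writes the five-word header and remembers where the ID bound lives.
    pub fn emit_header(&mut self) {
        self.words.push(0x0723_0203); // magic number
        self.words.push(0x0001_0000); // version 1.0
        self.words.push(0); // generator magic (0 is allowed)
        self.bound_idx = self.words.len();
        self.words.push(0); // ID bound, patched in finalize()
        self.words.push(0); // reserved schema word
    }

    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// Appends one instruction; the count includes the first word itself.
    fn emit(&mut self, opcode: u16, operands: &[u32]) {
        let word_count = (operands.len() + 1) as u32;
        self.words.push((word_count << 16) | opcode as u32);
        self.words.extend_from_slice(operands);
    }

    /// OpTypeVoid (opcode 19): the result ID is its only operand.
    pub fn emit_type_void(&mut self) -> u32 {
        let id = self.alloc_id();
        self.emit(19, &[id]);
        id
    }

    /// OpTypeInt (opcode 21): result ID, bit width, signedness flag.
    pub fn emit_type_int(&mut self, width: u32, signed: u32) -> u32 {
        let id = self.alloc_id();
        self.emit(21, &[id, width, signed]);
        id
    }

    /// Patches the bound so that every allocated ID is smaller than it.
    pub fn finalize(&mut self) {
        assert_eq!(self.words.first(), Some(&0x0723_0203), "emit_header must run first");
        self.words[self.bound_idx] = self.next_id;
    }

    pub fn words(&self) -> &[u32] {
        &self.words
    }
}
```

Deferring the bound to `finalize` means the emitter never needs to know in advance how many IDs a shader will use.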
Module Structure
A complete SPIR-V shader follows this structure:
- Header - Magic number, version, ID bound
- Capabilities - Required SPIR-V features
- Extensions - Optional extensions
- Memory Model - Addressing and memory semantics
- Entry Points - Shader entry functions
- Execution Modes - Workgroup size, etc.
- Debug Info - Names and source locations
- Annotations - Decorations (bindings, locations)
- Type Declarations - All types used in shader
- Constants - Constant values
- Global Variables - Uniforms, inputs, outputs
- Function Definitions - Shader code
Example: Building a Function
let mut emitter = spirv::Emitter::new();
emitter.emit_header();
emitter.emit_capability(spirv::capability::SHADER);
emitter.emit_memory_model(0, 1); // Logical, GLSL450
// Define types
let void_ty = emitter.emit_type_void();
let fn_ty = emitter.emit_type_function(void_ty, &[]);
// Create entry point function
let main_fn = emitter.emit_function(void_ty, 0, fn_ty);
let entry_label = emitter.emit_label();
// ... shader logic ...
emitter.emit_return();
emitter.emit_function_end();
emitter.finalize();
Supported SPIR-V Operations
The emitter supports 100+ SPIR-V instructions:
- Arithmetic:
iadd, fadd, imul, fmul, fdiv, etc.
- Logical:
logical_and, logical_or, select
- Comparison:
iequal, ford_less_than, sgreater_than
- Bitwise:
shift_left_logical, bitwise_and, bit_reverse
- Memory:
load, store, access_chain
- Control:
branch, branch_conditional, phi
- Image:
image_sample, image_read, image_write
- Atomic:
atomic_iadd, atomic_exchange, atomic_compare_exchange
Texture and Surface Formats
The GPU module defines extensive format enumerations:
// core/src/gpu/sm86.rs:56
enum SurfaceFormat {
RGBA32_FLOAT = 0x00c0,
RGBA32_SINT = 0x00c1,
RGBA16_UNORM = 0x00c6,
RGBA8_UNORM = 0x00d5,
BGRA8_UNORM = 0x00cf,
R32_FLOAT = 0x00e5,
// ... 50+ formats
}
// core/src/gpu/sm86.rs:135
enum ImageFormat {
RGBA32_FLOAT = 0x02,
RGBA16_FLOAT = 0x0c,
RGBA8_UNORM = 0x18,
RG32_FLOAT = 0x0d,
R32_FLOAT = 0x29,
// ... specialized formats
}
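Raw format codes read from GPU state have to be mapped back to these enum variants at some point. One way to do that safely is a `TryFrom<u16>` implementation, sketched below for the six `SurfaceFormat` values listed above. The derive attributes, the `#[repr(u16)]`, and the choice of `u16` as the raw type are assumptions made for this example, not details confirmed by the source.

```rust
use std::convert::TryFrom;

/// Subset of the surface formats quoted in this document.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum SurfaceFormat {
    RGBA32_FLOAT = 0x00c0,
    RGBA32_SINT = 0x00c1,
    RGBA16_UNORM = 0x00c6,
    BGRA8_UNORM = 0x00cf,
    RGBA8_UNORM = 0x00d5,
    R32_FLOAT = 0x00e5,
}

impl TryFrom<u16> for SurfaceFormat {
    /// The unrecognized raw code is handed back to the caller.
    type Error = u16;

    fn try_from(raw: u16) -> Result<Self, Self::Error> {
        Ok(match raw {
            0x00c0 => SurfaceFormat::RGBA32_FLOAT,
            0x00c1 => SurfaceFormat::RGBA32_SINT,
            0x00c6 => SurfaceFormat::RGBA16_UNORM,
            0x00cf => SurfaceFormat::BGRA8_UNORM,
            0x00d5 => SurfaceFormat::RGBA8_UNORM,
            0x00e5 => SurfaceFormat::R32_FLOAT,
            other => return Err(other),
        })
    }
}
```

Returning the raw code in the error lets the caller log which unsupported format a title requested.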
Shader Constants
Hardware limits are defined as constants:
// core/src/gpu/sm86.rs:7
static MAX_REG_COUNT: usize = 254;
static MAX_UNIFORM_REG_COUNT: usize = 63;
static MAX_CONST_BANK: usize = 17;
static ALLOW_F16_PARTIAL_WRITES: usize = 1;
Vulkan Backend
The Vulkan backend (via the ash crate) handles:
- Instance Creation: Vulkan 1.0+ initialization
- Device Selection: Picking a suitable GPU
- Pipeline Creation: Compiling SPIR-V shaders
- Command Submission: Recording and executing GPU work
Future Pipeline
SM86 Shader → Decoder → SPIR-V IR → Vulkan Pipeline → GPU Execution
↓
Shader Specialization
(constant folding, etc.)
Planned Optimizations
- Shader Caching: Cache translated SPIR-V to avoid re-translation
- Specialization Constants: Use SPIR-V spec constants for dynamic values
- Dead Code Elimination: Remove unused registers and instructions
- Register Allocation: Optimize SPIR-V register usage
- Instruction Combining: Merge common patterns (e.g., MAD → FMA)
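Of the optimizations listed, shader caching is the simplest to sketch. The example below is one possible shape, not a planned design taken from the project: `ShaderCache` and `get_or_translate` are hypothetical names, and the cache is keyed on the raw shader bytes so that hash collisions cannot return the wrong module.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache from SM86 shader bytes to SPIR-V words.
pub struct ShaderCache {
    entries: HashMap<Vec<u8>, Vec<u32>>,
    /// Number of cache misses, i.e. how often a translation actually ran.
    pub translations: usize,
}

impl ShaderCache {
    pub fn new() -> Self {
        ShaderCache { entries: HashMap::new(), translations: 0 }
    }

    /// Returns the cached SPIR-V for `code`, calling `translate` only
    /// when this exact byte sequence has not been seen before.
    pub fn get_or_translate<F>(&mut self, code: &[u8], translate: F) -> &[u32]
    where
        F: FnOnce(&[u8]) -> Vec<u32>,
    {
        if !self.entries.contains_key(code) {
            self.translations += 1;
            self.entries.insert(code.to_vec(), translate(code));
        }
        &self.entries[code]
    }
}
```

A persistent cache would additionally need a stable on-disk key and a version tag for the translator; both are outside the scope of this sketch.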
Debugging Support
SPIR-V Validation
// core/src/gpu/spirv.rs:1089
pub fn validate(&self) -> Result<(), &'static str> {
if self.words.len() < 5 {
return Err("Module too short for valid header");
}
if self.words[0] != 0x07230203 {
return Err("Invalid SPIR-V magic number");
}
// Walk instructions and verify structure
// ...
}
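The elided "walk instructions" step relies on the SPIR-V rule that the high 16 bits of an instruction's first word hold its total word count and the low 16 bits hold the opcode. The sketch below shows such a walk as a free function. It is a guess at the general shape, not the body of the real `validate`, and `walk_instructions` is a hypothetical name.

```rust
/// SPIR-V magic number, as checked in the excerpt above.
const SPIRV_MAGIC: u32 = 0x0723_0203;
/// The header is five words: magic, version, generator, bound, schema.
const HEADER_WORDS: usize = 5;

/// Walks the instruction stream and returns each instruction's opcode.
/// Hypothetical helper for illustration only.
pub fn walk_instructions(words: &[u32]) -> Result<Vec<u16>, &'static str> {
    if words.len() < HEADER_WORDS {
        return Err("Module too short for valid header");
    }
    if words[0] != SPIRV_MAGIC {
        return Err("Invalid SPIR-V magic number");
    }
    let mut opcodes = Vec::new();
    let mut i = HEADER_WORDS;
    while i < words.len() {
        let word_count = (words[i] >> 16) as usize; // high 16 bits
        let opcode = (words[i] & 0xffff) as u16; // low 16 bits
        if word_count == 0 {
            // A zero count would never advance the cursor.
            return Err("Instruction with zero word count");
        }
        if i + word_count > words.len() {
            return Err("Instruction runs past end of module");
        }
        opcodes.push(opcode);
        i += word_count;
    }
    Ok(opcodes)
}
```

A structural walk like this catches truncated or corrupted streams early; semantic checks are still best left to spirv-val.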
Binary Export
// core/src/gpu/spirv.rs:1122
pub fn to_bytes(&self) -> Vec<u8> {
let mut out = Vec::with_capacity(self.words.len() * 4);
for &w in &self.words {
out.extend_from_slice(&w.to_le_bytes());
}
out
}
Exported SPIR-V can be validated with spirv-val and disassembled with spirv-dis.
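For round-trip checks it can help to have the inverse of `to_bytes`. The sketch below restates the export logic as a free function and adds a hypothetical `bytes_to_words` counterpart; neither function name comes from the source.

```rust
/// Serializes words to little-endian bytes (same logic as `to_bytes`).
pub fn words_to_bytes(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 4);
    for &w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

/// Hypothetical inverse: rebuilds words from little-endian bytes.
/// Returns None if the length is not a multiple of four.
pub fn bytes_to_words(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

The byte vector can be written to disk with `std::fs::write` and the resulting file passed to spirv-val or spirv-dis.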
Testing
GPU tests verify decoder and emitter functionality:
// core/src/gpu/test.rs
#[test]
fn test_sm86_decoder() {
let mut ir = spirv::Emitter::new();
let mut decoder = sm86::Decoder { ir: &mut ir, /* ... */ };
decoder.init();
// Test instruction decoding
let inst: u128 = /* ... encoded instruction ... */;
decoder.al2p(inst);
ir.finalize();
assert!(ir.validate().is_ok());
}
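The encoded instruction in the test above is elided. For tests that only exercise operand decoding, a small packing helper can build inputs from the field positions shown in the `al2p` excerpt. This sketch is hypothetical: `encode_al2p_fields` is not in the source, it leaves every other bit (including the opcode) at zero, and the numeric value of `BitSize::B32` is not given in this document, so `bop` is passed in by the caller.

```rust
/// Hypothetical test helper: packs AL2P operand fields at the bit
/// positions used by the `al2p` excerpt. All other bits stay zero.
pub fn encode_al2p_fields(rd: u8, ra: u8, ra_offset: u16, bop: u8) -> u128 {
    assert!(ra_offset <= 0x7ff, "ra_offset is an 11-bit field");
    assert!(bop <= 0x3, "bop is a 2-bit field");
    ((rd as u128) << 16)            // destination register, bits 16..=23
        | ((ra as u128) << 24)      // source register A, bits 24..=31
        | ((ra_offset as u128) << 40) // immediate offset, bits 40..=50
        | ((bop as u128) << 74)     // bit-size selector, bits 74..=75
}
```

Decoding the result with the same shifts and masks as the excerpt should return the original field values, which makes the helper easy to check on its own.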
Future Enhancements
- Complete Instruction Set: Implement all 254+ SM86 instructions
- Geometry Shaders: Support for geometry and tessellation stages
- Compute Shaders: Full compute pipeline with shared memory
- Ray Tracing: RTX operations if Switch 2 supports RT cores
- Performance Counters: GPU profiling and metrics