Overview
Oboromi’s GPU emulation layer translates NVIDIA SM86 (Ampere) SASS shader instructions into SPIR-V for execution on Vulkan-compatible hardware. Guest shaders can therefore be accelerated on the host GPU without requiring NVIDIA hardware.
Architecture
The GPU emulation consists of three main components:
- SM86 Decoder - Parses 128-bit SASS instructions into structured data
- SPIR-V Emitter - Generates valid SPIR-V binary modules
- Vulkan State - Manages the Vulkan instance and execution context
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│ SASS Binary │─────▶│ SM86 Decoder │─────▶│   SPIR-V    │
│  (128-bit)  │      │    (Rust)    │      │   Emitter   │
└─────────────┘      └──────────────┘      └──────┬──────┘
                                                  │
                                                  ▼
                                           ┌─────────────┐
                                           │   Vulkan    │
                                           │   Runtime   │
                                           └─────────────┘
SM86 Instruction Decoder
The decoder (defined in core/src/gpu/sm86.rs) is a code-generated parser for NVIDIA’s SASS instruction format.
Decoder Structure
pub struct Decoder<'a> {
pub ir: &'a mut spirv::Emitter,
type_void: u32,
type_ptr_u32: u32,
// Type IDs for different bit widths and components
type_u8: [u32; 5],
type_u16: [u32; 5],
type_u32: [u32; 5],
type_u64: [u32; 5],
type_s8: [u32; 5],
type_s16: [u32; 5],
type_s32: [u32; 5],
type_s64: [u32; 5],
type_f16: [u32; 5],
type_f32: [u32; 5],
type_f64: [u32; 5],
type_bool: [u32; 5],
// Virtual register file (254 registers)
regs: [u32; MAX_REG_COUNT], // MAX_REG_COUNT = 254
}
Initialization
The decoder pre-allocates all SPIR-V types and registers:
pub fn init(&mut self) {
self.type_void = self.ir.emit_type_void();
self.type_u8[1] = self.ir.emit_type_int(8, 0);
self.type_u16[1] = self.ir.emit_type_int(16, 0);
self.type_u32[1] = self.ir.emit_type_int(32, 0);
self.type_u64[1] = self.ir.emit_type_int(64, 0);
self.type_f32[1] = self.ir.emit_type_float(32);
// ... more type declarations ...
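// Create vector types (2, 3, 4 components) from the scalar in slot 1
// and store each resulting ID back into slots 2..=4
for n in 2..=4usize {
for ty in [
&mut self.type_u8, &mut self.type_u16, &mut self.type_u32, &mut self.type_u64,
&mut self.type_s8, &mut self.type_s16, &mut self.type_s32, &mut self.type_s64,
&mut self.type_f16, &mut self.type_f32, &mut self.type_f64, &mut self.type_bool,
] {
let scalar = ty[1];
ty[n] = self.ir.emit_type_vector(scalar, n as u32);
}
}
// Function-storage pointer to u32 (storage class 7 = Function)
self.type_ptr_u32 = self.ir.emit_type_pointer(7, self.type_u32[1]);
// Allocate one variable per register. SPIR-V requires Function-storage
// OpVariables to be the first instructions of a function's first block.
for r in self.regs.iter_mut() {
*r = self.ir.emit_variable(self.type_ptr_u32, 7);
}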
}
Each type array is indexed by component count: index 0 is unused, index 1 holds the scalar type, and indices 2 to 4 hold the 2-, 3-, and 4-component vector types.
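The vector loop relies on each table keeping its scalar in slot 1 and receiving vector IDs in slots 2 to 4. The sketch below exercises that convention against a hypothetical `MockEmitter` that only hands out sequential IDs (it is not the real `spirv::Emitter`), so the resulting tables can be checked directly.

```rust
/// Hypothetical stand-in for spirv::Emitter: it only allocates IDs and
/// records each vector declaration as (result_id, component_id, count).
struct MockEmitter {
    next_id: u32,
    vectors: Vec<(u32, u32, u32)>,
}

impl MockEmitter {
    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }
    fn emit_type_int(&mut self, _width: u32, _sign: u32) -> u32 {
        self.alloc_id()
    }
    fn emit_type_float(&mut self, _width: u32) -> u32 {
        self.alloc_id()
    }
    fn emit_type_vector(&mut self, component: u32, count: u32) -> u32 {
        let id = self.alloc_id();
        self.vectors.push((id, component, count));
        id
    }
}

/// Fill slots 2..=4 of every table from the scalar type stored in slot 1.
fn fill_vectors(ir: &mut MockEmitter, tables: &mut [&mut [u32; 5]]) {
    for n in 2..=4usize {
        for table in tables.iter_mut() {
            let scalar = table[1];
            table[n] = ir.emit_type_vector(scalar, n as u32);
        }
    }
}

/// Build u32 and f32 tables the same way init() does.
fn build_u32_f32_tables() -> ([u32; 5], [u32; 5], MockEmitter) {
    let mut ir = MockEmitter { next_id: 1, vectors: Vec::new() };
    let mut type_u32 = [0u32; 5];
    let mut type_f32 = [0u32; 5];
    type_u32[1] = ir.emit_type_int(32, 0);
    type_f32[1] = ir.emit_type_float(32);
    fill_vectors(&mut ir, &mut [&mut type_u32, &mut type_f32]);
    (type_u32, type_f32, ir)
}

fn main() {
    let (u32s, f32s, _) = build_u32_f32_tables();
    println!("u32 table: {:?}", u32s);
    println!("f32 table: {:?}", f32s);
}
```

With IDs allocated in this order, `type_u32` ends up as `[0, 1, 3, 5, 7]` and `type_f32` as `[0, 2, 4, 6, 8]`.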
Register File Operations
Zero Register (RZ)
Register index 255 encodes RZ, the zero register: reads always return 0 and writes are discarded.
fn load_reg(&mut self, reg: usize) -> u32 {
if reg == 255 {
// RZ (Zero Register) - always reads as 0
return self.ir.emit_constant_typed(self.type_u32[1], 0u32);
}
assert!(reg < self.regs.len(), "Register index out of bounds");
let ptr = self.regs[reg];
self.ir.emit_load(self.type_u32[1], ptr)
}
fn store_reg(&mut self, reg: usize, val: u32) {
if reg == 255 {
// Write to RZ is ignored
return;
}
assert!(reg < self.regs.len(), "Register index out of bounds");
let ptr = self.regs[reg];
self.ir.emit_store(ptr, val);
}
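The register rules are easier to see without SPIR-V in the way. This sketch models the same semantics on the CPU; `RegFile`, `read`, and `write` are hypothetical names, and the constants mirror the decoder's `MAX_REG_COUNT` and RZ index.

```rust
/// Index of the zero register in the 8-bit SASS register field.
const RZ: usize = 255;
/// Mirrors the decoder's MAX_REG_COUNT (number of backing slots).
const MAX_REG_COUNT: usize = 254;

/// Hypothetical CPU-side model of the register file semantics.
struct RegFile {
    regs: [u32; MAX_REG_COUNT],
}

impl RegFile {
    fn new() -> Self {
        Self { regs: [0; MAX_REG_COUNT] }
    }

    /// RZ always reads as zero; other indices must have a backing slot.
    fn read(&self, reg: usize) -> u32 {
        if reg == RZ {
            return 0;
        }
        assert!(reg < MAX_REG_COUNT, "Register index out of bounds");
        self.regs[reg]
    }

    /// Writes to RZ are silently dropped.
    fn write(&mut self, reg: usize, val: u32) {
        if reg == RZ {
            return;
        }
        assert!(reg < MAX_REG_COUNT, "Register index out of bounds");
        self.regs[reg] = val;
    }
}

fn main() {
    let mut rf = RegFile::new();
    rf.write(4, 0xdead_beef);
    rf.write(RZ, 123); // discarded
    println!("R4 = {:#x}, RZ = {}", rf.read(4), rf.read(RZ));
}
```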
Instruction Example: AL2P
The AL2P (Attribute Logical to Physical) handler demonstrates the decoding process; this translation models it as adding an immediate offset to a source register:
// %rd := %ra + $ra_offset
pub fn al2p(&mut self, inst: u128) {
let _pg = (((inst >> 12) & 0x7) << 0); // Predicate guard (not yet honored)
let _pg_not = (((inst >> 15) & 0x1) << 0); // Predicate negate
let rd = (((inst >> 16) & 0xff) << 0) as usize; // Destination register
let ra = (((inst >> 24) & 0xff) << 0) as usize; // Source register
let ra_offset = (((inst >> 40) & 0x7ff) << 0) as usize; // 11-bit immediate offset
let bop = (((inst >> 74) & 0x3) << 0) as usize; // Operand bit size
// Valid sources are R0..R(MAX_REG_COUNT - 1) or RZ (255)
assert!(ra < MAX_REG_COUNT || ra == 255);
assert!(bop == BitSize::B32 as usize); // only the 32-bit form is handled
// Load source register value
let base = self.load_reg(ra);
// Create constant for offset
let offset = self.ir.emit_constant_typed(self.type_u32[1], ra_offset as u32);
// Emit integer addition: dst = base + offset
let dst_val = self.ir.emit_iadd(self.type_u32[1], base, offset);
// Store to destination register
self.store_reg(rd, dst_val);
}
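Bit Field Extraction
The instruction decoding uses bit manipulation to extract fields from the 128-bit instruction word:
// Pattern: (((inst >> shift) & mask) << output_shift)
// rd occupies bits 16..=23: shift right by 16, keep the low 8 bits (mask 0xff),
// leave it in place (output_shift = 0), then cast to usize to index the register file
let rd = (((inst >> 16) & 0xff) << 0) as usize;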
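The same pattern can be factored into a pair of helpers, which also makes it easy to build synthetic instruction words for tests. `field` and `with_field` are illustrative names, not functions from sm86.rs; the bit positions used are the AL2P ones shown above.

```rust
/// Extract `width` bits starting at bit `shift` from a 128-bit SASS word.
/// Equivalent to ((inst >> shift) & mask) with mask = (1 << width) - 1.
fn field(inst: u128, shift: u32, width: u32) -> u128 {
    debug_assert!(width >= 1 && width < 128 && shift + width <= 128);
    (inst >> shift) & ((1u128 << width) - 1)
}

/// Hypothetical inverse: overwrite a field, e.g. to build test inputs.
fn with_field(inst: u128, shift: u32, width: u32, value: u128) -> u128 {
    let mask = ((1u128 << width) - 1) << shift;
    (inst & !mask) | ((value << shift) & mask)
}

fn main() {
    // AL2P field positions: rd at bits 16..24, ra at 24..32, ra_offset at 40..51
    let mut inst = 0u128;
    inst = with_field(inst, 16, 8, 7); // rd = R7
    inst = with_field(inst, 24, 8, 255); // ra = RZ
    inst = with_field(inst, 40, 11, 0x7c); // ra_offset = 0x7c
    let rd = field(inst, 16, 8) as usize;
    let ra = field(inst, 24, 8) as usize;
    let off = field(inst, 40, 11) as u32;
    println!("rd=R{rd} ra={ra} offset={off:#x}");
}
```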
SPIR-V Generation
The SPIR-V emitter (core/src/gpu/spirv.rs) provides a safe Rust API for generating SPIR-V binary modules.
Core Emitter
pub struct Emitter {
words: Vec<u32>, // Output SPIR-V word stream
next_id: u32, // Next available ID (1-based)
bound_idx: usize, // Index of ID bound in header
}
impl Emitter {
pub fn new() -> Self {
Self {
words: Vec::with_capacity(4096),
next_id: 1, // SPIR-V IDs are 1-based; 0 is reserved
bound_idx: 0,
}
}
#[inline]
pub fn alloc_id(&mut self) -> u32 {
let id = self.next_id;
self.next_id += 1;
id
}
pub fn emit_header(&mut self) {
self.words.push(0x07230203); // magic number
self.words.push(0x00010500); // version 1.5
self.words.push(0); // generator (unregistered)
self.bound_idx = self.words.len();
self.words.push(0); // bound (patched by finalize)
self.words.push(0); // schema (reserved, must be 0)
}
pub fn finalize(&mut self) {
// bound_idx stays 0 until emit_header() runs, so the magic number is never overwritten
if self.bound_idx > 0 && self.bound_idx < self.words.len() {
self.words[self.bound_idx] = self.next_id;
}
}
}
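Always call finalize() after emitting the last instruction. The header's ID bound must be greater than every ID in the module, which is only known once all IDs have been allocated; finalize() writes next_id into the reserved slot.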
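The ordering requirement is easiest to see on a tiny module. The sketch below is a self-contained stand-in for `Emitter`; its `inst` helper is an assumption about how the undocumented `self.inst(opcode, operands)` works, based on the SPIR-V rule that an instruction's first word packs the total word count (including itself) in the high 16 bits and the opcode in the low 16 bits.

```rust
/// Hypothetical minimal emitter mirroring the header/finalize logic above.
struct MiniEmitter {
    words: Vec<u32>,
    next_id: u32,
    bound_idx: usize,
}

impl MiniEmitter {
    fn new() -> Self {
        Self { words: Vec::new(), next_id: 1, bound_idx: 0 }
    }
    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }
    /// Assumed encoding: (word_count << 16) | opcode, then the operands.
    fn inst(&mut self, opcode: u32, operands: &[u32]) {
        let word_count = (operands.len() + 1) as u32;
        self.words.push((word_count << 16) | opcode);
        self.words.extend_from_slice(operands);
    }
    fn emit_header(&mut self) {
        self.words.extend_from_slice(&[0x0723_0203, 0x0001_0500, 0]);
        self.bound_idx = self.words.len();
        self.words.extend_from_slice(&[0, 0]); // bound placeholder, schema
    }
    fn finalize(&mut self) {
        self.words[self.bound_idx] = self.next_id;
    }
}

fn main() {
    let mut e = MiniEmitter::new();
    e.emit_header();
    let void_ty = e.alloc_id();
    e.inst(19, &[void_ty]); // OpTypeVoid
    let u32_ty = e.alloc_id();
    e.inst(21, &[u32_ty, 32, 0]); // OpTypeInt 32, unsigned
    e.finalize(); // bound becomes 3: IDs 1 and 2 are in use
    println!("{:x?}", e.words);
}
```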
Type System
// Integers: width = 8|16|32|64, sign = 0 (unsigned) or 1 (signed)
pub fn emit_type_int(&mut self, width: u32, sign: u32) -> u32 {
debug_assert!(width == 8 || width == 16 || width == 32 || width == 64);
debug_assert!(sign <= 1);
let r = self.alloc_id();
self.inst(21, &[r, width, sign]);
r
}
// Floats: width = 16|32|64
pub fn emit_type_float(&mut self, width: u32) -> u32 {
debug_assert!(width == 16 || width == 32 || width == 64);
let r = self.alloc_id();
self.inst(22, &[r, width]);
r
}
// Booleans
pub fn emit_type_bool(&mut self) -> u32 {
let r = self.alloc_id();
self.inst(20, &[r]);
r
}
// Vectors: component_type = scalar type ID, count = 2, 3, or 4 (8 and 16 require the Vector16 capability)
pub fn emit_type_vector(&mut self, component_type: u32, count: u32) -> u32 {
debug_assert!(matches!(count, 2 | 3 | 4 | 8 | 16));
let r = self.alloc_id();
self.inst(23, &[r, component_type, count]);
r
}
// Pointers: storage_class = one of the storage_class constants below
pub fn emit_type_pointer(&mut self, storage_class: u32, pointee_type: u32) -> u32 {
let r = self.alloc_id();
self.inst(32, &[r, storage_class, pointee_type]);
r
}
// Storage classes
pub mod storage_class {
pub const UNIFORM_CONSTANT: u32 = 0;
pub const INPUT: u32 = 1;
pub const UNIFORM: u32 = 2;
pub const OUTPUT: u32 = 3;
pub const WORKGROUP: u32 = 4;
pub const FUNCTION: u32 = 7;
pub const STORAGE_BUFFER: u32 = 12;
}
Arithmetic Operations
// Integer operations
pub fn emit_iadd(&mut self, ty: u32, a: u32, b: u32) -> u32 {
self.typed_bin(128, ty, a, b)
}
pub fn emit_isub(&mut self, ty: u32, a: u32, b: u32) -> u32 {
self.typed_bin(130, ty, a, b)
}
pub fn emit_imul(&mut self, ty: u32, a: u32, b: u32) -> u32 {
self.typed_bin(132, ty, a, b)
}
// Float operations
pub fn emit_fadd(&mut self, ty: u32, a: u32, b: u32) -> u32 {
self.typed_bin(129, ty, a, b)
}
pub fn emit_fmul(&mut self, ty: u32, a: u32, b: u32) -> u32 {
self.typed_bin(133, ty, a, b)
}
// Helper for binary operations
fn typed_bin(&mut self, op: u32, ty: u32, a: u32, b: u32) -> u32 {
let r = self.alloc_id();
self.inst(op, &[ty, r, a, b]);
r
}
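Every helper routed through `typed_bin` produces the same five-word layout (header, result type, result ID, two operands), so the encoding can be checked by hand. This is a sketch with a hypothetical `encode_typed_bin` function; the opcode numbers are the ones used above.

```rust
/// Opcode numbers as used by emit_iadd and emit_fmul above.
const OP_IADD: u32 = 128;
const OP_FMUL: u32 = 133;

/// Encode a typed binary op the way typed_bin lays it out:
/// [header, result_type, result_id, a, b], where header = (5 << 16) | opcode.
fn encode_typed_bin(op: u32, ty: u32, result: u32, a: u32, b: u32) -> [u32; 5] {
    [(5 << 16) | op, ty, result, a, b]
}

fn main() {
    // %10 = OpIAdd %2 %7 %8
    let iadd = encode_typed_bin(OP_IADD, 2, 10, 7, 8);
    // %11 = OpFMul %3 %9 %9
    let fmul = encode_typed_bin(OP_FMUL, 3, 11, 9, 9);
    println!("{:x?}", iadd);
    println!("{:x?}", fmul);
}
```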
Memory Operations
pub fn emit_variable(&mut self, ty: u32, storage_class: u32) -> u32 {
let r = self.alloc_id();
self.inst(59, &[ty, r, storage_class]);
r
}
pub fn emit_variable_init(&mut self, ty: u32, storage_class: u32, initializer: u32) -> u32 {
let r = self.alloc_id();
self.inst(59, &[ty, r, storage_class, initializer]);
r
}
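The decoder's `load_reg` and `store_reg` call `emit_load` and `emit_store`, which are not listed here. A plausible sketch, assuming they follow the same pattern as `emit_variable` and emit OpLoad (61: result type, result ID, pointer) and OpStore (62: pointer, object) without optional memory operands; this `Emitter` is a minimal stand-in, not the real type.

```rust
/// Hypothetical minimal emitter with only what loads and stores need.
struct Emitter {
    words: Vec<u32>,
    next_id: u32,
}

impl Emitter {
    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }
    fn inst(&mut self, opcode: u32, operands: &[u32]) {
        self.words.push((((operands.len() + 1) as u32) << 16) | opcode);
        self.words.extend_from_slice(operands);
    }
    /// OpLoad: result type, result id, pointer. Returns the loaded value's id.
    fn emit_load(&mut self, ty: u32, ptr: u32) -> u32 {
        let r = self.alloc_id();
        self.inst(61, &[ty, r, ptr]);
        r
    }
    /// OpStore: pointer, object. Produces no result id.
    fn emit_store(&mut self, ptr: u32, val: u32) {
        self.inst(62, &[ptr, val]);
    }
}

fn main() {
    let mut e = Emitter { words: Vec::new(), next_id: 20 };
    // Copy R0 -> R1, assuming %2 is the u32 type and %10/%11 are register variables
    let v = e.emit_load(2, 10);
    e.emit_store(11, v);
    println!("{:?}", e.words);
}
```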
Control Flow
// Basic blocks
pub fn emit_label(&mut self) -> u32 {
let r = self.alloc_id();
self.inst(248, &[r]);
r
}
// Branching
pub fn emit_branch(&mut self, target: u32) {
self.inst(249, &[target]);
}
pub fn emit_branch_conditional(&mut self, cond: u32, true_label: u32, false_label: u32) {
self.inst(250, &[cond, true_label, false_label]);
}
// Function termination
pub fn emit_return(&mut self) {
self.inst(253, &[]);
}
pub fn emit_return_value(&mut self, value: u32) {
self.inst(254, &[value]);
}
Vulkan Integration
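The GPU state owns the Vulkan loader entry point and instance:
use ash::vk;
pub struct VkState {
pub entry: ash::Entry,
pub instance: ash::Instance,
}
impl VkState {
pub fn init(&mut self) -> ash::prelude::VkResult<()> {
// Dynamically load the system Vulkan loader
self.entry = unsafe { ash::Entry::load().expect("failed to load the Vulkan loader") };
// emit_header() targets SPIR-V 1.5, which Vulkan 1.2 is the first core version to accept
let app_info = vk::ApplicationInfo {
api_version: vk::make_api_version(0, 1, 2, 0),
..Default::default()
};
let create_info = vk::InstanceCreateInfo {
p_application_info: &app_info,
..Default::default()
};
self.instance = unsafe { self.entry.create_instance(&create_info, None)? };
Ok(())
}
}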
Texture Formats
The decoder includes enumerations for texture operands; for example, TextureType encodes the texture dimensionality:
enum TextureType {
ONE_D = 0,
TWO_D = 1,
THREE_D = 2,
CUBEMAP = 3,
ONE_D_ARRAY = 4,
TWO_D_ARRAY = 5,
ONE_D_BUFFER = 6,
TWO_D_NO_MIPMAP = 7,
CUBE_ARRAY = 8,
}
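When texture instructions are implemented, these values will need to become OpTypeImage operands. One possible mapping to the SPIR-V `Dim` operand plus the `Arrayed` flag is sketched below; `image_shape` is hypothetical, and treating `TWO_D_NO_MIPMAP` as a plain 2D image (with mip selection handled at sample time) is an assumption.

```rust
/// SPIR-V Dim operand values used by OpTypeImage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum Dim {
    Dim1D = 0,
    Dim2D = 1,
    Dim3D = 2,
    Cube = 3,
    Buffer = 5,
}

#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
enum TextureType {
    ONE_D = 0,
    TWO_D = 1,
    THREE_D = 2,
    CUBEMAP = 3,
    ONE_D_ARRAY = 4,
    TWO_D_ARRAY = 5,
    ONE_D_BUFFER = 6,
    TWO_D_NO_MIPMAP = 7,
    CUBE_ARRAY = 8,
}

/// Hypothetical mapping to (Dim, Arrayed) for OpTypeImage.
fn image_shape(t: TextureType) -> (Dim, bool) {
    use TextureType::*;
    match t {
        ONE_D => (Dim::Dim1D, false),
        ONE_D_ARRAY => (Dim::Dim1D, true),
        ONE_D_BUFFER => (Dim::Buffer, false),
        TWO_D | TWO_D_NO_MIPMAP => (Dim::Dim2D, false), // assumption for NO_MIPMAP
        TWO_D_ARRAY => (Dim::Dim2D, true),
        THREE_D => (Dim::Dim3D, false),
        CUBEMAP => (Dim::Cube, false),
        CUBE_ARRAY => (Dim::Cube, true),
    }
}

fn main() {
    println!("{:?}", image_shape(TextureType::CUBE_ARRAY));
}
```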
Design Decisions
Why SPIR-V?
Portability
SPIR-V is the standard shader IR for Vulkan, so the same modules run on any Vulkan-capable GPU (NVIDIA, AMD, Intel, and Apple via MoltenVK).
Validation
SPIR-V has well-defined validation rules and mature tooling (spirv-val, spirv-cross).
Optimization
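Vulkan drivers compile SPIR-V to native GPU code with their own optimizing back ends, so the emitter can generate straightforward code and leave low-level optimization to the driver or to an offline pass such as spirv-opt.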
Debugging
SPIR-V tools enable shader debugging and analysis without vendor-specific tools.
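For this debugging workflow, the emitted words can be written to a `.spv` file and fed to `spirv-val` or `spirv-dis`. A minimal sketch, assuming little-endian output (what x86 and ARM hosts produce; the magic number lets tools detect byte order); `to_spv_bytes`, `dump_module`, and the file name are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Serialize SPIR-V words as little-endian bytes.
fn to_spv_bytes(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Hypothetical debugging hook: dump a module for `spirv-val` / `spirv-dis`.
fn dump_module(words: &[u32], path: &Path) -> io::Result<()> {
    fs::write(path, to_spv_bytes(words))
}

fn main() -> io::Result<()> {
    // Header only (magic, version 1.5, generator 0, bound 1, schema 0),
    // just to show the byte layout on disk.
    let words = [0x0723_0203u32, 0x0001_0500, 0, 1, 0];
    let path = std::env::temp_dir().join("oboromi_dump.spv");
    dump_module(&words, &path)?;
    println!("wrote {} bytes to {}", fs::metadata(&path)?.len(), path.display());
    Ok(())
}
```

The header-only module in `main` only demonstrates the byte layout; a module Vulkan can consume also needs capabilities, a memory model, and an entry point, so `spirv-val` would reject it.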
Translation Challenges
- Instruction Set Differences
  - SM86 has 300+ unique instructions
  - Many map directly to core SPIR-V opcodes; others (e.g. transcendental math) map to the GLSL.std.450 extended instruction set
  - Some require multi-instruction sequences
- Register Allocation
  - SM86: 255 general-purpose registers (R0 to R254) plus RZ
  - SPIR-V: an effectively unbounded number of SSA IDs
  - Current approach: pre-allocate 254 SPIR-V variables and load/store them around each instruction
- Predication
  - SM86 guards individual instructions with predicate registers
  - SPIR-V only offers structured control flow
  - Translation therefore requires reconstructing control flow
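One straightforward way to reconstruct control flow for a predicated instruction is to wrap it in its own selection construct. The sketch below does that with a hypothetical `guarded` helper on a minimal stand-in emitter; it assumes predicate index 7 encodes PT (always true), as in other NVIDIA ISAs, and that the predicate has already been loaded as a bool ID.

```rust
// SPIR-V opcode numbers.
const OP_IADD: u32 = 128;
const OP_SELECTION_MERGE: u32 = 247;
const OP_LABEL: u32 = 248;
const OP_BRANCH: u32 = 249;
const OP_BRANCH_CONDITIONAL: u32 = 250;
/// Assumption: predicate index 7 is PT (always true).
const PT: u32 = 7;

/// Hypothetical minimal emitter with just enough to build a selection construct.
struct Emitter {
    words: Vec<u32>,
    next_id: u32,
}

impl Emitter {
    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    fn inst(&mut self, opcode: u32, operands: &[u32]) {
        self.words.push((((operands.len() + 1) as u32) << 16) | opcode);
        self.words.extend_from_slice(operands);
    }

    /// Emit `body` so it only runs when predicate `pg` (negated if `negate`)
    /// holds. `pred` is the id of a bool already loaded for `pg`.
    fn guarded(&mut self, pg: u32, pred: u32, negate: bool, body: impl FnOnce(&mut Self)) {
        match (pg == PT, negate) {
            (true, false) => return body(self), // @PT: unconditional, no branch
            (true, true) => return,             // @!PT: never executes, emit nothing
            _ => {}
        }
        let then_label = self.alloc_id();
        let merge_label = self.alloc_id();
        // Declare the merge block (Selection Control 0 = None) before branching.
        self.inst(OP_SELECTION_MERGE, &[merge_label, 0]);
        // A negated predicate swaps the targets instead of emitting OpLogicalNot.
        let (on_true, on_false) = if negate {
            (merge_label, then_label)
        } else {
            (then_label, merge_label)
        };
        self.inst(OP_BRANCH_CONDITIONAL, &[pred, on_true, on_false]);
        self.inst(OP_LABEL, &[then_label]);
        body(&mut *self);
        self.inst(OP_BRANCH, &[merge_label]);
        self.inst(OP_LABEL, &[merge_label]);
    }
}

fn main() {
    let mut e = Emitter { words: Vec::new(), next_id: 100 };
    // @P0 IADD: %50 = OpIAdd %2 %10 %11, guarded by the bool in %5
    e.guarded(0, 5, false, |e| e.inst(OP_IADD, &[2, 50, 10, 11]));
    println!("{:?}", e.words);
}
```

Because SPIR-V requires a definition to dominate its uses, a result computed inside the then-block is not visible after the merge block; writing it back through a register variable (as `store_reg` does) sidesteps this. For simple ALU ops, OpSelect between the old and new register values is a branch-free alternative.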
Current Limitations
Most instruction handlers are currently stubbed with todo!() macros:
pub fn ald(&mut self, inst: u128) {
let _pg = (((inst >> 12) & 0x7) << 0);
let _pg_not = (((inst >> 15) & 0x1) << 0);
let _rd = (((inst >> 16) & 0xff) << 0);
// ... field extraction ...
todo!();
}
Instructions prioritized for implementation:
- Attribute and memory access (ALD, AST, ATOM)
- Arithmetic (FADD, FMUL, FFMA, IADD, IMUL)
- Control flow (BRA, BRX, CALL, EXIT)
- Texture operations (TEX, TLD, SUTP)
Source Files
- SM86 Decoder: core/src/gpu/sm86.rs (lines 1-1178)
- SPIR-V Emitter: core/src/gpu/spirv.rs (lines 1-1184)
- GPU Module: core/src/gpu/mod.rs (lines 1-62)