
Overview

Oboromi emulates the Nintendo Switch 2’s CPU using the Unicorn Engine, a lightweight CPU emulation framework based on QEMU. The system supports 8 ARMv8 cores with a unified 12GB shared memory space.
The Switch 2 uses NVIDIA’s custom Tegra processor with ARM Cortex cores. Oboromi emulates the ARMv8-A instruction set in little-endian mode.

Architecture Components

CPU Manager (cpu_manager.rs)

The CpuManager struct orchestrates all CPU cores and manages the shared memory:
// core/src/cpu/cpu_manager.rs:12
pub struct CpuManager {
    pub cores: Vec<UnicornCPU>,
    // Pin prevents reallocation from invalidating pointers
    pub shared_memory: Pin<Box<[u8]>>,
}

Key Constants

pub const CORE_COUNT: usize = 8;
pub const MEMORY_SIZE: u64 = 12 * 1024 * 1024 * 1024; // 12GB
pub const MEMORY_BASE: u64 = 0x0;

Multicore Initialization

The CPU manager creates 8 independent Unicorn instances that all map the same host memory buffer as guest memory:
// core/src/cpu/cpu_manager.rs:20
impl CpuManager {
    pub fn new() -> Self {
        // Allocate 12GB of zeroed memory
        let shared_memory = Pin::new(vec![0u8; MEMORY_SIZE as usize].into_boxed_slice());
        let memory_ptr = shared_memory.as_ptr() as *mut u8;

        let mut cores = Vec::with_capacity(CORE_COUNT);

        for i in 0..CORE_COUNT {
            // Create CPU core sharing the same memory pointer
            let cpu = unsafe { 
                UnicornCPU::new_with_shared_mem(i as u32, memory_ptr, MEMORY_SIZE) 
            };
            
            if let Some(cpu) = cpu {
                cores.push(cpu);
            } else {
                panic!("Failed to create Core {}", i);
            }
        }

        Self { cores, shared_memory }
    }
}
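The buffer is wrapped in Pin<Box<[u8]>> to signal that the pointer passed to Unicorn must stay valid for as long as the cores exist. The stability itself comes from the heap allocation: moving a Box moves only the pointer to the allocation, not the bytes, and a boxed slice cannot grow or reallocate. Because [u8] is Unpin, Pin adds no compiler-enforced guarantee here; it documents intent. The pointer is valid only while CpuManager owns the buffer, so the cores must not outlive it.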
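
The pointer-stability argument can be checked with a small standalone sketch. It does not use Unicorn; `Owner`, `make_owner`, and `pointer_survives_move` are hypothetical names that stand in for CpuManager and its constructor, and the buffer is far smaller than 12GB.

```rust
use std::pin::Pin;

/// Hypothetical stand-in for `CpuManager`: owns the buffer and remembers the
/// raw pointer that was handed out when the buffer was created.
struct Owner {
    buffer: Pin<Box<[u8]>>,
    handed_out: *const u8,
}

/// Allocates a zeroed buffer and records its address, as `CpuManager::new` does
/// before passing the pointer to each core.
fn make_owner(len: usize) -> Owner {
    let buffer = Pin::new(vec![0u8; len].into_boxed_slice());
    let handed_out = buffer.as_ptr();
    Owner { buffer, handed_out }
}

/// Moves the owner and reports whether the recorded pointer still matches.
fn pointer_survives_move(len: usize) -> bool {
    let owner = make_owner(len);
    // Boxing the owner relocates the struct itself, but the heap allocation
    // behind `buffer` stays where it was.
    let moved = Box::new(owner);
    moved.buffer.as_ptr() == moved.handed_out
}
```

Calling `pointer_survives_move(4096)` should return true, which is the property the cores depend on.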

Unicorn Integration

UnicornCPU Wrapper (unicorn_interface.rs)

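The UnicornCPU struct wraps a Unicorn Engine instance behind an Arc<Mutex<...>> and tags it with a core ID. Its methods are safe to call; only the constructor is unsafe, because it takes a raw pointer: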
// core/src/cpu/unicorn_interface.rs:5
pub struct UnicornCPU {
    emu: Arc<Mutex<Unicorn<'static, ()>>>,
    pub core_id: u32,
}

Shared Memory Mapping

Each core maps the same host memory buffer into its address space:
// core/src/cpu/unicorn_interface.rs:43
pub unsafe fn new_with_shared_mem(
    core_id: u32, 
    memory_ptr: *mut u8, 
    memory_size: u64
) -> Option<Self> {
    let mut emu = Unicorn::new(Arch::ARM64, Mode::LITTLE_ENDIAN)
        .map_err(|e| {
            eprintln!("Failed to create Unicorn instance for core {}: {:?}", core_id, e);
            e
        })
        .ok()?;

    // Map shared memory using Unicorn's mem_map_ptr
    unsafe {
        emu.mem_map_ptr(
            0x0, 
            memory_size, 
            Prot::ALL, 
            memory_ptr as *mut std::ffi::c_void
        )
        .ok()?;
    }

    // Initialize stack pointer, offset by core ID to avoid collision
    let stack_top = memory_size - (core_id as u64 * 0x100000);
    let _ = emu.reg_write(RegisterARM64::SP, stack_top);

    Some(Self {
        emu: Arc::new(Mutex::new(emu)),
        core_id,
    })
}

Memory Layout

Stack Allocation

Each core receives 1MB of stack space at the top of the address space. Stacks grow downward from each core’s stack top, so Core 0 sits highest and Core 7 lowest:
0x0_0000_0000                       ← Memory Base

│  General Purpose Memory
│  (12GB - 8MB for stacks)

├─ 0x2_FF80_0000                    ← Core 7 Stack Limit
│  Core 7 Stack (1MB)
├─ 0x2_FF90_0000                    ← Core 7 Stack Top
│  Core 6 Stack (1MB)
├─ 0x2_FFA0_0000                    ← Core 6 Stack Top
│  ...
├─ 0x2_FFE0_0000                    ← Core 2 Stack Top
│  Core 1 Stack (1MB)
├─ 0x2_FFF0_0000                    ← Core 1 Stack Top
│  Core 0 Stack (1MB)
└─ 0x3_0000_0000                    ← Core 0 Stack Top, End of Memory (12GB)

Stack Top Calculation

// core/src/cpu/unicorn_interface.rs:64
let stack_top = memory_size - (core_id as u64 * 0x100000);
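This gives each core its own 1MB stack region while all cores share the same address space for heap and data. The separation is by convention only: nothing stops a stack that grows past 1MB from running into the next core’s region.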
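
The arithmetic behind that expression can be worked through in a standalone sketch. The constants mirror the ones quoted from cpu_manager.rs; `STACK_STRIDE`, `stack_range`, and `stacks_tile_without_overlap` are hypothetical names introduced only for illustration.

```rust
const CORE_COUNT: u64 = 8;
const MEMORY_SIZE: u64 = 12 * 1024 * 1024 * 1024; // 12GB = 0x3_0000_0000
const STACK_STRIDE: u64 = 0x10_0000; // 1MB per core (hypothetical name)

/// Initial stack pointer for a core, mirroring the expression above.
fn stack_top(core_id: u64) -> u64 {
    MEMORY_SIZE - core_id * STACK_STRIDE
}

/// Half-open range [low, high) a core's stack can use before it reaches the
/// stack top of the next core.
fn stack_range(core_id: u64) -> (u64, u64) {
    let high = stack_top(core_id);
    (high - STACK_STRIDE, high)
}

/// True when each stack region ends exactly where the next core's begins,
/// so the regions are adjacent and never overlap.
fn stacks_tile_without_overlap() -> bool {
    (0..CORE_COUNT - 1).all(|id| stack_range(id).0 == stack_range(id + 1).1)
}
```

Under these constants Core 0 starts at 0x3_0000_0000 and Core 7 at 0x2_FF90_0000, so the eight stacks together occupy the top 8MB of guest memory.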

Register Access

The wrapper provides methods to read and write the ARMv8 general-purpose registers X0 through X30. An out-of-range index makes get_x return 0:

Reading Registers

// core/src/cpu/unicorn_interface.rs:111
pub fn get_x(&self, reg_index: u32) -> u64 {
    let emu = self.emu.lock().unwrap();
    if reg_index > 30 {
        return 0;
    }

    let reg = match reg_index {
        0 => RegisterARM64::X0,
        1 => RegisterARM64::X1,
        // ... (X0-X30 mapping)
        30 => RegisterARM64::X30,
        _ => return 0,
    };

    emu.reg_read(reg).unwrap_or(0)
}

Writing Registers

pub fn set_x(&self, reg_index: u32, value: u64) {
    let mut emu = self.emu.lock().unwrap();
    // ... similar mapping logic
    let _ = emu.reg_write(reg, value);
}
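
Returning 0 for an invalid index makes it impossible to tell a bad index from a register that really holds 0. As a design alternative, here is a minimal sketch that reports the failure explicitly. `RegisterFile` is a hypothetical stand-in that stores the values itself; the real wrapper forwards to Unicorn and does not expose this API.

```rust
/// Hypothetical stand-in for a core's X0..X30 registers.
struct RegisterFile {
    x: [u64; 31],
}

impl RegisterFile {
    fn new() -> Self {
        Self { x: [0; 31] }
    }

    /// Returns `None` for indices above 30 instead of a silent 0.
    fn get_x(&self, reg_index: u32) -> Option<u64> {
        self.x.get(reg_index as usize).copied()
    }

    /// Returns `false` when the index is out of range and nothing was written.
    fn set_x(&mut self, reg_index: u32, value: u64) -> bool {
        match self.x.get_mut(reg_index as usize) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }
}
```

With this shape a caller can distinguish `Some(0)` (the register holds zero) from `None` (no such register).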

Special Registers

Dedicated accessors for critical registers:
  • get_pc() / set_pc(u64) - Program Counter
  • get_sp() / set_sp(u64) - Stack Pointer

Memory Operations

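Reads and writes go through Unicorn’s memory API and use little-endian byte order. Failures are swallowed: write_u32 ignores a failed write, and read_u32 returns 0 when the read fails: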
// core/src/cpu/unicorn_interface.rs:225
pub fn write_u32(&self, vaddr: u64, value: u32) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u32(&self, vaddr: u64) -> u32 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 4];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u32::from_le_bytes(bytes)
    } else {
        0
    }
}
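
The byte-order handling can be illustrated without Unicorn. The sketch below applies the same to_le_bytes / from_le_bytes round trip to a plain byte slice and, unlike the methods above, returns `None` on an out-of-bounds access. These free functions are hypothetical and only share their names with the wrapper methods.

```rust
/// Writes `value` at `addr` in little-endian order.
/// Returns `None` if the four bytes do not fit inside `mem`.
fn write_u32(mem: &mut [u8], addr: usize, value: u32) -> Option<()> {
    let end = addr.checked_add(4)?;
    mem.get_mut(addr..end)?.copy_from_slice(&value.to_le_bytes());
    Some(())
}

/// Reads a little-endian u32 from `addr`.
/// Returns `None` if the four bytes do not fit inside `mem`.
fn read_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let end = addr.checked_add(4)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(mem.get(addr..end)?);
    Some(u32::from_le_bytes(bytes))
}
```

Writing 0xDEADBEEF this way stores the bytes EF BE AD DE in ascending address order, which is what a little-endian ARMv8 guest expects to find.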

Execution Models

Single-Step Execution

// core/src/cpu/unicorn_interface.rs:94
pub fn step(&self) -> u64 {
    let mut emu = self.emu.lock().unwrap();
    let pc = emu.reg_read(RegisterARM64::PC).unwrap_or(0);

    match emu.emu_start(pc, pc + 4, 0, 1) {
        Ok(_) => 0,
        Err(_) => 1,
    }
}

Continuous Execution

// core/src/cpu/unicorn_interface.rs:74
pub fn run(&self) -> u64 {
    let mut emu = self.emu.lock().unwrap();
    let pc = emu.reg_read(RegisterARM64::PC).unwrap_or(0);

    // Run until we hit a BRK instruction or error
    match emu.emu_start(pc, 0xFFFF_FFFF_FFFF_FFFF, 0, 0) {
        Ok(_) => 1,
        Err(e) => {
            if format!("{e:?}").contains("EXCEPTION") {
                1 // Success - terminated by BRK
            } else {
                eprintln!("Emulation error: {e:?}");
                0
            }
        }
    }
}
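
Note that the two methods use opposite status conventions: step() returns 0 on success and 1 on failure, while run() returns 1 on success (including termination by BRK) and 0 on failure. A caller that wants one convention could normalize both into a Result. The sketch below assumes exactly the values shown above; `EmulationFailed`, `step_result`, and `run_result` are hypothetical names, not part of the wrapper.

```rust
/// Hypothetical error type; the wrapper itself only reports a status word.
#[derive(Debug, PartialEq)]
struct EmulationFailed;

/// `step()` convention shown above: 0 means success, anything else failure.
fn step_result(status: u64) -> Result<(), EmulationFailed> {
    if status == 0 {
        Ok(())
    } else {
        Err(EmulationFailed)
    }
}

/// `run()` convention shown above: 1 means success (including BRK),
/// anything else failure.
fn run_result(status: u64) -> Result<(), EmulationFailed> {
    if status == 1 {
        Ok(())
    } else {
        Err(EmulationFailed)
    }
}
```

A call site would then read `step_result(core.step())?` or `run_result(core.run())?` without having to remember which value means what.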

Core Scheduling

Current implementation uses sequential round-robin execution:
// core/src/cpu/cpu_manager.rs:48
pub fn run_all(&self) {
    // for now, just step all cores sequentially (round-robin)
    // in the future, this would be threaded
    for core in &self.cores {
        core.step();
    }
}
True parallel execution is planned for future versions. This will require synchronization primitives to handle memory barriers and atomic operations correctly.
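
As a rough sketch of what a threaded variant might look like, the example below runs one OS thread per core with scoped threads from the standard library. It is not the project’s planned design: `MockCore` and `run_all_parallel` are hypothetical, and the mock’s step only increments an atomic counter instead of executing guest code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for a core: stepping just bumps a shared counter.
struct MockCore<'a> {
    retired: &'a AtomicU64,
}

impl<'a> MockCore<'a> {
    fn step(&self) {
        self.retired.fetch_add(1, Ordering::Relaxed);
    }
}

/// Runs every core on its own OS thread for `steps` steps and returns the
/// total number of steps taken across all cores.
fn run_all_parallel(core_count: usize, steps: u64) -> u64 {
    let retired = AtomicU64::new(0);
    let cores: Vec<_> = (0..core_count)
        .map(|_| MockCore { retired: &retired })
        .collect();

    // Scoped threads may borrow `cores`; all of them are joined before
    // `thread::scope` returns.
    thread::scope(|s| {
        for core in &cores {
            s.spawn(move || {
                for _ in 0..steps {
                    core.step();
                }
            });
        }
    });

    retired.load(Ordering::Relaxed)
}
```

The atomic counter stands in for state that several cores touch at once. Real guest memory accesses would need the barrier and atomic handling mentioned above, which this sketch does not attempt.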

Thread Safety

The UnicornCPU struct is marked Send and Sync through unsafe impls so that cores can be shared across threads:
// core/src/cpu/unicorn_interface.rs:264
unsafe impl Send for UnicornCPU {}
unsafe impl Sync for UnicornCPU {}
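Each core’s Unicorn instance sits behind an Arc<Mutex<...>>, so calls into any one instance are serialized. The mutex does not cover the shared guest memory: every instance maps the same host buffer through a raw pointer, so guest writes from cores running on different threads would need their own synchronization.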

Testing

Multicore Memory Test

The test suite verifies shared memory functionality:
// core/src/tests/multicore_test.rs:15
#[test]
fn test_shared_memory_access() {
    let manager = CpuManager::new();
    
    let core0 = manager.get_core(0).expect("Core 0 missing");
    let core1 = manager.get_core(1).expect("Core 1 missing");

    // Write value using Core 0
    let test_addr = 0x1000;
    let test_val = 0xDEADBEEF;
    core0.write_u32(test_addr, test_val);

    // Read value using Core 1
    let read_val = core1.read_u32(test_addr);

    assert_eq!(read_val, test_val, 
        "Core 1 should see value written by Core 0");
}
This confirms that a write made through one core’s Unicorn instance is visible through another’s, as expected when both map the same host buffer.

Performance Considerations

Virtual Memory

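The 12GB allocation relies on lazy virtual memory. On Linux, macOS, and Windows, the OS typically backs a large zeroed allocation with physical RAM only when its pages are first touched, so an idle emulator consumes far less than 12GB of RAM. The allocation can still fail if the OS refuses to reserve that much memory up front.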

64-bit Requirement

// core/src/cpu/cpu_manager.rs:6
#[cfg(not(target_pointer_width = "64"))]
compile_error!("oboromi requires a 64-bit architecture to emulate 12GB of RAM.");
This compile-time check rejects 32-bit targets, whose usize cannot represent a 12GB buffer length. It does not check how much RAM the host actually has.

Future Enhancements

  1. Parallel Execution: Spawn 8 OS threads for true multicore emulation
  2. Memory Barriers: Implement ARMv8 memory ordering semantics
  3. TLB Emulation: Add virtual-to-physical address translation
  4. Cache Simulation: Model L1/L2/L3 cache behavior for timing accuracy
  5. Exception Handling: Proper interrupt and exception routing
