The Machine class (defined in problem.py) is the core simulator for a custom VLIW SIMD architecture designed for parallel computation workloads.
```python
# problem.py
class Machine:
    """
    Simulator for a custom VLIW SIMD architecture.

    VLIW (Very Large Instruction Word): Cores are composed of different
    "engines" each of which can execute multiple "slots" per cycle in parallel.
    """

    def __init__(self, mem_dump, program, debug_info, n_cores=1,
                 scratch_size=SCRATCH_SIZE, trace=False):
        # One Core per requested core, each with its own zeroed scratch space
        self.cores = [
            Core(id=i, scratch=[0] * scratch_size, trace_buf=[])
            for i in range(n_cores)
        ]
        self.mem = copy(mem_dump)
        self.program = program
        self.cycle = 0
```
The current configuration sets N_CORES = 1, although the simulator supports multiple cores through the n_cores constructor parameter.
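To make the engine/slot structure concrete, the sketch below shows the shape of an instruction bundle as step() consumes it: a dict mapping engine names to lists of slot tuples, where each tuple is unpacked as the arguments to that engine's handler. The opcode names inside the tuples, the helper validate_bundle, and the numbers in HYPOTHETICAL_SLOT_LIMITS are illustrative assumptions, not the values or helpers defined in problem.py.

```python
from typing import Dict, List, Tuple

# A slot is a tuple of arguments for one engine operation; a bundle maps
# engine names to the slots issued on that engine in a single cycle.
Slot = Tuple[object, ...]
Bundle = Dict[str, List[Slot]]

# Hypothetical per-engine limits for illustration only; the real values
# are defined by SLOT_LIMITS in problem.py.
HYPOTHETICAL_SLOT_LIMITS = {"alu": 4, "valu": 2, "load": 2, "store": 1, "flow": 1}


def validate_bundle(bundle: Bundle, limits: Dict[str, int]) -> int:
    """Check each engine's slot count against its limit; return the total slot count."""
    total = 0
    for engine, slots in bundle.items():
        if engine not in limits:
            raise ValueError(f"unknown engine: {engine!r}")
        if len(slots) > limits[engine]:
            raise ValueError(
                f"{engine}: {len(slots)} slots exceeds limit of {limits[engine]}"
            )
        total += len(slots)
    return total


# One cycle's worth of work: two ALU ops and one load issued together.
# Opcode names and operand layouts here are hypothetical.
bundle: Bundle = {
    "alu": [("+", 10, 0, 1), ("*", 11, 2, 3)],
    "load": [("load", 12, 4)],
}

print(validate_bundle(bundle, HYPOTHETICAL_SLOT_LIMITS))  # 3
```

The real step() performs the equivalent check with an assert against SLOT_LIMITS before dispatching each engine's slots.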
The main execution loop, run(), first resumes any paused cores, then steps every running core through the shared program until all cores have stopped:
```python
# problem.py
def run(self):
    # Resume paused cores
    for core in self.cores:
        if core.state == CoreState.PAUSED:
            core.state = CoreState.RUNNING
    # Execute until all cores stop
    while any(c.state == CoreState.RUNNING for c in self.cores):
        has_non_debug = False
        for core in self.cores:
            if core.state != CoreState.RUNNING:
                continue
            if core.pc >= len(self.program):
                core.state = CoreState.STOPPED
                continue
            instr = self.program[core.pc]
            core.pc += 1
            self.step(instr, core)
            if any(name != "debug" for name in instr.keys()):
                has_non_debug = True
        if has_non_debug:
            self.cycle += 1
```
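A bundle that contains only debug slots does not increment the cycle counter, so you can add debugging instructions without affecting performance measurements. A bundle that mixes debug slots with other engines still costs a cycle. The counter is also shared across cores: it advances at most once per pass over the cores, as long as at least one core issued a non-debug bundle in that pass.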
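The cycle accounting in run() can be isolated into a small model. This sketch keeps only the bookkeeping (program counters, stopping at the end of the program, and the has_non_debug check) and drops actual slot execution, so cores never branch; the function name count_cycles and the opcode tuples in the sample program are hypothetical.

```python
from typing import Dict, List

Bundle = Dict[str, list]


def count_cycles(program: List[Bundle], n_cores: int = 1) -> int:
    """Replay run()'s cycle bookkeeping without executing any slots.

    All cores share one program, as in Machine; the counter advances once per
    pass over the cores in which at least one core issued a non-debug bundle.
    """
    pcs = [0] * n_cores
    running = [True] * n_cores
    cycle = 0
    while any(running):
        has_non_debug = False
        for core in range(n_cores):
            if not running[core]:
                continue
            if pcs[core] >= len(program):
                running[core] = False  # fell off the end of the program
                continue
            instr = program[pcs[core]]
            pcs[core] += 1
            if any(name != "debug" for name in instr):
                has_non_debug = True
        if has_non_debug:
            cycle += 1
    return cycle


# Hypothetical opcodes; only the engine names matter to the counter.
program = [
    {"alu": [("+", 10, 0, 1)]},
    {"debug": [("print", 10)]},                               # debug-only: free
    {"load": [("load", 11, 10)], "debug": [("print", 11)]},   # mixed: costs a cycle
]
print(count_cycles(program))             # 2
print(count_cycles(program, n_cores=2))  # 2: cores share one counter
```

Adding a second core does not double the count, because every core is stepped once per pass and the pass is charged as a single cycle.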
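Each call to step() executes one instruction bundle, dispatching every slot to its engine's handler; semantically, all engines run in parallel within the cycle: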
```python
# problem.py
def step(self, instr: Instruction, core):
    ENGINE_FNS = {
        "alu": self.alu,
        "valu": self.valu,
        "load": self.load,
        "store": self.store,
        "flow": self.flow,
    }
    self.scratch_write = {}
    self.mem_write = {}
    # Execute all engine slots
    for name, slots in instr.items():
        assert len(slots) <= SLOT_LIMITS[name]
        for i, slot in enumerate(slots):
            ENGINE_FNS[name](core, *slot)
    # Apply writes atomically at end of cycle
    for addr, val in self.scratch_write.items():
        core.scratch[addr] = val
    for addr, val in self.mem_write.items():
        self.mem[addr] = val
```
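Critical: all writes are buffered and applied atomically at the end of the cycle. Every slot in a bundle reads scratch and memory as they were at the start of the cycle, so a value written by one slot is not visible to any other slot in the same bundle. This rules out read-after-write hazards within a bundle, but it also means an operation that depends on another's result must be placed in a later bundle.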
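A toy model makes the end-of-cycle write semantics concrete. The sketch below is not Machine.step: it assumes a flat list as scratch space and two hypothetical opcodes, "mov" and "add", but it stages results and commits them after all slots have read their inputs, as step() does with scratch_write.

```python
from typing import Dict, List, Tuple


def step(scratch: List[int], slots: List[Tuple]) -> None:
    """Execute one bundle with end-of-cycle writes (toy model, not Machine.step).

    Every slot reads from `scratch` as it was at the start of the cycle;
    results are staged in `pending` and committed only after all slots run.
    """
    pending: Dict[int, int] = {}
    for op, dest, *srcs in slots:
        if op == "mov":  # hypothetical: dest <- src
            pending[dest] = scratch[srcs[0]]
        elif op == "add":  # hypothetical: dest <- a + b
            pending[dest] = scratch[srcs[0]] + scratch[srcs[1]]
        else:
            raise ValueError(f"unknown op: {op!r}")
    # Commit all staged writes at once, mirroring the end-of-cycle flush.
    for addr, val in pending.items():
        scratch[addr] = val


scratch = [1, 2, 0]
# A swap fits in one bundle: both movs read the old values.
step(scratch, [("mov", 0, 1), ("mov", 1, 0)])
print(scratch)  # [2, 1, 0]

# The second add reads the OLD scratch[2] (0), not the sum written alongside it.
step(scratch, [("add", 2, 0, 1), ("add", 0, 2, 2)])
print(scratch)  # [0, 1, 3]

# Issued in the next bundle, the same add sees the committed value.
step(scratch, [("add", 0, 2, 2)])
print(scratch)  # [6, 1, 3]
```

In this sketch, if two slots in one bundle target the same address, the later slot's value wins because it overwrites the earlier entry in pending.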