GPU Architecture

The Xenia GPU subsystem emulates Xenos, the custom ATI (now AMD) graphics chip in the Xbox 360. Xenos is derived from the R5xx generation but uses unified shader processors, and it is paired with 10MB of embedded DRAM (EDRAM) for high-bandwidth framebuffer operations.

The Xenos Chip

Key specifications of the Xenos, which shipped in 2005:

Architecture: AMD R5xx-based unified shader architecture
Shader Processors: 48 unified shader ALUs, each a vec4 + scalar unit (240 component operations per cycle)
Clock Speed: 500 MHz
Memory: 10MB embedded DRAM (EDRAM) on a separate die in the GPU package
Memory Bandwidth: 256 GB/s inside the EDRAM die, 22.4 GB/s to main RAM
DirectX: DirectX 9.0c with some DirectX 10 features

Unified Shader Architecture

Unlike previous GPUs with separate vertex and pixel shader units, the Xenos used unified shader processors that could execute any type of shader. Units no longer sat idle when a workload was skewed toward one shader type, which improved resource utilization. The design was a precursor to modern GPU architectures.

EDRAM

The 10MB of embedded DRAM sat in the same package as the GPU die and was used for:

Storing render targets and depth buffers
Hardware anti-aliasing (2x, 4x MSAA)
Hardware resolve operations (copying from EDRAM to main RAM)
Tiled rendering for framebuffers larger than 10MB

EDRAM was crucial for achieving 720p rendering with anti-aliasing at acceptable framerates.

Command Processing

The Xenos processes commands from a ringbuffer in system memory, similar to modern GPUs.

Command Ringbuffer

The DirectX driver writes commands to a ringbuffer:

CPU writes PM4 commands and data to ringbuffer in system memory
CPU updates write pointer register on GPU
GPU’s command processor fetches commands from ringbuffer
GPU executes commands and updates read pointer

This asynchronous design allows CPU and GPU to work in parallel.
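The write-pointer/read-pointer handshake described above can be modeled in a few lines. This is a minimal single-threaded sketch, not Xenia’s RingBuffer class: CommandRing and its members are hypothetical names, the capacity is assumed to be at least two dwords, and a real implementation would also need synchronization between the CPU and command processor threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a command ring buffer (illustration only).
// Assumes size_dwords >= 2.
class CommandRing {
 public:
  explicit CommandRing(std::size_t size_dwords) : data_(size_dwords, 0) {}

  // Number of dwords written by the producer but not yet consumed.
  std::size_t pending() const {
    return (write_ptr_ + data_.size() - read_ptr_) % data_.size();
  }

  // Producer side (CPU): returns false when the ring is full.
  // One slot stays unused so that read == write always means "empty".
  bool Write(std::uint32_t dword) {
    if (pending() == data_.size() - 1) return false;
    data_[write_ptr_] = dword;
    write_ptr_ = (write_ptr_ + 1) % data_.size();  // publish new write pointer
    return true;
  }

  // Consumer side (GPU): returns false when there is nothing to fetch.
  bool Read(std::uint32_t* out) {
    if (read_ptr_ == write_ptr_) return false;
    *out = data_[read_ptr_];
    read_ptr_ = (read_ptr_ + 1) % data_.size();  // advance read pointer
    return true;
  }

 private:
  std::vector<std::uint32_t> data_;
  std::size_t read_ptr_ = 0;
  std::size_t write_ptr_ = 0;
};
```

Keeping one slot unused lets the two pointers alone distinguish a full ring from an empty one, so no separate element count has to be shared between producer and consumer.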

Command Processor

Implementation: src/xenia/gpu/command_processor.cc

The command processor:

Runs on a dedicated thread, mimicking hardware behavior
Fetches and decodes PM4 packets from the ringbuffer
Executes commands (state changes, draws, memory operations)
Manages GPU registers via RegisterFile
Synchronizes with CPU via events and interrupts

Key Components:

class CommandProcessor {
  // Ring buffer for reading commands
  RingBuffer reader_;
  
  // Active shaders
  Shader* active_vertex_shader_;
  Shader* active_pixel_shader_;
  
  // GPU register state
  RegisterFile* register_file_;
  
  // Graphics backend (Vulkan, D3D12)
  // Performs actual rendering
};

From src/xenia/gpu/command_processor.h.

PM4 Commands

PM4 (Packet Manager version 4) is the command format used by AMD GPUs. Commands are documented in src/xenia/gpu/xenos.h:521.

Common Commands:

PM4_DRAW_INDX - Draw primitives with index buffer
PM4_SET_CONSTANT - Set shader constants
PM4_LOAD_SHADER - Load vertex/pixel shader
PM4_SET_CONTEXT_REG - Set context register
PM4_WAIT_REG_MEM - Wait for memory/register condition
PM4_EVENT_WRITE - Signal completion event

Each command is a packet with an opcode and optional data payload.
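As a rough illustration of the opcode-plus-payload structure, the sketch below decodes a packet header. The bit positions are an assumption based on the commonly described PM4 layout and should be checked against the definitions in src/xenia/gpu/xenos.h. PacketHeader, DecodeHeader, and MakeType3 are hypothetical names, and packet types 1 and 2 are not handled.

```cpp
#include <cstdint>

// Assumed header layout (verify against xenos.h before relying on it):
//   bits 31..30  packet type
//   bits 29..16  payload dword count minus one (types 0 and 3)
//   bits 14..8   opcode (type 3 only)
struct PacketHeader {
  std::uint32_t type;
  std::uint32_t opcode;  // meaningful only when type == 3
  std::uint32_t count;   // payload dwords following the header
};

inline PacketHeader DecodeHeader(std::uint32_t header) {
  PacketHeader h{};  // all fields start at zero
  h.type = header >> 30;
  if (h.type == 0 || h.type == 3) {
    h.count = ((header >> 16) & 0x3FFF) + 1;
  }
  if (h.type == 3) {
    h.opcode = (header >> 8) & 0x7F;
  }
  return h;
}

// Builds a type-3 header under the same assumed layout (count must be >= 1).
inline std::uint32_t MakeType3(std::uint32_t opcode, std::uint32_t count) {
  return (3u << 30) | (((count - 1) & 0x3FFF) << 16) | ((opcode & 0x7F) << 8);
}
```

A decoder loop would read one header, dispatch on type and opcode, then consume count payload dwords before reading the next header.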

Register File

GPU state is managed through memory-mapped registers (src/xenia/gpu/register_table.inc):

Context registers - Per-draw state (blend, depth, rasterizer)
Shader registers - Shader bindings and constants
Control registers - Command processor state

Registers are accessed via PM4 commands and are documented in the register table.
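One way to picture the register file is as a flat array of 32-bit values indexed by register number, with packets writing runs of consecutive registers. The sketch below assumes exactly that. SimpleRegisterFile, WriteRange, and the kRegisterCount value are hypothetical and are not taken from Xenia’s RegisterFile.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative size only; not the real register count.
constexpr std::size_t kRegisterCount = 0x6000;

struct SimpleRegisterFile {
  std::array<std::uint32_t, kRegisterCount> values{};  // zero-initialized

  // Writes `count` consecutive registers starting at index `base`.
  // Returns false and writes nothing if the range is out of bounds.
  bool WriteRange(std::size_t base, const std::uint32_t* data,
                  std::size_t count) {
    if (base >= kRegisterCount || count > kRegisterCount - base) {
      return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
      values[base + i] = data[i];
    }
    return true;
  }
};
```

Validating the whole range before writing keeps a malformed packet from partially updating state.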

Shader Translation

Xenos shaders use a custom microcode format that must be translated to modern shader languages.

Shader Microcode

Xenos microcode is a binary format with:

Vertex shaders - Transform vertices and output attributes
Pixel shaders - Compute per-pixel colors and depth
Instruction set - Similar to R5xx/R6xx AMD GPUs
Control flow - Loops, branches, subroutine calls

Translation Pipeline

Disassembly - Decode binary microcode to readable assembly
Analysis - Identify inputs, outputs, control flow, texture fetches
Translation - Convert to SPIR-V (Vulkan) or DXBC (D3D12)
Compilation - Driver compiles to native GPU code

Translator Implementations:

Vulkan: Translates to SPIR-V
D3D12: Translates to DXBC (DirectX Bytecode)

Both backends share common shader analysis logic.

Shader Caching

Shaders are cached to avoid retranslating:

Microcode hash is used as cache key
Translated shaders are stored on disk
Cache is loaded on startup
Reduces shader compilation stutter
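The cache-key idea can be sketched as follows. FNV-1a is used here purely as a simple stand-in hash, and the in-memory map stands in for the on-disk store. HashMicrocode and ShaderCache are hypothetical names, and nothing here is meant to describe the hash or storage format Xenia actually uses.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// 64-bit FNV-1a over the microcode bytes (stand-in hash, low byte first).
inline std::uint64_t HashMicrocode(const std::vector<std::uint32_t>& dwords) {
  std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
  for (std::uint32_t dword : dwords) {
    for (int shift = 0; shift < 32; shift += 8) {
      hash ^= (dword >> shift) & 0xFF;
      hash *= 1099511628211ull;  // FNV prime
    }
  }
  return hash;
}

// Hypothetical cache: microcode hash -> translated shader text.
class ShaderCache {
 public:
  // Returns the cached translation, or nullptr on a miss.
  const std::string* Find(std::uint64_t key) const {
    auto it = entries_.find(key);
    return it == entries_.end() ? nullptr : &it->second;
  }

  // Stores a translation; an existing entry for the key is left unchanged.
  void Insert(std::uint64_t key, std::string translated) {
    entries_.emplace(key, std::move(translated));
  }

 private:
  std::unordered_map<std::uint64_t, std::string> entries_;
};
```

On a draw, the emulator would hash the bound microcode, call Find, and translate and Insert only on a miss, which is what removes repeat translation work.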

EDRAM Emulation

The 10MB of EDRAM is one of the harder parts of the Xenos to emulate, because host GPUs expose no equivalent memory that a game can address directly.

EDRAM Layout

EDRAM is organized as:

2048 tiles of 5120 bytes each (10MB total)
Each tile is 80x16 pixels at 32-bit color
Games configure EDRAM via render target registers
Multiple render targets can be bound simultaneously
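The tile figures above are easy to check, and the same arithmetic shows when a set of surfaces stops fitting. In the sketch below the constants come from the list above, while TilesForSurface and the mapping of MSAA modes to sample grids (2x as 1x2 samples per pixel, 4x as 2x2) are assumptions made for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kTileWidth = 80;   // samples per tile row (32-bit)
constexpr std::uint32_t kTileHeight = 16;  // sample rows per tile
constexpr std::uint32_t kTileBytes = 5120;
constexpr std::uint32_t kTileCount = 2048;

static_assert(kTileWidth * kTileHeight * 4 == kTileBytes, "tile size");
static_assert(kTileCount * kTileBytes == 10u * 1024 * 1024, "10MB total");

// EDRAM tiles needed by one 32-bit surface. samples_x and samples_y are the
// assumed per-pixel sample counts along each axis.
constexpr std::uint32_t TilesForSurface(std::uint32_t width,
                                        std::uint32_t height,
                                        std::uint32_t samples_x,
                                        std::uint32_t samples_y) {
  return ((width * samples_x + kTileWidth - 1) / kTileWidth) *
         ((height * samples_y + kTileHeight - 1) / kTileHeight);
}
```

Under these assumptions a 1280x720 color buffer needs 720 tiles, so color plus depth (1440 tiles) fits in the 2048 available. With 2x MSAA each surface needs 1440 tiles, and the pair (2880) no longer fits.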

Emulation Strategies

Approach 1: Resolve-based

Render to EDRAM-sized buffer (10MB)
Resolve (copy) to system RAM when game requests it
System RAM textures used for display and sampling

Approach 2: Virtual EDRAM

Use modern GPU’s memory as virtual EDRAM
Render targets live in GPU memory
Resolve operations copy between GPU textures

Xenia uses a hybrid approach depending on the scenario.

Tiling

Games whose render targets exceed the 10MB of EDRAM (for example, 720p with 4x MSAA) use tiling:

Divide framebuffer into tiles that fit in EDRAM
Render each tile separately
Resolve each tile to system RAM
Combine tiles to form complete framebuffer
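The first step, dividing the framebuffer into pieces that fit, can be sketched as a split into horizontal bands. This is only an illustration under stated assumptions: real titles choose their own tile rectangles, and Band, SplitIntoBands, and the row_align parameter are hypothetical.

```cpp
#include <cstdint>
#include <vector>

struct Band {
  std::uint32_t y;       // first pixel row of the band
  std::uint32_t height;  // pixel rows in the band
};

// Splits a framebuffer into horizontal bands whose data fits in edram_bytes.
// bytes_per_row must cover every surface and sample for one pixel row.
// Band heights are multiples of row_align, except possibly the last band.
inline std::vector<Band> SplitIntoBands(std::uint32_t height,
                                        std::uint32_t bytes_per_row,
                                        std::uint32_t edram_bytes,
                                        std::uint32_t row_align) {
  std::vector<Band> bands;
  if (bytes_per_row == 0 || row_align == 0) return bands;
  std::uint32_t max_rows = edram_bytes / bytes_per_row;
  max_rows -= max_rows % row_align;  // round down to the alignment
  if (max_rows == 0) return bands;   // not even one aligned strip fits
  for (std::uint32_t y = 0; y < height; y += max_rows) {
    std::uint32_t remaining = height - y;
    bands.push_back({y, remaining < max_rows ? remaining : max_rows});
  }
  return bands;
}
```

For 1280x720 with 4x MSAA and 32-bit color plus depth, one pixel row takes 1280 * 4 * 4 * 2 = 40960 bytes, so at most 256 rows fit at once and the frame splits into three bands of 256, 256, and 208 rows.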

Emulating tiling requires:

Detecting when games use tiling
Rendering each tile to separate render targets
Combining tiles for display

Graphics Backends

Xenia supports multiple graphics backends:

Vulkan Backend

The Vulkan backend (src/xenia/gpu/vulkan/) provides:

SPIR-V shader translation
Explicit resource management
Better performance on Linux and some Windows configurations
More manual control over GPU operations

Configuration: src/xenia/gpu/vulkan/vulkan_gpu_flags.cc

--vulkan_dump_disasm=true - Dump shader disassembly (NVIDIA only)

D3D12 Backend

The D3D12 backend (src/xenia/gpu/d3d12/) provides:

DXBC shader translation
Better compatibility on Windows
PIX integration for debugging
DirectX 12 feature level 11_0+

GPU Options

From src/xenia/gpu/gpu_flags.cc:

--vsync=false - Render as fast as possible instead of 60Hz
--dump_shaders=path/ - Dump all translated shaders to disk
--trace_gpu_prefix=path/ - Capture GPU traces for debugging
--trace_gpu_stream - Record all frames to trace file

Tools

Shader Compiler

xe-gpu-shader-compiler is a standalone tool for shader translation:

xe-gpu-shader-compiler \
    --shader_input=input_file.bin.vs \
    --shader_output=output_file.txt \
    --shader_output_type=spirvtext

Useful for:

Testing shader translation
Debugging shader issues
Analyzing microcode format

Binaries use .bin.vs for vertex shaders, .bin.ps for pixel shaders.

Shader Playground

GUI tool for interactive shader work (tools/shader-playground/).

Features:

Assemble shader microcode
Disassemble binary shaders
Translate to target language (SPIR-V, etc.)
Validate translation correctness
Compare against reference output

See tools/shader-playground/README.md for setup instructions.

GPU Trace Viewer

xe-gpu-trace-viewer is used to inspect captured frames.

Workflow:

Run game with --trace_gpu_prefix=path/frame_
Press F4 to capture current frame
Open trace file in xe-gpu-trace-viewer
Inspect draw calls, state, shaders, textures
Modify code and rebuild to test fixes

Capturing Streams: Use --trace_gpu_stream to capture all frames to a single file. Warning: files get very large.

Performance Counters

The GPU exposes performance counters that D3D can query. Each counter is a 64-bit value read through HIGH and LOW registers, with a SELECT register choosing what is counted. Counters documented in src/xenia/gpu/xenos.h and the original docs include:

CP_PERFCOUNTER0 - Command processor metrics
RBBM_PERFCOUNTER0/1 - Register Backbone Manager metrics
SQ_PERFCOUNTER0-3 - Sequencer (shader) metrics
VGT_PERFCOUNTER0-3 - Vertex Grouper/Tessellator metrics
And many more (see docs/gpu.md for full list)

Emulating these is low priority but may be needed for some games.
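If counters were emulated, the HIGH/LOW split would amount to simple bit manipulation. The sketch below shows one way to model it. PerfCounter and CombineCounter are hypothetical names, and the single select field is an assumption about how the SELECT register would be represented.

```cpp
#include <cstdint>

// Combines the HIGH and LOW register halves into one 64-bit counter value.
constexpr std::uint64_t CombineCounter(std::uint32_t high, std::uint32_t low) {
  return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Hypothetical emulated counter: stores the 64-bit value and exposes the two
// 32-bit register reads a title would perform.
struct PerfCounter {
  std::uint64_t value = 0;
  std::uint32_t select = 0;  // which event is counted (assumed encoding)

  std::uint32_t ReadLow() const { return static_cast<std::uint32_t>(value); }
  std::uint32_t ReadHigh() const {
    return static_cast<std::uint32_t>(value >> 32);
  }
};
```

A title reading both halves and recombining them gets the original value back, provided the counter does not change between the two reads.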

Challenges and Limitations

EDRAM Tiling

Detecting and handling tiling is complex:

Games don’t explicitly declare tiling usage
Must infer from render target configuration
Incorrect tiling detection causes rendering bugs

Shader Accuracy

Xenos microcode doesn’t map 1:1 to modern shaders:

Some instructions have no direct equivalent
Precision differences between GPUs
Control flow handling differs
Texture sampling behavior varies

Synchronization

Games expect certain GPU/CPU synchronization patterns:

Wait for GPU events before reading results
Memory coherency between CPU and GPU
Command buffer execution order

Emulating these correctly is critical for stability.
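The "wait for GPU events before reading results" pattern can be illustrated with a small fence. This is a sketch under assumptions, not Xenia’s mechanism: Fence, Signal, and IsComplete are hypothetical names, and submissions are assumed to be numbered with increasing 64-bit values.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical fence. The command processor thread publishes the number of
// the last completed submission; the CPU side checks it before reading any
// memory that submission wrote.
class Fence {
 public:
  // Called by the GPU side after all writes for `value` are finished.
  void Signal(std::uint64_t value) {
    completed_.store(value, std::memory_order_release);
  }

  // Called by the CPU side. A true result guarantees that writes made
  // before the matching Signal are visible to the caller.
  bool IsComplete(std::uint64_t value) const {
    return completed_.load(std::memory_order_acquire) >= value;
  }

 private:
  std::atomic<std::uint64_t> completed_{0};
};
```

The release store paired with the acquire load is what provides the memory-coherency guarantee listed above. Polling could be replaced with a blocking wait without changing that ordering.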

References

Xenos Architecture

Shader Formats

LLVM R600 Tables - Opcode reference (formats differ but names/semantics match)
xemit - Alternative shader emitter
