The Xenos Chip
The Xenos was a revolutionary GPU for its time (2005):
- Architecture: AMD R5xx-based unified shader architecture
- Shader Processors: 48 unified shader processors (240 shader operations/cycle)
- Clock Speed: 500 MHz
- Memory: 10MB embedded DRAM (EDRAM) on the GPU package
- Memory Bandwidth: 256 GB/s to EDRAM, 22.4 GB/s to main RAM
- DirectX: DirectX 9.0c with some DirectX 10 features
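The throughput figures in the list above can be cross-checked with a little arithmetic. The sketch below assumes that each shader processor issues one 4-wide vector operation plus one scalar operation per cycle (5 operations); that breakdown is an assumption used to reproduce the 240 figure, not something the list states. All constant and function names are illustrative.

```cpp
#include <cstdint>

// Figures taken from the specification list.
constexpr std::uint64_t kShaderProcessors = 48;
constexpr std::uint64_t kClockHz = 500'000'000;  // 500 MHz

// Assumption: one vec4 op plus one scalar op per processor per cycle.
constexpr std::uint64_t kOpsPerProcessorPerCycle = 5;

// Shader operations issued by the whole chip in one clock cycle.
constexpr std::uint64_t OpsPerCycle() {
  return kShaderProcessors * kOpsPerProcessorPerCycle;
}

// Peak shader operations per second at the stated clock speed.
constexpr std::uint64_t OpsPerSecond() { return OpsPerCycle() * kClockHz; }

static_assert(OpsPerCycle() == 240, "matches the 240 ops/cycle figure");
```

Under that assumption the peak rate works out to 120 billion shader operations per second.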
Unified Shader Architecture
Unlike previous GPUs with separate vertex and pixel shader units, the Xenos used unified shader processors that could execute any type of shader. This improved resource utilization and was a precursor to modern GPU architectures.
EDRAM
The 10MB of embedded DRAM sat on the same package as the GPU die:
- Stored render targets and depth buffers
- Performed hardware anti-aliasing (2x, 4x MSAA)
- Hardware resolve operations (copy from EDRAM to main RAM)
- Tiling for framebuffers larger than 10MB
Command Processing
The Xenos processes commands from a ringbuffer in system memory, similar to modern GPUs.
Command Ringbuffer
The DirectX driver writes commands to a ringbuffer:
- CPU writes PM4 commands and data to ringbuffer in system memory
- CPU updates write pointer register on GPU
- GPU’s command processor fetches commands from ringbuffer
- GPU executes commands and updates read pointer
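The producer/consumer handshake in the steps above can be sketched as a small single-threaded model. `CommandRing` and its methods are hypothetical names for illustration and do not mirror Xenia's classes; a real implementation would also need memory barriers or atomics between the CPU and GPU sides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a command ringbuffer. Pointers are word indices
// that wrap around at the buffer capacity.
class CommandRing {
 public:
  explicit CommandRing(std::size_t capacity_words) : words_(capacity_words) {
    assert(capacity_words >= 2);  // one slot is always kept empty
  }

  // CPU side: copy a packet in and advance the write pointer.
  // Returns false when the ring lacks space for the whole packet.
  bool Write(const std::vector<std::uint32_t>& packet) {
    if (packet.size() > FreeWords()) return false;
    for (std::uint32_t word : packet) {
      words_[write_ptr_] = word;
      write_ptr_ = (write_ptr_ + 1) % words_.size();
    }
    return true;
  }

  // GPU side: fetch one word and advance the read pointer.
  // Returns false when the ring is empty.
  bool Read(std::uint32_t* out) {
    if (read_ptr_ == write_ptr_) return false;
    *out = words_[read_ptr_];
    read_ptr_ = (read_ptr_ + 1) % words_.size();
    return true;
  }

  std::size_t PendingWords() const {
    return (write_ptr_ + words_.size() - read_ptr_) % words_.size();
  }

  // One slot stays unused so read_ptr_ == write_ptr_ always means "empty".
  std::size_t FreeWords() const { return words_.size() - 1 - PendingWords(); }

 private:
  std::vector<std::uint32_t> words_;
  std::size_t read_ptr_ = 0;
  std::size_t write_ptr_ = 0;
};
```

Keeping one slot empty is a common way to tell a full ring from an empty one without a separate counter; whether the hardware uses the same convention is not stated here.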
Command Processor
Implementation: src/xenia/gpu/command_processor.cc
The command processor:
- Runs on a dedicated thread, mimicking hardware behavior
- Fetches and decodes PM4 packets from the ringbuffer
- Executes commands (state changes, draws, memory operations)
- Manages GPU registers via RegisterFile
- Synchronizes with CPU via events and interrupts
See src/xenia/gpu/command_processor.h.
PM4 Commands
PM4 (Packet Manager version 4) is the command format used by AMD GPUs. Commands are documented in src/xenia/gpu/xenos.h:521.
Common Commands:
- PM4_DRAW_INDX - Draw primitives with index buffer
- PM4_SET_CONSTANT - Set shader constants
- PM4_LOAD_SHADER - Load vertex/pixel shader
- PM4_SET_CONTEXT_REG - Set context register
- PM4_WAIT_REG_MEM - Wait for memory/register condition
- PM4_EVENT_WRITE - Signal completion event
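Each PM4 packet starts with a 32-bit header that tells the command processor how to interpret the words that follow. The sketch below decodes such a header under an assumed bit layout (type in bits 30-31, count minus one in bits 16-29, opcode in bits 8-14 for type 3, base register index in bits 0-14 for type 0). The layout and the payload sizes for types 1 and 2 are assumptions to be checked against xenos.h, and `PacketHeader`, `DecodeHeader`, and `MakeType3Header` are hypothetical names.

```cpp
#include <cstdint>

// Decoded view of one PM4 packet header (field names are illustrative).
struct PacketHeader {
  std::uint32_t type = 0;           // 0..3, from the top two bits
  std::uint32_t opcode = 0;         // meaningful for type 3 only
  std::uint32_t base_register = 0;  // meaningful for type 0 only
  std::uint32_t payload_words = 0;  // 32-bit words following the header
};

// Decodes a header using the assumed bit layout described in the text.
inline PacketHeader DecodeHeader(std::uint32_t header) {
  PacketHeader out;
  out.type = header >> 30;
  switch (out.type) {
    case 0:  // burst register write starting at base_register
      out.base_register = header & 0x7FFF;
      out.payload_words = ((header >> 16) & 0x3FFF) + 1;
      break;
    case 1:  // assumed: two register writes, two payload words
      out.payload_words = 2;
      break;
    case 2:  // assumed: padding packet with no payload
      break;
    case 3:  // opcode-based command such as a draw or a wait
      out.opcode = (header >> 8) & 0x7F;
      out.payload_words = ((header >> 16) & 0x3FFF) + 1;
      break;
  }
  return out;
}

// Builds a type 3 header; handy for tests. payload_words must be >= 1.
constexpr std::uint32_t MakeType3Header(std::uint32_t opcode,
                                        std::uint32_t payload_words) {
  return (3u << 30) | (((payload_words - 1) & 0x3FFF) << 16) |
         ((opcode & 0x7F) << 8);
}
```

A decoder loop would read one header, dispatch on `type` and `opcode`, and then skip `payload_words` words to reach the next packet.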
Register File
GPU state is managed through memory-mapped registers (src/xenia/gpu/register_table.inc):
- Context registers - Per-draw state (blend, depth, rasterizer)
- Shader registers - Shader bindings and constants
- Control registers - Command processor state
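Conceptually, a register file like this is a flat array of 32-bit values indexed by register number. The sketch below is a hypothetical model, not Xenia's `RegisterFile`: the class name, the array size, and the method names are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register file: every GPU register is one 32-bit slot.
class RegisterFileSketch {
 public:
  // Illustrative size only; the real register count may differ.
  static constexpr std::size_t kRegisterCount = 0x8000;

  std::uint32_t Read(std::uint32_t index) const {
    assert(index < kRegisterCount);
    return values_[index];
  }

  void Write(std::uint32_t index, std::uint32_t value) {
    assert(index < kRegisterCount);
    values_[index] = value;
  }

  // Burst write to consecutive registers, which is how a packet carrying a
  // base index and several payload words would be applied.
  void WriteRange(std::uint32_t base, const std::uint32_t* data,
                  std::size_t count) {
    assert(base + count <= kRegisterCount);
    for (std::size_t i = 0; i < count; ++i) {
      values_[base + i] = data[i];
    }
  }

 private:
  std::array<std::uint32_t, kRegisterCount> values_{};  // zero-initialized
};
```

Because state lives in one array, a draw call can snapshot the relevant register ranges (blend, depth, rasterizer) at the moment it is issued.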
Shader Translation
Xenos shaders use a custom microcode format that must be translated to modern shader languages.
Shader Microcode
Xenos microcode is a binary format with:
- Vertex shaders - Transform vertices and output attributes
- Pixel shaders - Compute per-pixel colors and depth
- Instruction set - Similar to R5xx/R6xx AMD GPUs
- Control flow - Loops, branches, subroutine calls
Translation Pipeline
- Disassembly - Decode binary microcode to readable assembly
- Analysis - Identify inputs, outputs, control flow, texture fetches
- Translation - Convert to SPIR-V (Vulkan) or DXBC (D3D12)
- Compilation - Driver compiles to native GPU code
Translation target by backend:
- Vulkan: Translates to SPIR-V
- D3D12: Translates to DXBC (DirectX Bytecode)
Shader Caching
Shaders are cached to avoid retranslating:
- Microcode hash is used as cache key
- Translated shaders are stored on disk
- Cache is loaded on startup
- Reduces shader compilation stutter
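A hash-keyed cache of this kind can be sketched in a few lines. The example below uses FNV-1a purely for brevity (the hash function Xenia actually uses is not stated here) and keeps entries in memory only, leaving out the on-disk storage and startup load. `HashMicrocode` and `ShaderCacheSketch` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a over the microcode bytes. A stand-in hash for this sketch.
inline std::uint64_t HashMicrocode(const std::vector<std::uint32_t>& words) {
  std::uint64_t hash = 0xcbf29ce484222325ull;  // FNV offset basis
  for (std::uint32_t word : words) {
    for (int shift = 0; shift < 32; shift += 8) {
      hash ^= (word >> shift) & 0xFF;
      hash *= 0x100000001b3ull;  // FNV prime
    }
  }
  return hash;
}

// In-memory cache mapping microcode hash to translated shader text.
class ShaderCacheSketch {
 public:
  // Returns the cached translation, calling `translate` only on a miss.
  template <typename TranslateFn>
  const std::string& GetOrTranslate(const std::vector<std::uint32_t>& microcode,
                                    TranslateFn translate) {
    const std::uint64_t key = HashMicrocode(microcode);
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      ++translations_;
      it = entries_.emplace(key, translate(microcode)).first;
    }
    return it->second;
  }

  // Number of cache misses so far, i.e. how often translation actually ran.
  std::size_t translations() const { return translations_; }

 private:
  std::unordered_map<std::uint64_t, std::string> entries_;
  std::size_t translations_ = 0;
};
```

Keying on the hash alone means two different shaders that collide would share an entry; a production cache would typically guard against that, for example by also comparing the microcode.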
EDRAM Emulation
The 10MB EDRAM presented a unique challenge for emulation.
EDRAM Layout
EDRAM is organized as:
- 2048 tiles of 5120 bytes each (10MB total)
- Each tile is 80x16 pixels at 32-bit color
- Games configure EDRAM via render target registers
- Multiple render targets can be bound simultaneously
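The layout numbers above are consistent with each other, which the compile-time checks below confirm: 80 x 16 pixels at 4 bytes each is 5120 bytes, and 2048 such tiles make exactly 10MB. The tile-addressing helper additionally assumes that a surface's tiles are stored row-major from a base tile with a fixed pitch in tiles; that addressing scheme and all names are assumptions for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kTileCount = 2048;
constexpr std::uint32_t kTileWidth = 80;     // pixels
constexpr std::uint32_t kTileHeight = 16;    // pixels
constexpr std::uint32_t kBytesPerPixel = 4;  // 32-bit color
constexpr std::uint32_t kTileBytes = kTileWidth * kTileHeight * kBytesPerPixel;

static_assert(kTileBytes == 5120, "one tile is 5120 bytes");
static_assert(kTileCount * kTileBytes == 10u * 1024 * 1024, "10MB in total");

// Number of tiles covering a 32-bit surface, rounding partial tiles up.
constexpr std::uint32_t TilesForSurface(std::uint32_t width,
                                        std::uint32_t height) {
  return ((width + kTileWidth - 1) / kTileWidth) *
         ((height + kTileHeight - 1) / kTileHeight);
}

// Assumed row-major addressing: tile holding pixel (x, y) of a surface that
// starts at base_tile and is pitch_tiles tiles wide.
constexpr std::uint32_t TileIndexForPixel(std::uint32_t base_tile,
                                          std::uint32_t pitch_tiles,
                                          std::uint32_t x, std::uint32_t y) {
  return base_tile + (y / kTileHeight) * pitch_tiles + (x / kTileWidth);
}
```

For example, a 1280x720 surface at 32 bits per pixel spans 16 x 45 = 720 tiles, a little over a third of EDRAM.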
Emulation Strategies
Approach 1: Resolve-based
- Render to EDRAM-sized buffer (10MB)
- Resolve (copy) to system RAM when game requests it
- System RAM textures used for display and sampling
Approach 2: Virtual EDRAM in GPU memory
- Use modern GPU’s memory as virtual EDRAM
- Render targets live in GPU memory
- Resolve operations copy between GPU textures
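Both strategies rely on a resolve step that copies a region of a render target into a texture. A minimal CPU-side sketch of such a copy follows, assuming linear 32-bit surfaces; a real resolve would also handle format conversion, MSAA sample averaging, and tiled memory layouts. `Surface` and `Resolve` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linear 32-bit surface standing in for a render target or a texture.
struct Surface {
  std::uint32_t width;
  std::uint32_t height;
  std::vector<std::uint32_t> pixels;

  Surface(std::uint32_t w, std::uint32_t h)
      : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}

  // Offset of pixel (x, y) within the pixels vector.
  std::size_t Index(std::uint32_t x, std::uint32_t y) const {
    assert(x < width && y < height);
    return static_cast<std::size_t>(y) * width + x;
  }
};

// Copies a copy_width x copy_height rectangle from the render target at
// (src_x, src_y) into the destination surface at (dst_x, dst_y).
inline void Resolve(const Surface& render_target, std::uint32_t src_x,
                    std::uint32_t src_y, std::uint32_t copy_width,
                    std::uint32_t copy_height, Surface* dest,
                    std::uint32_t dst_x, std::uint32_t dst_y) {
  for (std::uint32_t y = 0; y < copy_height; ++y) {
    for (std::uint32_t x = 0; x < copy_width; ++x) {
      dest->pixels[dest->Index(dst_x + x, dst_y + y)] =
          render_target.pixels[render_target.Index(src_x + x, src_y + y)];
    }
  }
}
```

The destination offset matters for tiled rendering, where each resolved tile lands at a different position in the final framebuffer.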
Tiling
Games whose framebuffers need more than 10MB use tiling:
- Divide framebuffer into tiles that fit in EDRAM
- Render each tile separately
- Resolve each tile to system RAM
- Combine tiles to form complete framebuffer
Emulating tiling involves:
- Detecting when games use tiling
- Rendering each tile to separate render targets
- Combining tiles for display
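To see when tiling becomes necessary, it helps to compute how many rows of a framebuffer fit in EDRAM at once. The sketch below splits a framebuffer into full-width horizontal bands, assuming each sample stores 4 bytes of color plus 4 bytes of depth; the band shape, the per-sample size, and the names `TileRect` and `PlanTiles` are assumptions for illustration, and games are free to split differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kEdramBytes = 10ull * 1024 * 1024;  // 10MB

// A horizontal band covering rows [y_start, y_end) of the framebuffer.
struct TileRect {
  std::uint32_t y_start;
  std::uint32_t y_end;
};

// Splits a framebuffer into full-width bands that each fit in EDRAM.
// bytes_per_sample defaults to 8 (assumed 4 color + 4 depth).
inline std::vector<TileRect> PlanTiles(std::uint32_t width,
                                       std::uint32_t height,
                                       std::uint32_t msaa_samples,
                                       std::uint32_t bytes_per_sample = 8) {
  assert(width > 0 && msaa_samples > 0 && bytes_per_sample > 0);
  const std::uint64_t row_bytes =
      static_cast<std::uint64_t>(width) * msaa_samples * bytes_per_sample;
  const std::uint32_t rows_per_tile =
      static_cast<std::uint32_t>(kEdramBytes / row_bytes);
  assert(rows_per_tile > 0);  // a single row must fit in EDRAM

  std::vector<TileRect> tiles;
  for (std::uint32_t y = 0; y < height; y += rows_per_tile) {
    tiles.push_back({y, std::min(height, y + rows_per_tile)});
  }
  return tiles;
}
```

Under these assumptions a 1280x720 target fits in one pass without anti-aliasing, needs two bands at 2x MSAA, and three at 4x MSAA.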
Graphics Backends
Xenia supports multiple graphics backends:
Vulkan Backend
The Vulkan backend (src/xenia/gpu/vulkan/) provides:
- SPIR-V shader translation
- Explicit resource management
- Better performance on Linux and some Windows configurations
- More manual control over GPU operations
From src/xenia/gpu/vulkan/vulkan_gpu_flags.cc:
- --vulkan_dump_disasm=true - Dump shader disassembly (NVIDIA only)
D3D12 Backend
The D3D12 backend (src/xenia/gpu/d3d12/) provides:
- DXBC shader translation
- Better compatibility on Windows
- PIX integration for debugging
- DirectX 12 feature level 11_0+
GPU Options
From src/xenia/gpu/gpu_flags.cc:
- --vsync=false - Render as fast as possible instead of 60Hz
- --dump_shaders=path/ - Dump all translated shaders to disk
- --trace_gpu_prefix=path/ - Capture GPU traces for debugging
- --trace_gpu_stream - Record all frames to trace file
Tools
Shader Compiler
xe-gpu-shader-compiler is a standalone tool for shader translation:
- Testing shader translation
- Debugging shader issues
- Analyzing microcode format
Input files use the extension .bin.vs for vertex shaders, .bin.ps for pixel shaders.
Shader Playground
GUI tool for interactive shader work (tools/shader-playground/):
Features:
- Assemble shader microcode
- Disassemble binary shaders
- Translate to target language (SPIR-V, etc.)
- Validate translation correctness
- Compare against reference output
See tools/shader-playground/README.md for setup instructions.
GPU Trace Viewer
xe-gpu-trace-viewer allows frame capture and inspection:
Workflow:
- Run game with --trace_gpu_prefix=path/frame_
- Press F4 to capture current frame
- Open trace file in xe-gpu-trace-viewer
- Inspect draw calls, state, shaders, textures
- Modify code and rebuild to test fixes
Use --trace_gpu_stream to capture all frames to a single file. Warning: files get very large.
Performance Counters
The GPU exposes performance counters that D3D can query. These are 64-bit values with HIGH/LOW/SELECT registers. Counters documented in src/xenia/gpu/xenos.h and original docs include:
- CP_PERFCOUNTER0 - Command processor metrics
- RBBM_PERFCOUNTER0/1 - Resource block metrics
- SQ_PERFCOUNTER0-3 - Sequencer (shader) metrics
- VGT_PERFCOUNTER0-3 - Vertex Grouper/Tessellator metrics
- And many more (see docs/gpu.md for full list)
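Since each counter is a 64-bit value split across HIGH and LOW registers, reading one means combining two 32-bit halves. A minimal sketch, with `PerfCounterRegs` and `ReadCounter` as hypothetical names and SELECT treated as an opaque value that picks which event is counted:

```cpp
#include <cstdint>

// Hypothetical register triple backing one performance counter.
struct PerfCounterRegs {
  std::uint32_t select;  // chooses the event being counted
  std::uint32_t low;     // bits 0-31 of the counter
  std::uint32_t high;    // bits 32-63 of the counter
};

// Combines the HIGH and LOW halves into the full 64-bit counter value.
constexpr std::uint64_t ReadCounter(const PerfCounterRegs& regs) {
  return (static_cast<std::uint64_t>(regs.high) << 32) | regs.low;
}
```

The cast to 64 bits before shifting matters: shifting a 32-bit value left by 32 is undefined behavior in C++.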
Challenges and Limitations
EDRAM Tiling
Detecting and handling tiling is complex:
- Games don’t explicitly declare tiling usage
- Must infer from render target configuration
- Incorrect tiling detection causes rendering bugs
Shader Accuracy
Xenos microcode doesn’t map 1:1 to modern shaders:
- Some instructions have no direct equivalent
- Precision differences between GPUs
- Control flow handling differs
- Texture sampling behavior varies
Synchronization
Games expect certain GPU/CPU synchronization patterns:
- Wait for GPU events before reading results
- Memory coherency between CPU and GPU
- Command buffer execution order
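A wait on a memory or register condition generally reduces to "mask the value, compare it with a reference, repeat until true". The sketch below models that check and a bounded polling loop. The `WaitCompare` enumerators are illustrative and are not the hardware's encoding, and `WaitSatisfied` and `PollUntil` are hypothetical helpers rather than Xenia functions.

```cpp
#include <cstdint>

// Illustrative comparison kinds; numeric values are NOT the hardware encoding.
enum class WaitCompare {
  kLess,
  kLessEqual,
  kEqual,
  kNotEqual,
  kGreaterEqual,
  kGreater
};

// True when (value & mask) compares as requested against reference.
constexpr bool WaitSatisfied(std::uint32_t value, std::uint32_t mask,
                             std::uint32_t reference, WaitCompare compare) {
  const std::uint32_t masked = value & mask;
  switch (compare) {
    case WaitCompare::kLess:         return masked < reference;
    case WaitCompare::kLessEqual:    return masked <= reference;
    case WaitCompare::kEqual:        return masked == reference;
    case WaitCompare::kNotEqual:     return masked != reference;
    case WaitCompare::kGreaterEqual: return masked >= reference;
    case WaitCompare::kGreater:      return masked > reference;
  }
  return false;
}

// Polls read_value() up to max_polls times; returns true once the condition
// holds. A real implementation would yield or sleep between polls.
template <typename ReadFn>
bool PollUntil(ReadFn read_value, std::uint32_t mask, std::uint32_t reference,
               WaitCompare compare, int max_polls) {
  for (int i = 0; i < max_polls; ++i) {
    if (WaitSatisfied(read_value(), mask, reference, compare)) return true;
  }
  return false;
}
```

Bounding the poll count keeps a misbehaving game or a translation bug from hanging the emulator's GPU thread indefinitely.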
References
Xenos Architecture
Shader Formats
- LLVM R600 Tables - Opcode reference (formats differ but names/semantics match)
- xemit - Alternative shader emitter
