Graphics Subsystem

Overview

Ryujinx’s graphics subsystem emulates the NVIDIA Tegra X1 GPU, translating Switch graphics commands to host APIs (Vulkan, OpenGL). The architecture is layered: GPU emulation sits on top of the GAL (Graphics Abstraction Layer), which provides a unified API for multiple rendering backends.

GPU Context

The central GPU emulation context from src/Ryujinx.Graphics.Gpu/GpuContext.cs:

public sealed class GpuContext : IDisposable
{
    /// <summary>
    /// Host renderer (Vulkan/OpenGL backend)
    /// </summary>
    public IRenderer Renderer { get; }
    
    /// <summary>
    /// GPU General Purpose FIFO queue
    /// </summary>
    public GPFifoDevice GPFifo { get; }
    
    /// <summary>
    /// GPU synchronization manager
    /// </summary>
    public SynchronizationManager Synchronization { get; }
    
    /// <summary>
    /// Presentation window
    /// </summary>
    public Window Window { get; }
    
    /// <summary>
    /// Sequence number for resource versioning
    /// </summary>
    internal int SequenceNumber { get; private set; }
    
    /// <summary>
    /// Registry of physical memories by process ID
    /// </summary>
    internal ConcurrentDictionary<ulong, PhysicalMemory> PhysicalMemoryRegistry { get; }
    
    /// <summary>
    /// Actions to execute on GPU sync points
    /// </summary>
    internal List<ISyncActionHandler> SyncActions { get; }
    internal List<ISyncActionHandler> SyncpointActions { get; }
    
    public GpuContext(IRenderer renderer, DirtyHacks hacks)
    {
        Renderer = renderer;
        GPFifo = new GPFifoDevice(this);
        Synchronization = new SynchronizationManager();
        Window = new Window(this);
        
        PhysicalMemoryRegistry = new ConcurrentDictionary<ulong, PhysicalMemory>();
        SyncActions = [];
        SyncpointActions = [];
    }
}

Key responsibilities:

Command buffer submission and processing
Memory management (texture, buffer, shader storage)
Synchronization with CPU
Multi-process GPU sharing

Command Processing

GPFifo (General Purpose FIFO)

The command submission interface:

public class GPFifoDevice
{
    private readonly GpuContext _context;
    private readonly GPFifoProcessor _processor;
    
    // Command buffer submission
    public void Submit(ReadOnlySpan<ulong> entries)
    {
        foreach (ulong entry in entries)
        {
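            // Bits 2-39: 4-byte-aligned GPU virtual address
            ulong gpuVa = entry & 0xFFFFFFFFFC;
            // Bits 42-62: command buffer length in 32-bit words
            int size = (int)((entry >> 42) & 0x1FFFFF);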
            
            ProcessCommands(gpuVa, size);
        }
    }
    
    private void ProcessCommands(ulong gpuVa, int size)
    {
        // Read commands from GPU virtual address
        ReadOnlySpan<int> commands = _context.MemoryManager.Read<int>(gpuVa, size);
        
        // Process command buffer
        _processor.Process(commands);
    }
}
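The entry decoding above can be sketched in isolation. This is a minimal illustration, not Ryujinx code: `GpFifoEntry` is a hypothetical helper, and the field layout (address in bits 2-39, length in 32-bit words in bits 42-62) is an assumption taken from NVIDIA's published channel class headers.

```csharp
using System;

// Hypothetical helper for illustration; not a Ryujinx type.
public readonly struct GpFifoEntry
{
    private readonly ulong _raw;

    public GpFifoEntry(ulong raw) => _raw = raw;

    public ulong Raw => _raw;

    // Assumed layout: bits 2-39 hold the 4-byte-aligned GPU virtual address.
    public ulong Address => _raw & 0xFFFFFFFFFCUL;

    // Assumed layout: bits 42-62 hold the length in 32-bit words.
    public int WordCount => (int)((_raw >> 42) & 0x1FFFFF);

    // Length in bytes, which is what a byte-oriented memory read needs.
    public int ByteCount => WordCount * sizeof(uint);

    // Packs an address and a word count into the assumed layout.
    public static GpFifoEntry Create(ulong address, int wordCount)
    {
        if ((address & 3) != 0 || address > 0xFFFFFFFFFFUL)
        {
            throw new ArgumentOutOfRangeException(nameof(address));
        }

        if (wordCount < 0 || wordCount > 0x1FFFFF)
        {
            throw new ArgumentOutOfRangeException(nameof(wordCount));
        }

        return new GpFifoEntry(address | ((ulong)wordCount << 42));
    }
}
```

Keeping the word count and the byte count as separate properties makes it harder to pass a length in the wrong unit to a memory read.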

Command Buffer Format

Switch uses pushbuffer-style command submission:

// Command format: [Method | Argument]
// Method: bits 0-11 (4096 methods)
// SubChannel: bits 13-15 (8 subchannels)
// ArgumentCount: bits 16-28
// Type: bits 29-31

public void ProcessMethod(int method, int argument, int subChannel)
{
    switch (subChannel)
    {
        case 0: // 2D Engine
            _engine2d.ProcessMethod(method, argument);
            break;
        case 1: // 3D Engine  
            _engine3d.ProcessMethod(method, argument);
            break;
        case 2: // Compute
            _engineCompute.ProcessMethod(method, argument);
            break;
        case 3: // Inline-to-Memory
            _inlineToMemory.ProcessMethod(method, argument);
            break;
        case 4: // DMA Copy
            _dma.ProcessMethod(method, argument);
            break;
    }
}
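To make the header fields concrete, here is a small sketch that unpacks a 32-bit command header. `CommandHeader` is a hypothetical type introduced for illustration, and the bit positions (method in bits 0-11, subchannel in bits 13-15, argument count in bits 16-28, type in bits 29-31) are an assumption based on NVIDIA's published channel class headers.

```csharp
// Hypothetical type for illustration; not a Ryujinx type.
public readonly struct CommandHeader
{
    public readonly uint Raw;

    public CommandHeader(uint raw) => Raw = raw;

    // Assumed layout: bits 0-11, method address in 32-bit register units.
    public int Method => (int)(Raw & 0xFFF);

    // Assumed layout: bits 13-15, selects one of 8 subchannels.
    public int SubChannel => (int)((Raw >> 13) & 0x7);

    // Assumed layout: bits 16-28, number of argument words.
    public int Count => (int)((Raw >> 16) & 0x1FFF);

    // Assumed layout: bits 29-31, submission type (opcode).
    public int Type => (int)((Raw >> 29) & 0x7);

    // Builds a header from its fields, masking each to its bit width.
    public static CommandHeader Pack(int method, int subChannel, int count, int type) =>
        new CommandHeader((uint)(method & 0xFFF)
            | ((uint)(subChannel & 0x7) << 13)
            | ((uint)(count & 0x1FFF) << 16)
            | ((uint)(type & 0x7) << 29));
}
```

A processor loop would read one header, then consume `Count` argument words and forward each to the engine bound to `SubChannel`.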

2D Engine

Handles 2D blits and surface operations:

- Surface copies
- Format conversions
- Rectangular fills
- Block linear <-> linear conversions

3D Engine

Main rendering pipeline:

- Vertex/fragment/geometry shaders
- Rasterization state
- Framebuffer configuration
- Texture sampling
- Draw calls

Compute Engine

Compute shader dispatch:

- Compute shader execution
- Shared memory configuration
- Grid/block dimensions

DMA Engine

Asynchronous memory copies:

- GPU <-> GPU copies
- Linear <-> block linear
- Pitch linear transfers

Graphics Engine (3D)

The main rendering engine processes graphics state and draw commands:

public class ThreedClass : IDeviceState
{
    private readonly GpuContext _context;
    private readonly GpuChannel _channel;
    private readonly DeviceStateWithShadow<ThreedClassState> _state;
    
    // State management
    public void SetRenderTargets(int count, RenderTarget[] targets) { /* ... */ }
    public void SetViewports(ReadOnlySpan<Viewport> viewports) { /* ... */ }
    public void SetScissors(ReadOnlySpan<Rectangle<int>> scissors) { /* ... */ }
    public void SetVertexBuffers(ReadOnlySpan<VertexBufferState> buffers) { /* ... */ }
    
    // Shader binding
    public void SetShaderStage(ShaderStage stage, ulong gpuVa, bool enable) { /* ... */ }
    
    // Draw commands
    public void Draw(int vertexCount, int instanceCount, int firstVertex, 
                     int firstInstance) { /* ... */ }
    public void DrawIndexed(int indexCount, int instanceCount, int firstIndex,
                           int baseVertex, int firstInstance) { /* ... */ }
}

Render State Management

Graphics state is cached and synchronized:

public class DeviceStateWithShadow<T> where T : unmanaged
{
    private T _hostState;      // Current host state
    private T _shadowState;    // Last written state
    
    public void Update()
    {
        // Compare shadow vs host state
        // Only update changed fields
        if (!MemoryEquals(_hostState, _shadowState))
        {
            UpdateDirtyFields();
            _shadowState = _hostState;
        }
    }
}
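The `MemoryEquals` call above is not defined in the excerpt. One possible way to write such a comparison for an unmanaged struct is a bytewise compare, sketched below. `StateCompare` and `RasterState` are hypothetical names used only for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example state; all fields are 4 bytes, so there is no padding.
public struct RasterState
{
    public int CullMode;
    public int FrontFace;
    public float LineWidth;
}

// Hypothetical helper for illustration; not a Ryujinx type.
public static class StateCompare
{
    // Compares two unmanaged values byte by byte.
    // Caveat: padding bytes and floating-point values (NaN, -0.0) are
    // compared by bit pattern, not by numeric equality.
    public static bool MemoryEquals<T>(ref T a, ref T b) where T : unmanaged
    {
        ReadOnlySpan<byte> left = MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref a, 1));
        ReadOnlySpan<byte> right = MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref b, 1));

        return left.SequenceEqual(right);
    }
}
```

A bytewise compare is cheap for small state blocks, but it only says that something changed. Finding which fields changed still needs per-field or per-group dirty tracking.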

State groups:

Rasterizer: Primitive topology, polygon mode, cull mode, front face
Depth/Stencil: Depth test, stencil test, depth bounds
Blend: Blend equations, blend factors, color mask
Viewport: Viewport transforms, depth range
Scissor: Scissor rectangles

Shader Translation

Shaders are translated from Switch binary format to host GLSL/SPIR-V.

Shader Cache

From src/Ryujinx.Graphics.Gpu/Shader/ShaderCache.cs:

class ShaderCache : IDisposable
{
    private readonly GpuContext _context;
    private readonly ShaderDumper _dumper;
    
    // Cached programs
    private readonly Dictionary<ulong, CachedShaderProgram> _cpPrograms;  // Compute
    private readonly Dictionary<ShaderAddresses, CachedShaderProgram> _gpPrograms; // Graphics
    
    // Disk cache
    private readonly ComputeShaderCacheHashTable _computeShaderCache;
    private readonly ShaderCacheHashTable _graphicsShaderCache;
    private readonly DiskCacheHostStorage _diskCacheHostStorage;
    
    public CachedShaderProgram GetGraphicsShader(ShaderAddresses addresses)
    {
        if (_gpPrograms.TryGetValue(addresses, out var program))
        {
            return program;
        }
        
        // Not cached - translate from binary
        program = TranslateGraphicsShader(addresses);
        _gpPrograms[addresses] = program;
        
        return program;
    }
}

Translation Pipeline

Binary Decode

Decode Maxwell/Pascal GPU binary instructions:

// Read shader code from GPU memory
ReadOnlySpan<byte> code = memoryManager.GetSpan(gpuVa, maxSize);

// Decode instruction stream
ShaderDecoder decoder = new ShaderDecoder(code);
List<Block> blocks = decoder.Decode();

Control Flow Analysis

Build control flow graph:

// Identify basic blocks
// Resolve branch targets
// Build dominator tree
ControlFlowGraph cfg = ControlFlowAnalysis.Build(blocks);

Translation to IR

Convert to intermediate representation:

// Lift GPU instructions to IR
// Track register/attribute usage
// Handle texture operations
// Process special functions

Optimization

Optimize shader IR:

// Dead code elimination
// Constant propagation
// Algebraic simplification
// Resource optimization
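As an illustration of two of the passes listed above, constant propagation and algebraic simplification, the sketch below folds a toy expression tree. The `Expr` records and the `Folder` class are hypothetical and far simpler than a real shader IR; they only show the shape of such a pass.

```csharp
// Toy IR for illustration only; not the Ryujinx shader IR.
public abstract record Expr;
public sealed record Const(int Value) : Expr;
public sealed record Var(string Name) : Expr;
public sealed record Add(Expr Left, Expr Right) : Expr;
public sealed record Mul(Expr Left, Expr Right) : Expr;

public static class Folder
{
    // Folds bottom-up: operands are simplified before their parent.
    public static Expr Fold(Expr e) => e switch
    {
        Add(var l, var r) => FoldAdd(Fold(l), Fold(r)),
        Mul(var l, var r) => FoldMul(Fold(l), Fold(r)),
        _ => e,
    };

    private static Expr FoldAdd(Expr l, Expr r) => (l, r) switch
    {
        // Both operands known: evaluate at translation time.
        (Const a, Const b) => new Const(unchecked(a.Value + b.Value)),
        // x + 0 and 0 + x simplify to x.
        (Const { Value: 0 }, _) => r,
        (_, Const { Value: 0 }) => l,
        _ => new Add(l, r),
    };

    private static Expr FoldMul(Expr l, Expr r) => (l, r) switch
    {
        (Const a, Const b) => new Const(unchecked(a.Value * b.Value)),
        // x * 1 and 1 * x simplify to x.
        (Const { Value: 1 }, _) => r,
        (_, Const { Value: 1 }) => l,
        _ => new Mul(l, r),
    };
}
```

For example, folding `(2 * 3) + x` yields `6 + x`, and folding `x * 1` yields `x`. Dead code elimination would then remove any instruction whose result is no longer referenced.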

Code Generation

Generate target shader language:

if (backend == Backend.Vulkan)
{
    // Generate SPIR-V
    byte[] spirv = CodeGen.Spirv.Generate(program);
    return renderer.CompileShader(spirv);
}
else
{
    // Generate GLSL
    string glsl = CodeGen.Glsl.Generate(program);
    return renderer.CompileShader(glsl);
}

Guest-to-Host Feature Mapping

Texture Operations

// Switch texture instructions -> Host equivalents
TEX   -> texture()        // Basic sampling
TEXS  -> texture()        // Basic sampling (scalar-register variant of TEX)
TLD   -> texelFetch()     // Load without filtering
TLDS  -> texelFetch()     // Load without filtering (scalar-register variant of TLD)
TLD4  -> textureGather()  // Gather one component from four texels
TXQ   -> textureSize()    // Query texture size

Compute Features

// Compute shader features
Shared Memory     -> shared variables
Barriers          -> barrier()
Atomic Operations -> atomicAdd/atomicMin/etc
Image Load/Store  -> imageLoad/imageStore

Vertex Attributes

// Vertex input mapping
Attribute 0-15    -> in vec4 attr0..attr15
Built-ins:
  VertexId        -> gl_VertexID
  InstanceId      -> gl_InstanceID
  Position        -> gl_Position (output)

Shader Specialization

Shaders are specialized based on render state:

public struct ShaderSpecializationState
{
    // Transform feedback
    public bool TransformFeedbackEnabled;
    public uint TransformFeedbackBufferMask;
    
    // Graphics state
    public bool AlphaTestEnabled;
    public CompareOp AlphaTestFunc;
    public float AlphaTestRef;
    
    // Texture state
    public bool[] TextureSrgb;        // One entry per texture slot (32)
    public SamplerType[] TextureType; // One entry per texture slot (32)
    
    // Compute hash for cache lookup
    public Hash128 GetHash() { /* ... */ }
}

Specialization allows aggressive optimization by baking constants into shader code.
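One way to picture specialization is as a cache keyed by both the shader code and the state it was compiled against, so that a state change produces a new variant instead of reusing a stale one. The sketch below assumes that model. `SpecKey` and `SpecializedCache` are hypothetical names, and the key carries only a few of the fields from the struct above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key: code address plus the state baked into the shader.
// A record struct gets value-based Equals/GetHashCode for free.
public readonly record struct SpecKey(
    ulong CodeAddress,
    bool AlphaTestEnabled,
    int AlphaTestFunc,
    bool TransformFeedbackEnabled);

// Hypothetical cache for illustration; not a Ryujinx type.
public sealed class SpecializedCache<TProgram>
{
    private readonly Dictionary<SpecKey, TProgram> _programs = new();

    // Number of times the compile callback actually ran.
    public int CompileCount { get; private set; }

    public TProgram GetOrCompile(SpecKey key, Func<SpecKey, TProgram> compile)
    {
        if (!_programs.TryGetValue(key, out var program))
        {
            // No variant for this exact state yet: compile and remember it.
            program = compile(key);
            _programs[key] = program;
            CompileCount++;
        }

        return program;
    }
}
```

The trade-off is that every distinct state combination costs one compilation, so only state that actually changes the generated code should be part of the key.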

Texture Management

Texture Pool

Textures are managed in pools:

public class TexturePool
{
    private readonly GpuContext _context;
    private readonly GpuChannel _channel;
    
    private readonly Texture[] _textures;
    private readonly ulong _address;
    private readonly int _maximumId;
    
    public Texture Get(int id)
    {
        if (_textures[id] == null)
        {
            // Load texture descriptor from GPU memory
            TextureDescriptor descriptor = ReadDescriptor(id);
            
            // Create or find existing texture
            _textures[id] = _context.Methods.TextureManager.FindOrCreate(descriptor);
        }
        
        return _textures[id];
    }
}

Texture Descriptor

Switch uses descriptors to define texture properties:

public struct TextureDescriptor
{
    public ulong Address;           // GPU virtual address
    public Format Format;           // Pixel format
    public TextureTarget Target;    // 1D/2D/3D/Cube/Array
    public int Width;
    public int Height;
    public int Depth;
    public int Levels;              // Mipmap levels
    public int Layers;              // Array layers
    public SwizzleComponent[] Swizzle; // Channel swizzle
    public bool IsSrgb;
    public TileMode TileMode;       // Linear/block-linear
}

Texture Formats

Ryujinx supports extensive format conversions:

Color Formats

R8G8B8A8_UNORM
R8G8B8A8_SRGB
R16G16B16A16_FLOAT
B5G6R5_UNORM
BC1_UNORM (DXT1)
BC2_UNORM (DXT3)
BC3_UNORM (DXT5)
BC4_UNORM
BC5_UNORM
BC6H_SFLOAT
BC7_UNORM
ASTC_4x4_UNORM
ASTC_8x8_SRGB

Depth Formats

D16_UNORM
D24_UNORM_S8_UINT
D32_FLOAT
D32_FLOAT_S8_UINT
S8_UINT

Special Formats

R11G11B10_FLOAT
R10G10B10A2_UNORM
E5B9G9R9_FLOAT (Shared exponent)

Block Linear Swizzling

Switch uses block-linear texture layout for cache efficiency:

public static class LayoutConverter
{
    // Convert block-linear to linear
    public static void ConvertBlockLinearToLinear(
        Span<byte> dst,
        ReadOnlySpan<byte> src,
        int width,
        int height,
        int depth,
        int levels,
        int layers,
        int blockWidth,
        int blockHeight,
        int bytesPerPixel,
        int gobBlocksInY,
        int gobBlocksInZ,
        int gobBlocksInTileX)
    {
        // Deswizzle using GOB (Group of Bytes) layout
        // GOB size: 512 bytes (64 bytes wide x 8 rows)
        // GOBs are organized in 2D/3D blocks for cache locality
    }
}
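The address arithmetic behind deswizzling can be sketched for the 2D case. This is an illustration under stated assumptions, not the Ryujinx implementation: it assumes a GOB of 64 bytes by 8 rows (512 bytes) and the commonly documented Tegra intra-GOB byte order, and `BlockLinear` with its methods is a hypothetical helper.

```csharp
// Hypothetical helper for illustration; not a Ryujinx type.
public static class BlockLinear
{
    public const int GobWidthBytes = 64;
    public const int GobHeight = 8;
    public const int GobSize = GobWidthBytes * GobHeight; // 512 bytes

    // Offset of byte column xBytes, row y within a single GOB.
    // Interleaves x and y bits so nearby texels share cache lines.
    public static int OffsetInGob(int xBytes, int y)
    {
        return ((xBytes % 64) / 32) * 256
             + ((y % 8) / 2) * 64
             + ((xBytes % 32) / 16) * 32
             + (y % 2) * 16
             + (xBytes % 16);
    }

    // Offset in a 2D block-linear surface.
    // widthInGobs: row pitch in bytes divided by 64.
    // gobBlocksInY: block height, in GOBs stacked vertically.
    public static int GetOffset(int xBytes, int y, int widthInGobs, int gobBlocksInY)
    {
        int blockHeightRows = GobHeight * gobBlocksInY;
        int blockSize = GobSize * gobBlocksInY;

        int blockRow = y / blockHeightRows;                 // which row of blocks
        int blockColumn = xBytes / GobWidthBytes;           // which block in that row
        int gobInBlock = (y % blockHeightRows) / GobHeight; // which GOB in the block

        return blockRow * blockSize * widthInGobs
             + blockColumn * blockSize
             + gobInBlock * GobSize
             + OffsetInGob(xBytes, y);
    }
}
```

A linear copy then reads `src[GetOffset(x * bytesPerPixel + b, y, ...)]` for each byte `b` of each pixel. Real converters process whole 16-byte runs at once, since `OffsetInGob` keeps each aligned 16-byte span contiguous.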

Block-linear benefits:

Improved texture cache hit rate
Better memory access patterns
Efficient mipmap storage

Buffer Management

Buffer Cache

public class BufferManager
{
    private readonly Dictionary<ulong, Buffer> _buffers;
    private readonly RangeList<Buffer> _bufferOverlaps;
    
    public BufferRange GetBuffer(ulong address, ulong size, bool write)
    {
        // Check for existing overlapping buffers
        Buffer buffer = FindOverlap(address, size);
        
        if (buffer == null)
        {
            // Create new buffer
            buffer = CreateBuffer(address, size);
            _buffers[address] = buffer;
        }
        
        // Synchronize if written by GPU
        if (write && buffer.GpuModified)
        {
            buffer.SynchronizeMemory();
        }
        
        return new BufferRange(buffer, address - buffer.Address, size);
    }
}
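The overlap check and the `address - buffer.Address` offset in the code above come down to simple interval arithmetic. The sketch below shows that arithmetic on its own. `AddressRange` is a hypothetical type, not the `RangeList<Buffer>` used in the excerpt.

```csharp
using System;

// Hypothetical half-open range [Address, Address + Size) for illustration.
public readonly record struct AddressRange(ulong Address, ulong Size)
{
    public ulong EndAddress => Address + Size;

    // Two half-open ranges overlap when each starts before the other ends.
    public bool OverlapsWith(AddressRange other) =>
        Address < other.EndAddress && other.Address < EndAddress;

    // True when 'other' lies entirely inside this range.
    public bool Contains(AddressRange other) =>
        other.Address >= Address && other.EndAddress <= EndAddress;

    // Offset of an absolute address relative to the start of this range.
    public ulong OffsetOf(ulong address) => address - Address;

    // Smallest range covering both inputs, as used when merging buffers.
    public static AddressRange Union(AddressRange a, AddressRange b)
    {
        ulong start = Math.Min(a.Address, b.Address);
        ulong end = Math.Max(a.EndAddress, b.EndAddress);

        return new AddressRange(start, end - start);
    }
}
```

When a requested range overlaps an existing buffer without being contained in it, a cache along these lines would create a buffer covering the union and copy the old contents across before returning a sub-range.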

Buffer Types

Vertex Buffers

// Vertex attribute data
- Position, normal, texcoord
- Instanced attributes  
- Stride and offset

Index Buffers

// Triangle indices
- U8/U16/U32 formats
- Primitive restart

Uniform Buffers

// Shader constants
- Per-draw parameters
- Material properties
- Transform matrices

Storage Buffers

// Read-write access
- Compute shader data
- Large structured buffers

Synchronization

GPU-CPU Sync

Handling synchronization between CPU and GPU:

public class SynchronizationManager
{
    // CPU waits for GPU
    public void WaitForFence(ulong fenceValue)
    {
        // Block CPU thread until GPU reaches fence point
        while (GetCompletedValue() < fenceValue)
        {
            Thread.Yield();
        }
    }
    
    // GPU signals fence
    public void SignalFence(ulong fenceValue)
    {
        // Queue fence signal in command stream
        _renderer.SetFence(fenceValue);
    }
    
    // Query completed work
    public ulong GetCompletedValue()
    {
        return _renderer.GetCompletedFenceValue();
    }
}
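The wait loop above spins without a timeout, so a fence that never signals would hang the caller. A variation with a bounded wait might look like the sketch below. `FenceCounter` is a hypothetical stand-in for the renderer-backed fence, and it assumes fence values fit in a signed 64-bit integer.

```csharp
using System;
using System.Threading;

// Hypothetical monotonic fence counter for illustration; not a Ryujinx type.
public sealed class FenceCounter
{
    private long _completed;

    public ulong CompletedValue => (ulong)Interlocked.Read(ref _completed);

    // Raises the completed value; never moves it backwards.
    public void Signal(ulong value)
    {
        long target = (long)value;
        long current = Interlocked.Read(ref _completed);

        while (target > current)
        {
            long seen = Interlocked.CompareExchange(ref _completed, target, current);

            if (seen == current)
            {
                return;
            }

            // Another thread updated the value first; retry against it.
            current = seen;
        }
    }

    // Returns true if the fence was reached, false if the timeout expired.
    public bool Wait(ulong value, TimeSpan timeout) =>
        SpinWait.SpinUntil(() => CompletedValue >= value, timeout);
}
```

`SpinWait.SpinUntil` spins briefly and then starts yielding, which suits short GPU waits. For long waits an event-based primitive avoids burning CPU time.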

Sync Actions

Actions triggered at synchronization points:

public interface ISyncActionHandler
{
    void SyncPreActions(bool syncpoint);
    void SyncPostAction();
}

// Example: Buffer flush on sync
public class BufferSyncAction : ISyncActionHandler
{
    private readonly Buffer _buffer;
    
    public void SyncPreActions(bool syncpoint)
    {
        // Flush buffer to host before sync
        _buffer.Flush();
    }
    
    public void SyncPostAction()
    {
        // Update after GPU completion
        _buffer.InvalidateRange();
    }
}
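How such handlers might be driven at a sync point is sketched below. The ordering (all pre-actions, then the host sync, then all post-actions) is an assumption made for this example. `SyncActionRunner` and `RecordingHandler` are hypothetical, and the interface is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the excerpt above so this sketch is self-contained.
public interface ISyncActionHandler
{
    void SyncPreActions(bool syncpoint);
    void SyncPostAction();
}

// Hypothetical runner for illustration; not a Ryujinx type.
public sealed class SyncActionRunner
{
    private readonly List<ISyncActionHandler> _handlers = new();

    public void Register(ISyncActionHandler handler) => _handlers.Add(handler);

    // Assumed ordering: pre-actions, host sync creation, post-actions.
    public void RunSync(bool syncpoint, Action createHostSync)
    {
        foreach (ISyncActionHandler handler in _handlers)
        {
            handler.SyncPreActions(syncpoint);
        }

        createHostSync();

        foreach (ISyncActionHandler handler in _handlers)
        {
            handler.SyncPostAction();
        }

        // Handlers are one-shot in this sketch: they re-register when needed.
        _handlers.Clear();
    }
}

// Test double that records the order in which it is called.
public sealed class RecordingHandler : ISyncActionHandler
{
    private readonly List<string> _log;

    public RecordingHandler(List<string> log) => _log = log;

    public void SyncPreActions(bool syncpoint) => _log.Add(syncpoint ? "pre:syncpoint" : "pre");
    public void SyncPostAction() => _log.Add("post");
}
```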

Graphics Abstraction Layer (GAL)

Unified interface for rendering backends:

public interface IRenderer : IDisposable
{
    // Pipeline creation
    IProgram CreateProgram(ShaderSource[] shaders, ShaderInfo info);
    IBuffer CreateBuffer(int size, BufferAccess access);
    ITexture CreateTexture(TextureCreateInfo info);
    ISampler CreateSampler(SamplerCreateInfo info);
    
    // State management  
    void SetRenderTargets(ITexture[] colors, ITexture depthStencil);
    void SetViewports(ReadOnlySpan<Viewport> viewports);
    void SetPipeline(IPipeline pipeline);
    
    // Drawing
    void Draw(int vertexCount, int instanceCount, int firstVertex, int firstInstance);
    void DrawIndexed(int indexCount, int instanceCount, int firstIndex, 
                     int baseVertex, int firstInstance);
    
    // Compute
    void DispatchCompute(int groupsX, int groupsY, int groupsZ);
    
    // Synchronization
    void SetFence(ulong value);
    ulong GetCompletedFenceValue();
    
    // Present
    void Present(ITexture texture, ImageCrop crop);
}

Backend Implementations

Vulkan

Advantages:

Lower CPU overhead
Better multi-threading
Advanced features (descriptor indexing, dynamic rendering)
Preferred on Windows/Linux

Key features:

- Pipeline state objects
- Command buffer recording
- Memory management (VMA)
- Synchronization primitives

OpenGL

Advantages:

Wider compatibility
Simpler debugging
Fallback for older hardware

Key features:

- Direct state access (DSA)
- Persistent mapped buffers
- Compute shaders
- Compatibility context support

Presentation & Display

Window management and frame presentation:

public class Window
{
    private readonly GpuContext _context;
    private ITexture[] _presentableTextures;
    
    public void Present(ITexture texture, ImageCrop crop, Action swapBuffersCallback)
    {
        // Apply any post-processing
        ITexture output = ApplyPostProcessing(texture);
        
        // Present to window
        _context.Renderer.Present(output, crop);
        
        // Swap buffers
        swapBuffersCallback?.Invoke();
    }
    
    public void SetSize(int width, int height, bool resizable)
    {
        // Resize render targets
        RecreateSwapchain(width, height);
    }
}

Performance Optimizations

Shader Caching

Disk cache for compiled shaders
Reduces stuttering on repeated execution
Per-game cache with version tracking

Buffer Coalescing

Merge small buffers into larger allocations
Reduces bind overhead
Improves memory locality

Lazy State Updates

Only update changed state
Batch state changes
Shadow state comparison

Async Compilation

Compile shaders on background threads
Display loading indicator
Minimal impact on frame time

Debugging Tools

Shader Dumps

Export guest and translated shaders:

// Enable via configuration
shaderDumpPath = "./shader_dumps/"

Frame Capture

RenderDoc integration:

Capture frame commands
Inspect state
Debug shaders

Command Logging

Log GPU commands:

LogLevel = LogLevel.Trace

Outputs method calls and arguments

Resource Tracking

Monitor GPU memory:

Texture usage
Buffer allocations
Cache statistics

Related Topics

HLE Services

NVDRV service and ioctl interface

Memory Management

GPU memory mapping and MMU

Performance Tuning

Graphics optimization settings

Troubleshooting

Resolving graphics issues

Source Code Reference

src/Ryujinx.Graphics.Gpu/GpuContext.cs:19 - GPU context
src/Ryujinx.Graphics.Gpu/Engine/GPFifo/GPFifoDevice.cs - Command processor
src/Ryujinx.Graphics.Gpu/Engine/Threed/ - 3D engine
src/Ryujinx.Graphics.Gpu/Shader/ShaderCache.cs:22 - Shader cache
src/Ryujinx.Graphics.GAL/ - Graphics abstraction layer
src/Ryujinx.Graphics.Vulkan/ - Vulkan backend
src/Ryujinx.Graphics.OpenGL/ - OpenGL backend
