Overview

Ryujinx’s graphics subsystem emulates the NVIDIA Tegra X1 GPU, translating Switch graphics commands into calls to host APIs (Vulkan, OpenGL). The architecture is layered: GPU emulation sits on top of the GAL (Graphics Abstraction Layer), which provides a unified API over multiple rendering backends.

GPU Context

The central GPU emulation context from src/Ryujinx.Graphics.Gpu/GpuContext.cs:
public sealed class GpuContext : IDisposable
{
    /// <summary>
    /// Host renderer (Vulkan/OpenGL backend)
    /// </summary>
    public IRenderer Renderer { get; }
    
    /// <summary>
    /// GPU General Purpose FIFO queue
    /// </summary>
    public GPFifoDevice GPFifo { get; }
    
    /// <summary>
    /// GPU synchronization manager
    /// </summary>
    public SynchronizationManager Synchronization { get; }
    
    /// <summary>
    /// Presentation window
    /// </summary>
    public Window Window { get; }
    
    /// <summary>
    /// Sequence number for resource versioning
    /// </summary>
    internal int SequenceNumber { get; private set; }
    
    /// <summary>
    /// Registry of physical memories by process ID
    /// </summary>
    internal ConcurrentDictionary<ulong, PhysicalMemory> PhysicalMemoryRegistry { get; }
    
    /// <summary>
    /// Actions to execute on GPU sync points
    /// </summary>
    internal List<ISyncActionHandler> SyncActions { get; }
    internal List<ISyncActionHandler> SyncpointActions { get; }
    
    public GpuContext(IRenderer renderer, DirtyHacks hacks)
    {
        Renderer = renderer;
        GPFifo = new GPFifoDevice(this);
        Synchronization = new SynchronizationManager();
        Window = new Window(this);
        
        PhysicalMemoryRegistry = new ConcurrentDictionary<ulong, PhysicalMemory>();
        SyncActions = [];
        SyncpointActions = [];
    }
}
Key responsibilities:
  • Command buffer submission and processing
  • Memory management (texture, buffer, shader storage)
  • Synchronization with CPU
  • Multi-process GPU sharing

Command Processing

GPFifo (General Purpose FIFO)

The command submission interface:
public class GPFifoDevice
{
    private readonly GpuContext _context;
    private readonly GPFifoProcessor _processor;
    
    // Command buffer submission
    public void Submit(ReadOnlySpan<ulong> entries)
    {
        foreach (ulong entry in entries)
        {
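            // Address: low 40 bits; the two lowest bits are flags, so the address is 4-byte aligned
            ulong gpuVa = entry & 0xFFFFFFFFFC;
            // Length in 32-bit words: bits 42-62 (bits 40-41 hold the priv/level flags)
            int size = (int)((entry >> 42) & 0x1FFFFF);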
            
            ProcessCommands(gpuVa, size);
        }
    }
    
    private void ProcessCommands(ulong gpuVa, int size)
    {
        // Read commands from GPU virtual address
        ReadOnlySpan<int> commands = _context.MemoryManager.Read<int>(gpuVa, size);
        
        // Process command buffer
        _processor.Process(commands);
    }
}

Command Buffer Format

Switch uses pushbuffer-style command submission:
// Command format: [Method | Argument]
// Method: bits 0-12 (8192 methods)
// SubChannel: bits 13-15 (8 subchannels)
// ArgumentCount: bits 16-28
// Type: bits 29-31

public void ProcessMethod(int method, int argument, int subChannel)
{
    switch (subChannel)
    {
        case 0: // 2D Engine
            _engine2d.ProcessMethod(method, argument);
            break;
        case 1: // 3D Engine  
            _engine3d.ProcessMethod(method, argument);
            break;
        case 2: // Compute
            _engineCompute.ProcessMethod(method, argument);
            break;
        case 3: // Inline-to-Memory
            _inlineToMemory.ProcessMethod(method, argument);
            break;
        case 4: // DMA Copy
            _dma.ProcessMethod(method, argument);
            break;
    }
}
The 2D engine handles blits and surface operations:
- Surface copies
- Format conversions
- Rectangular fills
- Block linear <-> linear conversions
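
The header layout listed under Command Buffer Format (method, subchannel, argument count, type) can be unpacked with a few shifts and masks. The following is a minimal sketch that follows the bit ranges exactly as this page lists them; `CommandHeader` is a hypothetical helper for illustration and is not a type from the Ryujinx source.

```csharp
// Hypothetical helper, not part of Ryujinx: unpacks a 32-bit command header
// using the bit ranges listed in the Command Buffer Format comment.
public readonly struct CommandHeader
{
    public int Method { get; }        // bits 0-12
    public int SubChannel { get; }    // bits 13-15
    public int ArgumentCount { get; } // bits 16-28
    public int Type { get; }          // bits 29-31

    public CommandHeader(uint word)
    {
        Method = (int)(word & 0x1FFF);
        SubChannel = (int)((word >> 13) & 0x7);
        ArgumentCount = (int)((word >> 16) & 0x1FFF);
        Type = (int)(word >> 29);
    }
}
```

The decoded `SubChannel` is the value a dispatcher such as `ProcessMethod` would switch on to pick the target engine.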

Graphics Engine (3D)

The main rendering engine processes graphics state and draw commands:
public class ThreedClass : IDeviceState
{
    private readonly GpuContext _context;
    private readonly GpuChannel _channel;
    private readonly DeviceStateWithShadow<ThreedClassState> _state;
    
    // State management
    public void SetRenderTargets(int count, RenderTarget[] targets) { /* ... */ }
    public void SetViewports(ReadOnlySpan<Viewport> viewports) { /* ... */ }
    public void SetScissors(ReadOnlySpan<Rectangle<int>> scissors) { /* ... */ }
    public void SetVertexBuffers(ReadOnlySpan<VertexBufferState> buffers) { /* ... */ }
    
    // Shader binding
    public void SetShaderStage(ShaderStage stage, ulong gpuVa, bool enable) { /* ... */ }
    
    // Draw commands
    public void Draw(int vertexCount, int instanceCount, int firstVertex, 
                     int firstInstance) { /* ... */ }
    public void DrawIndexed(int indexCount, int instanceCount, int firstIndex,
                           int baseVertex, int firstInstance) { /* ... */ }
}

Render State Management

Graphics state is cached and synchronized:
public class DeviceStateWithShadow<T> where T : unmanaged
{
    private T _hostState;      // Current host state
    private T _shadowState;    // Last written state
    
    public void Update()
    {
        // Compare shadow vs host state
        // Only update changed fields
        if (!MemoryEquals(_hostState, _shadowState))
        {
            UpdateDirtyFields();
            _shadowState = _hostState;
        }
    }
}
State groups:
  • Rasterizer: Primitive topology, polygon mode, cull mode, front face
  • Depth/Stencil: Depth test, stencil test, depth bounds
  • Blend: Blend equations, blend factors, color mask
  • Viewport: Viewport transforms, depth range
  • Scissor: Scissor rectangles
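
To make the "only update changed fields" idea concrete, here is a minimal sketch of per-register dirty tracking. It assumes state is a flat array of integer registers; `DirtyTrackedState` and its members are hypothetical names, and the real `DeviceStateWithShadow<T>` may track changes differently.

```csharp
using System;

// Hypothetical register file with per-register dirty flags; not Ryujinx code.
public sealed class DirtyTrackedState
{
    private readonly int[] _values;
    private readonly bool[] _dirty;

    public DirtyTrackedState(int registerCount)
    {
        _values = new int[registerCount];
        _dirty = new bool[registerCount];
    }

    // Record a guest write; flag the register only if the value changed.
    public void Write(int index, int value)
    {
        if (_values[index] != value)
        {
            _values[index] = value;
            _dirty[index] = true;
        }
    }

    // Push every dirty register to the host and clear its flag.
    // Returns how many host updates were issued.
    public int Flush(Action<int, int> applyToHost)
    {
        int updates = 0;
        for (int i = 0; i < _values.Length; i++)
        {
            if (!_dirty[i])
            {
                continue;
            }

            applyToHost(i, _values[i]);
            _dirty[i] = false;
            updates++;
        }

        return updates;
    }
}
```

Redundant writes, which games issue frequently, cost one comparison here and never reach the host API.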

Shader Translation

Shaders are translated from the Switch binary format to host GLSL or SPIR-V.

Shader Cache

From src/Ryujinx.Graphics.Gpu/Shader/ShaderCache.cs:
class ShaderCache : IDisposable
{
    private readonly GpuContext _context;
    private readonly ShaderDumper _dumper;
    
    // Cached programs
    private readonly Dictionary<ulong, CachedShaderProgram> _cpPrograms;  // Compute
    private readonly Dictionary<ShaderAddresses, CachedShaderProgram> _gpPrograms; // Graphics
    
    // Disk cache
    private readonly ComputeShaderCacheHashTable _computeShaderCache;
    private readonly ShaderCacheHashTable _graphicsShaderCache;
    private readonly DiskCacheHostStorage _diskCacheHostStorage;
    
    public CachedShaderProgram GetGraphicsShader(ShaderAddresses addresses)
    {
        if (_gpPrograms.TryGetValue(addresses, out var program))
        {
            return program;
        }
        
        // Not cached - translate from binary
        program = TranslateGraphicsShader(addresses);
        _gpPrograms[addresses] = program;
        
        return program;
    }
}

Translation Pipeline

1. Binary Decode

Decode Maxwell/Pascal GPU binary instructions:
// Read shader code from GPU memory
ReadOnlySpan<byte> code = memoryManager.GetSpan(gpuVa, maxSize);

// Decode instruction stream
ShaderDecoder decoder = new ShaderDecoder(code);
List<Block> blocks = decoder.Decode();

2. Control Flow Analysis

Build control flow graph:
// Identify basic blocks
// Resolve branch targets
// Build dominator tree
ControlFlowGraph cfg = ControlFlowAnalysis.Build(blocks);

3. Translation to IR

Convert to intermediate representation:
// Lift GPU instructions to IR
// Track register/attribute usage
// Handle texture operations
// Process special functions

4. Optimization

Optimize shader IR:
// Dead code elimination
// Constant propagation
// Algebraic simplification
// Resource optimization

5. Code Generation

Generate target shader language:
if (backend == Backend.Vulkan)
{
    // Generate SPIR-V
    byte[] spirv = CodeGen.Spirv.Generate(program);
    return renderer.CompileShader(spirv);
}
else
{
    // Generate GLSL
    string glsl = CodeGen.Glsl.Generate(program);
    return renderer.CompileShader(glsl);
}
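
The optimization step is only listed as comments above, so the sketch below illustrates two of the named passes, constant propagation (with folding) and dead code elimination, on a toy IR. Everything here is an assumption made for illustration: the `Instr` record, the `OpKind` enum, and the restriction to straight-line code where each register is written once. It does not reflect the actual Ryujinx shader IR.

```csharp
using System.Collections.Generic;

public enum OpKind { Const, Add, Mul }

// Toy instruction: for Const, A is the immediate value and B is unused;
// for Add/Mul, A and B are source register numbers.
public sealed record Instr(OpKind Kind, int Dest, int A, int B);

public static class ToyOptimizer
{
    public static List<Instr> Optimize(IReadOnlyList<Instr> code, int outputReg)
    {
        // Pass 1: constant propagation and folding.
        var known = new Dictionary<int, int>();
        var folded = new List<Instr>();
        foreach (Instr instr in code)
        {
            Instr current = instr;
            if (instr.Kind != OpKind.Const &&
                known.TryGetValue(instr.A, out int a) &&
                known.TryGetValue(instr.B, out int b))
            {
                int value = instr.Kind == OpKind.Add ? a + b : a * b;
                current = new Instr(OpKind.Const, instr.Dest, value, 0);
            }

            if (current.Kind == OpKind.Const)
            {
                known[current.Dest] = current.A;
            }

            folded.Add(current);
        }

        // Pass 2: dead code elimination, walking backwards from the output.
        var live = new HashSet<int> { outputReg };
        var result = new List<Instr>();
        for (int i = folded.Count - 1; i >= 0; i--)
        {
            Instr instr = folded[i];
            if (!live.Contains(instr.Dest))
            {
                continue; // Result is never read: drop the instruction.
            }

            if (instr.Kind != OpKind.Const)
            {
                live.Add(instr.A);
                live.Add(instr.B);
            }

            result.Add(instr);
        }

        result.Reverse();
        return result;
    }
}
```

Folding first matters: once `r2 = r0 + r1` becomes a constant, the instructions that produced `r0` and `r1` are no longer read and the second pass removes them.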

Guest-to-Host Feature Mapping

// Switch texture instructions -> Host equivalents
TEX   -> texture()        // Basic sampling
TEXS  -> texture()        // Sampling, scalar-register variant of TEX
TLD   -> texelFetch()     // Load without filtering
TLDS  -> texelFetch()     // Load, scalar-register variant of TLD
TLD4  -> textureGather()  // Gather one component from 4 texels
TXQ   -> textureSize()    // Query texture size

Shader Specialization

Shaders are specialized based on render state:
public struct ShaderSpecializationState
{
    // Transform feedback
    public bool TransformFeedbackEnabled;
    public uint TransformFeedbackBufferMask;
    
    // Graphics state
    public bool AlphaTestEnabled;
    public CompareOp AlphaTestFunc;
    public float AlphaTestRef;
    
    // Texture state
    public bool[] TextureSrgb;        // One entry per texture slot (32 slots)
    public SamplerType[] TextureType; // One entry per texture slot (32 slots)
    
    // Compute hash for cache lookup
    public Hash128 GetHash() { /* ... */ }
}
Specialization allows aggressive optimization by baking state-dependent constants into the shader code, at the cost of one compiled variant per distinct state.
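
One consequence of specialization is that the cache key has to include the state that was baked in, not just the shader address. The sketch below shows that idea under simplifying assumptions: `SpecKey` and `SpecializedProgramCache` are hypothetical names, the key holds only a few of the fields shown above, and a string stands in for a compiled program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache key: guest shader address plus the state baked into the variant.
// A record struct gives value equality, so equal state maps to the same entry.
public readonly record struct SpecKey(ulong Address, bool AlphaTestEnabled, float AlphaTestRef);

public sealed class SpecializedProgramCache
{
    private readonly Dictionary<SpecKey, string> _programs = new();

    // Number of times the compile callback actually ran.
    public int CompileCount { get; private set; }

    public string GetOrCompile(SpecKey key, Func<SpecKey, string> compile)
    {
        if (!_programs.TryGetValue(key, out var program))
        {
            program = compile(key);
            _programs[key] = program;
            CompileCount++;
        }

        return program;
    }
}
```

The same guest shader drawn with alpha test on and off produces two entries; drawing again with either state is a cache hit.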

Texture Management

Texture Pool

Textures are managed in pools:
public class TexturePool
{
    private readonly GpuContext _context;
    private readonly GpuChannel _channel;
    
    private readonly Texture[] _textures;
    private readonly ulong _address;
    private readonly int _maximumId;
    
    public Texture Get(int id)
    {
        if (_textures[id] == null)
        {
            // Load texture descriptor from GPU memory
            TextureDescriptor descriptor = ReadDescriptor(id);
            
            // Create or find existing texture
            _textures[id] = _context.Methods.TextureManager.FindOrCreate(descriptor);
        }
        
        return _textures[id];
    }
}

Texture Descriptor

Switch uses descriptors to define texture properties:
public struct TextureDescriptor
{
    public ulong Address;           // GPU virtual address
    public Format Format;           // Pixel format
    public TextureTarget Target;    // 1D/2D/3D/Cube/Array
    public int Width;
    public int Height;
    public int Depth;
    public int Levels;              // Mipmap levels
    public int Layers;              // Array layers
    public SwizzleComponent[] Swizzle; // Channel swizzle
    public bool IsSrgb;
    public TileMode TileMode;       // Linear/block-linear
}

Texture Formats

Ryujinx supports extensive format conversions:
R8G8B8A8_UNORM
R8G8B8A8_SRGB
R16G16B16A16_FLOAT
B5G6R5_UNORM
BC1_UNORM (DXT1)
BC2_UNORM (DXT3)  
BC3_UNORM (DXT5)
BC4_UNORM
BC5_UNORM
BC6H_SFLOAT
BC7_UNORM
ASTC_4x4_UNORM
ASTC_8x8_SRGB

Block Linear Swizzling

Switch uses block-linear texture layout for cache efficiency:
public static class LayoutConverter
{
    // Convert block-linear to linear
    public static void ConvertBlockLinearToLinear(
        Span<byte> dst,
        ReadOnlySpan<byte> src,
        int width,
        int height,
        int depth,
        int levels,
        int layers,
        int blockWidth,
        int blockHeight,
        int bytesPerPixel,
        int gobBlocksInY,
        int gobBlocksInZ,
        int gobBlocksInTileX)
    {
        // Deswizzle using the GOB (Group of Bytes) layout
        // GOB size: 512 bytes (64 bytes wide x 8 rows)
        // Organized in 2D/3D blocks for cache locality
    }
}
Block-linear benefits:
  • Improved texture cache hit rate
  • Better memory access patterns
  • Efficient mipmap storage
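
As an illustration of block-linear addressing, the sketch below computes the byte offset of a texel in a 2D surface. It assumes a 512-byte GOB (64 bytes wide by 8 rows) with the intra-GOB bit interleave commonly documented for Tegra-class GPUs, and blocks made of `gobsPerBlockY` vertically stacked GOBs. `BlockLinearSketch` and its parameters are hypothetical; this is not the Ryujinx `LayoutConverter` implementation.

```csharp
// Hypothetical sketch of 2D block-linear addressing; not Ryujinx code.
public static class BlockLinearSketch
{
    public const int GobWidthBytes = 64;
    public const int GobHeight = 8;
    public const int GobSize = GobWidthBytes * GobHeight; // 512 bytes

    // Offset of byte x (0-63) in row y (0-7) inside one GOB.
    public static int OffsetInGob(int x, int y)
    {
        return (((x & 0x3f) >> 5) << 8)
             | (((y & 0x07) >> 1) << 6)
             | (((x & 0x1f) >> 4) << 5)
             | ((y & 0x01) << 4)
             | (x & 0x0f);
    }

    // Offset of byte column xBytes in row y for a surface that is
    // widthInGobs GOBs wide, with gobsPerBlockY GOBs stacked per block.
    public static int GetOffset(int xBytes, int y, int widthInGobs, int gobsPerBlockY)
    {
        int gobX = xBytes / GobWidthBytes;
        int gobY = y / GobHeight;

        int blockY = gobY / gobsPerBlockY;
        int gobInBlock = gobY % gobsPerBlockY;

        int blockSize = GobSize * gobsPerBlockY;
        int blockIndex = blockY * widthInGobs + gobX;

        return blockIndex * blockSize
             + gobInBlock * GobSize
             + OffsetInGob(xBytes % GobWidthBytes, y % GobHeight);
    }
}
```

Converting to linear then amounts to looping over every `(xBytes, y)` in the image and copying from the computed block-linear offset to `y * strideBytes + xBytes` in the destination.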

Buffer Management

Buffer Cache

public class BufferManager
{
    private readonly Dictionary<ulong, Buffer> _buffers;
    private readonly RangeList<Buffer> _bufferOverlaps;
    
    public BufferRange GetBuffer(ulong address, ulong size, bool write)
    {
        // Check for existing overlapping buffers
        Buffer buffer = FindOverlap(address, size);
        
        if (buffer == null)
        {
            // Create new buffer
            buffer = CreateBuffer(address, size);
            _buffers[address] = buffer;
        }
        
        // Synchronize if written by GPU
        if (write && buffer.GpuModified)
        {
            buffer.SynchronizeMemory();
        }
        
        return new BufferRange(buffer, address - buffer.Address, size);
    }
}
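
The overlap lookup in `GetBuffer` rests on a simple interval test. The sketch below shows that test, plus the union used when overlapping ranges are merged into one larger buffer. `AddressRange` is a hypothetical type for illustration; Ryujinx's `RangeList<T>` is a more elaborate structure.

```csharp
using System;

// Hypothetical half-open address range [Address, Address + Size); not Ryujinx code.
public readonly record struct AddressRange(ulong Address, ulong Size)
{
    public ulong End => Address + Size;

    // Two half-open ranges overlap when each starts before the other ends.
    public bool Overlaps(AddressRange other)
    {
        return Address < other.End && other.Address < End;
    }

    // Smallest range covering both inputs, as used when merging buffers.
    public AddressRange Union(AddressRange other)
    {
        ulong start = Math.Min(Address, other.Address);
        ulong end = Math.Max(End, other.End);
        return new AddressRange(start, end - start);
    }
}
```

Because the ranges are half-open, two buffers that merely touch (one ends where the next begins) do not count as overlapping.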

Buffer Types

Vertex Buffers

// Vertex attribute data
- Position, normal, texcoord
- Instanced attributes  
- Stride and offset

Index Buffers

// Triangle indices
- U8/U16/U32 formats
- Primitive restart

Uniform Buffers

// Shader constants
- Per-draw parameters
- Material properties
- Transform matrices

Storage Buffers

// Read-write access
- Compute shader data
- Large structured buffers

Synchronization

GPU-CPU Sync

Handling synchronization between CPU and GPU:
public class SynchronizationManager
{
    // CPU waits for GPU
    public void WaitForFence(ulong fenceValue)
    {
        // Block CPU thread until GPU reaches fence point
        while (GetCompletedValue() < fenceValue)
        {
            Thread.Yield();
        }
    }
    
    // GPU signals fence
    public void SignalFence(ulong fenceValue)
    {
        // Queue fence signal in command stream
        _renderer.SetFence(fenceValue);
    }
    
    // Query completed work
    public ulong GetCompletedValue()
    {
        return _renderer.GetCompletedFenceValue();
    }
}
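
The `WaitForFence` loop above spins with `Thread.Yield()`. As an alternative, the sketch below blocks on a monitor and supports a timeout, so a stalled GPU cannot hang the waiting thread forever. This swaps in a different technique from the listing above, and `FenceCounter` is a hypothetical class, not the Ryujinx `SynchronizationManager`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical fence counter using Monitor.Wait/PulseAll; not Ryujinx code.
public sealed class FenceCounter
{
    private readonly object _lock = new();
    private ulong _completed;

    // Called when the GPU side reaches a fence value.
    public void Signal(ulong value)
    {
        lock (_lock)
        {
            if (value > _completed)
            {
                _completed = value;
            }

            Monitor.PulseAll(_lock); // Wake all waiters so they can re-check.
        }
    }

    // Blocks until the fence value is reached or the timeout expires.
    // Returns true if the fence was reached.
    public bool Wait(ulong value, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();

        lock (_lock)
        {
            while (_completed < value)
            {
                TimeSpan remaining = timeout - elapsed.Elapsed;
                if (remaining <= TimeSpan.Zero || !Monitor.Wait(_lock, remaining))
                {
                    return _completed >= value;
                }
            }

            return true;
        }
    }
}
```

The loop re-checks the condition after every wake-up, which keeps the wait correct even when several threads wait on different fence values.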

Sync Actions

Actions triggered at synchronization points:
public interface ISyncActionHandler
{
    void SyncPreActions(bool syncpoint);
    void SyncPostAction();
}

// Example: Buffer flush on sync
public class BufferSyncAction : ISyncActionHandler
{
    private readonly Buffer _buffer;
    
    public void SyncPreActions(bool syncpoint)
    {
        // Flush buffer to host before sync
        _buffer.Flush();
    }
    
    public void SyncPostAction()
    {
        // Update after GPU completion
        _buffer.InvalidateRange();
    }
}

Graphics Abstraction Layer (GAL)

Unified interface for rendering backends:
public interface IRenderer : IDisposable
{
    // Pipeline creation
    IProgram CreateProgram(ShaderSource[] shaders, ShaderInfo info);
    IBuffer CreateBuffer(int size, BufferAccess access);
    ITexture CreateTexture(TextureCreateInfo info);
    ISampler CreateSampler(SamplerCreateInfo info);
    
    // State management  
    void SetRenderTargets(ITexture[] colors, ITexture depthStencil);
    void SetViewports(ReadOnlySpan<Viewport> viewports);
    void SetPipeline(IPipeline pipeline);
    
    // Drawing
    void Draw(int vertexCount, int instanceCount, int firstVertex, int firstInstance);
    void DrawIndexed(int indexCount, int instanceCount, int firstIndex, 
                     int baseVertex, int firstInstance);
    
    // Compute
    void DispatchCompute(int groupsX, int groupsY, int groupsZ);
    
    // Synchronization
    void SetFence(ulong value);
    ulong GetCompletedFenceValue();
    
    // Present
    void Present(ITexture texture);
}

Backend Implementations

Vulkan backend advantages:
  • Lower CPU overhead
  • Better multi-threading
  • Advanced features (descriptor indexing, dynamic rendering)
  • Preferred on Windows/Linux
Key features:
- Pipeline state objects
- Command buffer recording
- Memory management (VMA)
- Synchronization primitives

Presentation & Display

Window management and frame presentation:
public class Window
{
    private readonly GpuContext _context;
    private ITexture[] _presentableTextures;
    
    public void Present(ITexture texture, ImageCrop crop, Action swapBuffersCallback)
    {
        // Apply any post-processing
        ITexture output = ApplyPostProcessing(texture);
        
        // Present to window
        _context.Renderer.Present(output, crop);
        
        // Swap buffers
        swapBuffersCallback?.Invoke();
    }
    
    public void SetSize(int width, int height, bool resizable)
    {
        // Resize render targets
        RecreateSwapchain(width, height);
    }
}

Performance Optimizations

Shader caching:
  • Disk cache for compiled shaders
  • Reduces stuttering on repeated execution
  • Per-game cache with version tracking
Buffer merging:
  • Merge small buffers into larger allocations
  • Reduces bind overhead
  • Improves memory locality
State tracking:
  • Only update changed state
  • Batch state changes
  • Shadow state comparison
Background shader compilation:
  • Compile shaders on background threads
  • Display loading indicator
  • Minimal impact on frame time

Debugging Tools

Shader Dumps

Export guest and translated shaders:
// Enable via configuration
shaderDumpPath = "./shader_dumps/"

Frame Capture

RenderDoc integration:
  • Capture frame commands
  • Inspect state
  • Debug shaders

Command Logging

Log GPU commands:
LogLevel = LogLevel.Trace
Trace-level logging outputs each method call and its arguments.

Resource Tracking

Monitor GPU memory:
  • Texture usage
  • Buffer allocations
  • Cache statistics

Source Code Reference

  • src/Ryujinx.Graphics.Gpu/GpuContext.cs:19 - GPU context
  • src/Ryujinx.Graphics.Gpu/Engine/GPFifo/GPFifoDevice.cs - Command processor
  • src/Ryujinx.Graphics.Gpu/Engine/Threed/ - 3D engine
  • src/Ryujinx.Graphics.Gpu/Shader/ShaderCache.cs:22 - Shader cache
  • src/Ryujinx.Graphics.GAL/ - Graphics abstraction layer
  • src/Ryujinx.Graphics.Vulkan/ - Vulkan backend
  • src/Ryujinx.Graphics.OpenGL/ - OpenGL backend
