Overview

Ryujinx emulates the NVIDIA Tegra X1’s Maxwell GPU architecture, providing a complete software implementation of the graphics processing unit used in the Nintendo Switch. The GPU emulation layer sits between the guest application and the host graphics API (OpenGL or Vulkan).

The emulation layer is organized around four core components:

  • GpuContext: Central GPU emulation context managing all GPU resources and state
  • GpuChannel: Individual GPU command submission channels with isolated state
  • GPFifo: General Purpose FIFO for command buffer submission and processing
  • Engine Classes: Specialized engines for 3D, 2D, compute, and DMA operations

GpuContext

The GpuContext class is the central hub of GPU emulation, coordinating all GPU operations and managing shared resources.

Architecture

namespace Ryujinx.Graphics.Gpu
{
    public sealed class GpuContext : IDisposable
    {
        public IRenderer Renderer { get; }           // Host renderer (OpenGL/Vulkan)
        public GPFifoDevice GPFifo { get; }          // Command submission device
        public SynchronizationManager Synchronization { get; }
        public Window Window { get; }                // Presentation window
        
        internal int SequenceNumber { get; private set; }
        internal ulong SyncNumber { get; private set; }
    }
}
The GPU context uses the Maxwell timer frequency of 614.4 MHz (384/625 ticks per nanosecond) to accurately emulate GPU timing behavior.

Key Responsibilities

Memory management

  • Manages physical memory registries per process (keyed by process ID)
  • Supports multiple PhysicalMemory instances for multi-process emulation
  • Handles CPU virtual memory tracking and GPU memory mapping
  • Provides MemoryManager creation for GPU virtual address spaces

// Register a process's memory for GPU access
public void RegisterProcess(ulong pid, IVirtualMemoryManagerTracked cpuMemory)
{
    PhysicalMemory physicalMemory = new(this, cpuMemory);
    PhysicalMemoryRegistry.TryAdd(pid, physicalMemory);
}

Channel management

  • Creates and manages GpuChannel instances
  • Each channel represents an independent command submission context
  • Channels can be bound to different memory managers

public GpuChannel CreateChannel()
{
    return new GpuChannel(this);
}

Synchronization

  • Tracks sequence numbers for resource modification ordering
  • Manages sync actions triggered by CPU-GPU synchronization points
  • Handles buffer migrations between memory regions
  • Creates host sync objects for GPU-CPU coordination

internal void CreateHostSyncIfNeeded(HostSyncFlags flags)
{
    // Creates fence/sync primitives when:
    // - Buffer migrations are pending
    // - Sync actions are registered
    // - Syncpoint increments occur
    Renderer.CreateSync(SyncNumber, strict: flags.HasFlag(HostSyncFlags.Strict));
    SyncNumber++;
}

Shader cache

  • Coordinates shader cache initialization across all processes
  • Propagates shader cache state changes to the host application
  • Manages disk cache for persistent shader storage

public void InitializeShaderCache(CancellationToken cancellationToken)
{
    HostInitalized.WaitOne();
    foreach (PhysicalMemory physicalMemory in PhysicalMemoryRegistry.Values)
    {
        physicalMemory.ShaderCache.Initialize(cancellationToken);
    }
}

GPU Timer

The emulated GPU provides accurate timing using Maxwell’s timer frequency:
// Convert nanoseconds to Maxwell GPU ticks (614.4 MHz)
private static ulong ConvertNanosecondsToTicks(ulong nanoseconds)
{
    const int NsToTicksFractionNumerator = 384;
    const int NsToTicksFractionDenominator = 625;
    
    ulong divided = nanoseconds / NsToTicksFractionDenominator;
    ulong rounded = divided * NsToTicksFractionDenominator;
    ulong errorBias = (nanoseconds - rounded) * NsToTicksFractionNumerator / NsToTicksFractionDenominator;
    
    return divided * NsToTicksFractionNumerator + errorBias;
}
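The same fixed-point conversion can be sketched in Python (a hypothetical port for illustration; the authoritative implementation is the C# above). Dividing before multiplying avoids 64-bit overflow for large nanosecond values, and the error-bias term recovers the precision lost by the early division.

```python
# Python sketch of the Maxwell tick conversion: 614.4 MHz means
# 384/625 ticks per nanosecond.
NS_TO_TICKS_NUMERATOR = 384
NS_TO_TICKS_DENOMINATOR = 625

def convert_ns_to_ticks(nanoseconds: int) -> int:
    # Divide first to avoid overflow, then add back the truncated remainder
    divided = nanoseconds // NS_TO_TICKS_DENOMINATOR
    rounded = divided * NS_TO_TICKS_DENOMINATOR
    error_bias = (nanoseconds - rounded) * NS_TO_TICKS_NUMERATOR // NS_TO_TICKS_DENOMINATOR
    return divided * NS_TO_TICKS_NUMERATOR + error_bias

# One second of host time corresponds to 614,400,000 GPU ticks
print(convert_ns_to_ticks(1_000_000_000))  # 614400000
```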
The FastGpuTime configuration option divides the reported timestamp by 256, preventing games with dynamic resolution scaling from lowering their resolution because the emulated GPU appears slow.

GpuChannel

Each GpuChannel represents an independent command submission context with its own state and resource bindings.

Channel Architecture

Components

public class GpuChannel : IDisposable
{
    internal BufferManager BufferManager { get; }     // Buffer resource management
    internal TextureManager TextureManager { get; }   // Texture resource management
    internal MemoryManager MemoryManager { get; }     // GPU virtual memory
    
    // Bind a memory manager to this channel
    public void BindMemory(MemoryManager memoryManager)
    {
        memoryManager.Physical.BufferCache.NotifyBuffersModified += BufferManager.Rebind;
        memoryManager.MemoryUnmapped += MemoryUnmappedHandler;
        TextureManager.ReloadPools();
    }
}
When a channel’s memory manager changes, all texture pools must be reloaded and buffer caches must be pruned to avoid stale references.
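That rebind-on-bind behavior can be sketched in Python (an illustrative model, not Ryujinx code; class and handler names here are invented). Binding a new memory manager detaches the callbacks wired to the old one, subscribes to the new one, and reloads the texture pools for the new address space:

```python
# Hypothetical sketch of channel memory binding: modification callbacks
# must follow the currently bound memory manager, and texture pools must
# be reloaded so no cached resource points at the old address space.
class MemoryManager:
    def __init__(self):
        self.buffers_modified_handlers = []

    def notify_buffers_modified(self):
        for handler in self.buffers_modified_handlers:
            handler()

class Channel:
    def __init__(self):
        self.memory_manager = None
        self.pools_reloaded = 0
        self.rebinds = 0

    def bind_memory(self, mm: MemoryManager):
        old = self.memory_manager
        if old is not None:
            # Detach from the previous address space
            old.buffers_modified_handlers.remove(self._on_buffers_modified)
        mm.buffers_modified_handlers.append(self._on_buffers_modified)
        self.memory_manager = mm
        self.pools_reloaded += 1  # texture pools reloaded for the new address space

    def _on_buffers_modified(self):
        self.rebinds += 1  # buffers are rebound on the next draw
```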

GPFifo Command Processing

The General Purpose FIFO (GPFifo) is the primary mechanism for submitting commands to the GPU. It processes command buffers containing method calls to various engine classes.

Command Buffer Format

Commands use a compressed format with multiple encoding schemes:
struct CompressedMethod
{
    int MethodAddress;        // Target method offset
    int MethodSubchannel;     // Engine subchannel (0-4)
    int MethodCount;          // Number of arguments
    SecOp SecOp;             // Encoding type
    int ImmdData;            // Immediate data (for ImmdDataMethod)
}

enum SecOp
{
    IncMethod,       // Increment method address after each argument
    NonIncMethod,    // Keep method address constant (array data)
    OneInc,          // Increment once then keep constant
    ImmdDataMethod   // Single method call with immediate data
}
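All of these fields are packed into a single 32-bit command word. A Python sketch of the field extraction follows; the exact bit positions are an assumption based on the standard NVIDIA pushbuffer header layout that this struct mirrors (SecOp in the top bits, count/immediate data in the middle, subchannel and method address in the low bits):

```python
# Hypothetical decoder for a 32-bit GPFIFO command header.
# Assumed layout: [31:29] SecOp, [28:16] count or immediate data,
# [15:13] subchannel, low bits method address.
def decode_header(word: int) -> dict:
    return {
        "method": word & 0xFFF,                  # MethodAddress
        "subchannel": (word >> 13) & 0x7,        # MethodSubchannel (0-4)
        "count_or_immd": (word >> 16) & 0x1FFF,  # MethodCount or ImmdData
        "secop": (word >> 29) & 0x7,             # SecOp encoding type
    }

# Build a header for: IncMethod, 3 arguments, subchannel 2, method 0x100
header = (1 << 29) | (3 << 16) | (2 << 13) | 0x100
print(decode_header(header))
```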

Processing Pipeline

Step 1: Command Decode

Commands are decoded from the GPFIFO stream, extracting method address, subchannel, and arguments.
public void Process(ulong baseGpuVa, ReadOnlySpan<int> commandBuffer)
{
    for (int index = 0; index < commandBuffer.Length; index++)
    {
        int command = commandBuffer[index];
        
        if (_state.MethodCount != 0)
        {
            // Process method argument
            Send(gpuVa, _state.Method, command, _state.SubChannel, isLastCall);
        }
        else
        {
            // Decode new method header
            CompressedMethod meth = Unsafe.As<int, CompressedMethod>(ref command);
            // ...
        }
    }
}
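The loop above alternates between two states: decoding a header and consuming arguments. A self-contained Python sketch of that state machine (header bit layout assumed, as with CompressedMethod above; SecOp values are illustrative constants matching the enum names):

```python
# Hypothetical sketch of the GPFIFO decode loop: a header arms the pending
# method/count, then following words are consumed as arguments, stepping
# the method address according to SecOp.
SECOP_INC_METHOD = 1
SECOP_NON_INC_METHOD = 3
SECOP_IMMD_DATA_METHOD = 4
SECOP_ONE_INC = 5

def make_header(secop, count_or_immd, subchannel, method):
    return (secop << 29) | (count_or_immd << 16) | (subchannel << 13) | method

def process(words):
    calls = []                    # (method, argument) pairs handed to the engines
    method = secop = count = 0
    one_inc_pending = False
    for word in words:
        if count != 0:
            calls.append((method, word))
            count -= 1
            if secop == SECOP_INC_METHOD or one_inc_pending:
                method += 1
                one_inc_pending = False  # OneInc steps only after the first argument
        else:
            # Decode a new method header
            method = word & 0xFFF
            secop = (word >> 29) & 0x7
            data = (word >> 16) & 0x1FFF
            if secop == SECOP_IMMD_DATA_METHOD:
                calls.append((method, data))  # argument packed in the header itself
            else:
                count = data
                one_inc_pending = secop == SECOP_ONE_INC
    return calls
```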
Step 2: Fast Path Optimization

Common operations are optimized with fast paths:
  • Inline-to-Memory uploads: Batch copy data directly to GPU memory
  • Uniform buffer updates: Bulk constant buffer data transfers
private bool TryFastUniformBufferUpdate(CompressedMethod meth, ReadOnlySpan<int> commandBuffer, int offset)
{
    if (meth.MethodAddress == UniformBufferUpdateDataMethodOffset &&
        meth.SecOp == SecOp.NonIncMethod)
    {
        _3dClass.ConstantBufferUpdate(commandBuffer.Slice(offset + 1, meth.MethodCount));
        return true;
    }
    return false;
}
Step 3: Engine Dispatch

Methods are routed to the appropriate engine class based on subchannel:
  • Subchannel 0: 3D Engine (ThreedClass)
  • Subchannel 1: Compute Engine (ComputeClass)
  • Subchannel 2: Inline-to-Memory (I2M)
  • Subchannel 3: 2D Engine (TwodClass)
  • Subchannel 4: DMA Engine (DmaClass)
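The routing itself is a fixed table lookup. A minimal Python sketch (engine objects and the method number are illustrative, not Ryujinx API):

```python
# Hypothetical sketch of subchannel routing: each decoded method is handed
# to the engine bound at its subchannel.
class Engine:
    def __init__(self, name):
        self.name = name
        self.received = []

    def call_method(self, method, argument):
        self.received.append((method, argument))

subchannels = {
    0: Engine("3D"),
    1: Engine("Compute"),
    2: Engine("I2M"),
    3: Engine("2D"),
    4: Engine("DMA"),
}

def dispatch(subchannel, method, argument):
    subchannels[subchannel].call_method(method, argument)
```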
Step 4: Macro Execution

Methods in the range 0xE00+ trigger Macro Method Expansion (MME) execution:
if (offset >= 0xe00)
{
    int macroIndex = (offset >> 1) & MacroIndexMask;
    
    if ((offset & 1) != 0)
        _fifoClass.MmePushArgument(macroIndex, gpuVa, argument);
    else
        _fifoClass.MmeStart(macroIndex, argument);
        
    if (isLastCall)
        _fifoClass.CallMme(macroIndex, state);
}
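The arithmetic above packs two things into the method offset: the low bit selects "start macro" versus "push argument", and the remaining bits select the macro slot. A small Python sketch (the 128-slot mask is an assumption matching MacroIndexMask above):

```python
# Sketch of the macro routing arithmetic: offsets at 0xE00 and up address
# the macro engine rather than a regular engine method.
MACRO_INDEX_MASK = 0x7F  # assumed 128 macro slots

def macro_target(offset):
    macro_index = (offset >> 1) & MACRO_INDEX_MASK
    is_argument = (offset & 1) != 0
    return macro_index, is_argument

print(macro_target(0xE00))  # (0, False): start macro 0
print(macro_target(0xE01))  # (0, True): push argument to macro 0
print(macro_target(0xE04))  # (2, False): start macro 2
```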

Engine Classes

Ryujinx implements multiple GPU engine classes that handle different types of operations.

ThreedClass (3D Engine)

The primary graphics engine, located in src/Ryujinx.Graphics.Gpu/Engine/Threed/, handles all 3D rendering operations and manages:
  • Vertex and index buffer bindings
  • Render target configuration
  • Pipeline state (blend, depth, stencil, rasterizer)
  • Shader program binding
  • Draw calls (arrays, indexed, instanced, indirect)
  • Transform feedback
  • Conditional rendering
class ThreedClass : IDeviceState
{
    private readonly DrawManager _drawManager;
    private readonly StateUpdater _stateUpdater;
    private readonly ConstantBufferUpdater _cbUpdater;
    private readonly SemaphoreUpdater _semaphoreUpdater;
}

ComputeClass

Handles compute shader dispatch operations:
class ComputeClass
{
    // Dispatch compute work groups
    private void Dispatch()
    {
        // Update compute state
        UpdateShaderState();
        UpdateStorageBuffers();
        UpdateTextures();
        
        // Execute dispatch
        _context.Renderer.Pipeline.DispatchCompute(
            groupsX: state.DispatchParamsX,
            groupsY: state.DispatchParamsY,
            groupsZ: state.DispatchParamsZ
        );
    }
}

DmaClass

Performs memory-to-memory copy operations:
  • Linear-to-linear copies
  • Tiled-to-linear and linear-to-tiled conversions
  • Pitch linear memory layout handling
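The simplest of these cases, a linear-to-linear copy between pitch-linear surfaces, amounts to copying one row of `width` bytes at a time while honoring each surface's stride. An illustrative Python sketch (not Ryujinx code):

```python
# Sketch of a 2D pitch-linear copy: rows are contiguous, but consecutive
# rows are separated by the surface's pitch (stride), which may exceed
# the copied width.
def copy_pitch_linear(src, src_pitch, dst, dst_pitch, width, height):
    for y in range(height):
        src_off = y * src_pitch
        dst_off = y * dst_pitch
        dst[dst_off:dst_off + width] = src[src_off:src_off + width]

src = bytearray(range(16))  # a 4x4 surface with pitch 4
dst = bytearray(32)         # destination region inside a pitch-8 surface
copy_pitch_linear(src, 4, dst, 8, 4, 4)
```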

TwodClass

2D blit and fill operations for surfaces:
  • Surface copies with format conversion
  • Texture blitting
  • Solid color fills

Memory Management

The GPU memory subsystem provides virtual addressing and resource tracking.

MemoryManager

class MemoryManager
{
    private PhysicalMemory _physical;  // Backing physical memory
    
    // Translate GPU virtual address to physical
    public bool TryGetPhysicalAddress(ulong gpuVa, out ulong physicalAddress)
    {
        // Page table walk
        // Returns false (rather than throwing) when the mapping is invalid
    }
    
    // Read/write methods for various data types
    public T Read<T>(ulong gpuVa) where T : unmanaged;
    public void Write<T>(ulong gpuVa, T value) where T : unmanaged;
}
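A minimal Python model of that translation, assuming 64 KiB pages (the granularity Ryujinx uses for GPU mappings); the real MemoryManager performs a multi-level page-table walk rather than a flat dictionary lookup:

```python
# Hypothetical sketch of GPU virtual-to-physical translation: a mapping
# registers each virtual page's physical base, and translation splits an
# address into page number and page offset.
PAGE_SIZE = 0x10000  # 64 KiB
PAGE_MASK = PAGE_SIZE - 1
PAGE_BITS = 16

class SimpleMemoryManager:
    def __init__(self):
        self._pages = {}  # virtual page number -> physical page base

    def map(self, gpu_va, physical_address, size):
        for offset in range(0, size, PAGE_SIZE):
            self._pages[(gpu_va + offset) >> PAGE_BITS] = physical_address + offset

    def try_get_physical_address(self, gpu_va):
        base = self._pages.get(gpu_va >> PAGE_BITS)
        if base is None:
            return False, 0  # unmapped: report failure instead of throwing
        return True, base + (gpu_va & PAGE_MASK)
```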

Buffer Cache

The BufferCache (in PhysicalMemory) tracks all buffer objects:
  • Handles overlapping buffer ranges
  • Manages CPU modification tracking
  • Performs buffer migrations when needed
  • Implements copy-on-write semantics
class BufferCache
{
    // Get or create buffer for GPU access
    public MultiRangeBuffer GetBuffer(MultiRange range, bool write)
    {
        // Check cache for existing buffer
        // Create new buffer if needed
        // Track modifications for synchronization
    }
}
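The get-or-create pattern at the heart of the cache can be sketched in Python. This toy version keys buffers by exact (address, size) pairs; the real BufferCache additionally merges overlapping ranges into a single host buffer and tracks CPU-side modifications:

```python
# Hypothetical sketch of buffer get-or-create: a cache hit reuses the
# existing host buffer, a miss allocates and registers a new one.
class SimpleBufferCache:
    def __init__(self):
        self._buffers = {}
        self.creations = 0

    def get_buffer(self, address, size):
        key = (address, size)
        buf = self._buffers.get(key)
        if buf is None:
            buf = bytearray(size)  # stand-in for a host buffer allocation
            self._buffers[key] = buf
            self.creations += 1
        return buf
```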

Performance Optimizations

  • State Redundancy Check: Only propagate state changes to the host API using shadow RAM comparison
  • Fast Path Uploads: Batch inline-to-memory and uniform buffer updates
  • Deferred Actions: Queue resource operations to run on the render thread
  • Sequence Numbers: Track resource modifications efficiently for cache coherency
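The shadow-RAM redundancy check can be sketched in Python (names here are illustrative, not Ryujinx API): the emulator keeps its own copy of the register file and only forwards a write to the host API when the value actually changes.

```python
# Hypothetical sketch of shadow-state comparison: redundant register writes
# are filtered out before they reach the host graphics API.
class ShadowState:
    def __init__(self):
        self._shadow = {}
        self.host_calls = 0

    def write(self, register, value):
        if self._shadow.get(register) == value:
            return  # redundant write: skip the host state change
        self._shadow[register] = value
        self.host_calls += 1  # stand-in for propagating state to OpenGL/Vulkan
```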

References

Source Files

  • src/Ryujinx.Graphics.Gpu/GpuContext.cs
  • src/Ryujinx.Graphics.Gpu/GpuChannel.cs
  • src/Ryujinx.Graphics.Gpu/Engine/GPFifo/GPFifoProcessor.cs
  • src/Ryujinx.Graphics.Gpu/Engine/Threed/ThreedClass.cs