Overview

Ryujinx emulates the NVIDIA Tegra X1’s Maxwell GPU architecture, providing a complete software implementation of the graphics processing unit used in the Nintendo Switch. The GPU emulation layer sits between the guest application and the host graphics API (OpenGL or Vulkan).

The emulation layer is organized around four core components:

  • GpuContext: Central GPU emulation context managing all GPU resources and state
  • GpuChannel: Individual GPU command submission channels with isolated state
  • GPFifo: General Purpose FIFO for command buffer submission and processing
  • Engine Classes: Specialized engines for 3D, 2D, compute, and DMA operations

GpuContext

The GpuContext class is the central hub of GPU emulation, coordinating all GPU operations and managing shared resources.

Architecture

namespace Ryujinx.Graphics.Gpu
{
    public sealed class GpuContext : IDisposable
    {
        public IRenderer Renderer { get; }           // Host renderer (OpenGL/Vulkan)
        public GPFifoDevice GPFifo { get; }          // Command submission device
        public SynchronizationManager Synchronization { get; }
        public Window Window { get; }                // Presentation window
        
        internal int SequenceNumber { get; private set; }
        internal ulong SyncNumber { get; private set; }
    }
}
The GPU context uses the Maxwell timer frequency of 614.4 MHz (384/625 ticks per nanosecond) to accurately emulate GPU timing behavior.

Key Responsibilities

Memory management

  • Manages physical memory registries per process (keyed by process ID)
  • Supports multiple PhysicalMemory instances for multi-process emulation
  • Handles CPU virtual memory tracking and GPU memory mapping
  • Provides MemoryManager creation for GPU virtual address spaces

// Register a process's memory for GPU access
public void RegisterProcess(ulong pid, IVirtualMemoryManagerTracked cpuMemory)
{
    PhysicalMemory physicalMemory = new(this, cpuMemory);
    PhysicalMemoryRegistry.TryAdd(pid, physicalMemory);
}

Channel management

  • Creates and manages GpuChannel instances
  • Each channel represents an independent command submission context
  • Channels can be bound to different memory managers

public GpuChannel CreateChannel()
{
    return new GpuChannel(this);
}

Synchronization

  • Tracks sequence numbers for resource modification ordering
  • Manages sync actions triggered by CPU-GPU synchronization points
  • Handles buffer migrations between memory regions
  • Creates host sync objects for GPU-CPU coordination

internal void CreateHostSyncIfNeeded(HostSyncFlags flags)
{
    // Creates fence/sync primitives when:
    // - Buffer migrations are pending
    // - Sync actions are registered
    // - Syncpoint increments occur
    Renderer.CreateSync(SyncNumber, strict: flags.HasFlag(HostSyncFlags.Strict));
    SyncNumber++;
}

Shader cache

  • Coordinates shader cache initialization across all processes
  • Propagates shader cache state changes to the host application
  • Manages disk cache for persistent shader storage

public void InitializeShaderCache(CancellationToken cancellationToken)
{
    HostInitalized.WaitOne();
    foreach (PhysicalMemory physicalMemory in PhysicalMemoryRegistry.Values)
    {
        physicalMemory.ShaderCache.Initialize(cancellationToken);
    }
}

GPU Timer

The emulated GPU provides accurate timing using Maxwell’s timer frequency:
// Convert nanoseconds to Maxwell GPU ticks (614.4 MHz)
private static ulong ConvertNanosecondsToTicks(ulong nanoseconds)
{
    const int NsToTicksFractionNumerator = 384;
    const int NsToTicksFractionDenominator = 625;
    
    ulong divided = nanoseconds / NsToTicksFractionDenominator;
    ulong rounded = divided * NsToTicksFractionDenominator;
    ulong errorBias = (nanoseconds - rounded) * NsToTicksFractionNumerator / NsToTicksFractionDenominator;
    
    return divided * NsToTicksFractionNumerator + errorBias;
}
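The same fixed-point conversion can be sketched in Python (a hypothetical port for illustration; the authoritative implementation is the C# above). Dividing before multiplying avoids 64-bit overflow for large nanosecond values, and the error-bias term recovers the precision lost by the early division.

```python
# Python sketch of the Maxwell tick conversion: 614.4 MHz means
# 384/625 ticks per nanosecond.
NS_TO_TICKS_NUMERATOR = 384
NS_TO_TICKS_DENOMINATOR = 625

def convert_ns_to_ticks(nanoseconds: int) -> int:
    # Divide first to avoid overflow, then add back the truncated remainder
    divided = nanoseconds // NS_TO_TICKS_DENOMINATOR
    rounded = divided * NS_TO_TICKS_DENOMINATOR
    error_bias = (nanoseconds - rounded) * NS_TO_TICKS_NUMERATOR // NS_TO_TICKS_DENOMINATOR
    return divided * NS_TO_TICKS_NUMERATOR + error_bias

# One second of host time corresponds to 614,400,000 GPU ticks
print(convert_ns_to_ticks(1_000_000_000))  # 614400000
```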
The FastGpuTime configuration option divides the reported timestamp by 256, preventing games with dynamic resolution scaling from lowering their resolution because the emulated GPU appears slow.

GpuChannel

Each GpuChannel represents an independent command submission context with its own state and resource bindings.

Channel Architecture

Components

public class GpuChannel : IDisposable
{
    internal BufferManager BufferManager { get; }     // Buffer resource management
    internal TextureManager TextureManager { get; }   // Texture resource management
    internal MemoryManager MemoryManager { get; }     // GPU virtual memory
    
    // Bind a memory manager to this channel
    public void BindMemory(MemoryManager memoryManager)
    {
        memoryManager.Physical.BufferCache.NotifyBuffersModified += BufferManager.Rebind;
        memoryManager.MemoryUnmapped += MemoryUnmappedHandler;
        TextureManager.ReloadPools();
    }
}
When a channel’s memory manager changes, all texture pools must be reloaded and buffer caches must be pruned to avoid stale references.
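That rebind-on-bind behavior can be sketched in Python (an illustrative model, not Ryujinx code; class and handler names here are invented). Binding a new memory manager detaches the callbacks wired to the old one, subscribes to the new one, and reloads the texture pools for the new address space:

```python
# Hypothetical sketch of channel memory binding: modification callbacks
# must follow the currently bound memory manager, and texture pools must
# be reloaded so no cached resource points at the old address space.
class MemoryManager:
    def __init__(self):
        self.buffers_modified_handlers = []

    def notify_buffers_modified(self):
        for handler in self.buffers_modified_handlers:
            handler()

class Channel:
    def __init__(self):
        self.memory_manager = None
        self.pools_reloaded = 0
        self.rebinds = 0

    def bind_memory(self, mm: MemoryManager):
        old = self.memory_manager
        if old is not None:
            # Detach from the previous address space
            old.buffers_modified_handlers.remove(self._on_buffers_modified)
        mm.buffers_modified_handlers.append(self._on_buffers_modified)
        self.memory_manager = mm
        self.pools_reloaded += 1  # texture pools reloaded for the new address space

    def _on_buffers_modified(self):
        self.rebinds += 1  # buffers are rebound on the next draw
```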

GPFifo Command Processing

The General Purpose FIFO (GPFifo) is the primary mechanism for submitting commands to the GPU. It processes command buffers containing method calls to various engine classes.

Command Buffer Format

Commands use a compressed format with multiple encoding schemes:
struct CompressedMethod
{
    int MethodAddress;        // Target method offset
    int MethodSubchannel;     // Engine subchannel (0-4)
    int MethodCount;          // Number of arguments
    SecOp SecOp;             // Encoding type
    int ImmdData;            // Immediate data (for ImmdDataMethod)
}

enum SecOp
{
    IncMethod,       // Increment method address after each argument
    NonIncMethod,    // Keep method address constant (array data)
    OneInc,          // Increment once then keep constant
    ImmdDataMethod   // Single method call with immediate data
}
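All of these fields are packed into a single 32-bit command word. A Python sketch of the field extraction follows; the exact bit positions are an assumption based on the standard NVIDIA pushbuffer header layout that this struct mirrors (SecOp in the top bits, count/immediate data in the middle, subchannel and method address in the low bits):

```python
# Hypothetical decoder for a 32-bit GPFIFO command header.
# Assumed layout: [31:29] SecOp, [28:16] count or immediate data,
# [15:13] subchannel, low bits method address.
def decode_header(word: int) -> dict:
    return {
        "method": word & 0xFFF,                  # MethodAddress
        "subchannel": (word >> 13) & 0x7,        # MethodSubchannel (0-4)
        "count_or_immd": (word >> 16) & 0x1FFF,  # MethodCount or ImmdData
        "secop": (word >> 29) & 0x7,             # SecOp encoding type
    }

# Build a header for: IncMethod, 3 arguments, subchannel 2, method 0x100
header = (1 << 29) | (3 << 16) | (2 << 13) | 0x100
print(decode_header(header))
```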

Processing Pipeline

Step 1: Command Decode

Commands are decoded from the GPFIFO stream, extracting method address, subchannel, and arguments.
public void Process(ulong baseGpuVa, ReadOnlySpan<int> commandBuffer)
{
    for (int index = 0; index < commandBuffer.Length; index++)
    {
        int command = commandBuffer[index];
        
        if (_state.MethodCount != 0)
        {
            // Process method argument
            Send(gpuVa, _state.Method, command, _state.SubChannel, isLastCall);
        }
        else
        {
            // Decode new method header
            CompressedMethod meth = Unsafe.As<int, CompressedMethod>(ref command);
            // ...
        }
    }
}
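The loop above alternates between two states: decoding a header and consuming arguments. A self-contained Python sketch of that state machine (header bit layout assumed, as with CompressedMethod above; SecOp values are illustrative constants matching the enum names):

```python
# Hypothetical sketch of the GPFIFO decode loop: a header arms the pending
# method/count, then following words are consumed as arguments, stepping
# the method address according to SecOp.
SECOP_INC_METHOD = 1
SECOP_NON_INC_METHOD = 3
SECOP_IMMD_DATA_METHOD = 4
SECOP_ONE_INC = 5

def make_header(secop, count_or_immd, subchannel, method):
    return (secop << 29) | (count_or_immd << 16) | (subchannel << 13) | method

def process(words):
    calls = []                    # (method, argument) pairs handed to the engines
    method = secop = count = 0
    one_inc_pending = False
    for word in words:
        if count != 0:
            calls.append((method, word))
            count -= 1
            if secop == SECOP_INC_METHOD or one_inc_pending:
                method += 1
                one_inc_pending = False  # OneInc steps only after the first argument
        else:
            # Decode a new method header
            method = word & 0xFFF
            secop = (word >> 29) & 0x7
            data = (word >> 16) & 0x1FFF
            if secop == SECOP_IMMD_DATA_METHOD:
                calls.append((method, data))  # argument packed in the header itself
            else:
                count = data
                one_inc_pending = secop == SECOP_ONE_INC
    return calls
```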
Step 2: Fast Path Optimization

Common operations are optimized with fast paths:
  • Inline-to-Memory uploads: Batch copy data directly to GPU memory
  • Uniform buffer updates: Bulk constant buffer data transfers
private bool TryFastUniformBufferUpdate(CompressedMethod meth, ReadOnlySpan<int> commandBuffer, int offset)
{
    if (meth.MethodAddress == UniformBufferUpdateDataMethodOffset &&
        meth.SecOp == SecOp.NonIncMethod)
    {
        _3dClass.ConstantBufferUpdate(commandBuffer.Slice(offset + 1, meth.MethodCount));
        return true;
    }
    return false;
}
Step 3: Engine Dispatch

Methods are routed to the appropriate engine class based on subchannel:
  • Subchannel 0: 3D Engine (ThreedClass)
  • Subchannel 1: Compute Engine (ComputeClass)
  • Subchannel 2: Inline-to-Memory (I2M)
  • Subchannel 3: 2D Engine (TwodClass)
  • Subchannel 4: DMA Engine (DmaClass)
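The routing itself is a fixed table lookup. A minimal Python sketch (engine objects and the method number are illustrative, not Ryujinx API):

```python
# Hypothetical sketch of subchannel routing: each decoded method is handed
# to the engine bound at its subchannel.
class Engine:
    def __init__(self, name):
        self.name = name
        self.received = []

    def call_method(self, method, argument):
        self.received.append((method, argument))

subchannels = {
    0: Engine("3D"),
    1: Engine("Compute"),
    2: Engine("I2M"),
    3: Engine("2D"),
    4: Engine("DMA"),
}

def dispatch(subchannel, method, argument):
    subchannels[subchannel].call_method(method, argument)
```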
Step 4: Macro Execution

Methods in the range 0xE00+ trigger Macro Method Expansion (MME) execution:
if (offset >= 0xe00)
{
    int macroIndex = (offset >> 1) & MacroIndexMask;
    
    if ((offset & 1) != 0)
        _fifoClass.MmePushArgument(macroIndex, gpuVa, argument);
    else
        _fifoClass.MmeStart(macroIndex, argument);
        
    if (isLastCall)
        _fifoClass.CallMme(macroIndex, state);
}
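The arithmetic above packs two things into the method offset: the low bit selects "start macro" versus "push argument", and the remaining bits select the macro slot. A small Python sketch (the 128-slot mask is an assumption matching MacroIndexMask above):

```python
# Sketch of the macro routing arithmetic: offsets at 0xE00 and up address
# the macro engine rather than a regular engine method.
MACRO_INDEX_MASK = 0x7F  # assumed 128 macro slots

def macro_target(offset):
    macro_index = (offset >> 1) & MACRO_INDEX_MASK
    is_argument = (offset & 1) != 0
    return macro_index, is_argument

print(macro_target(0xE00))  # (0, False): start macro 0
print(macro_target(0xE01))  # (0, True): push argument to macro 0
print(macro_target(0xE04))  # (2, False): start macro 2
```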

Engine Classes

Ryujinx implements multiple GPU engine classes that handle different types of operations.

ThreedClass (3D Engine)

The primary graphics engine, located in src/Ryujinx.Graphics.Gpu/Engine/Threed/, handles all 3D rendering operations and manages:
  • Vertex and index buffer bindings
  • Render target configuration
  • Pipeline state (blend, depth, stencil, rasterizer)
  • Shader program binding
  • Draw calls (arrays, indexed, instanced, indirect)
  • Transform feedback
  • Conditional rendering
class ThreedClass : IDeviceState
{
    private readonly DrawManager _drawManager;
    private readonly StateUpdater _stateUpdater;
    private readonly ConstantBufferUpdater _cbUpdater;
    private readonly SemaphoreUpdater _semaphoreUpdater;
}

ComputeClass

Handles compute shader dispatch operations:
class ComputeClass
{
    // Dispatch compute work groups
    private void Dispatch()
    {
        // Update compute state
        UpdateShaderState();
        UpdateStorageBuffers();
        UpdateTextures();
        
        // Execute dispatch
        _context.Renderer.Pipeline.DispatchCompute(
            groupsX: state.DispatchParamsX,
            groupsY: state.DispatchParamsY,
            groupsZ: state.DispatchParamsZ
        );
    }
}

DmaClass

Performs memory-to-memory copy operations:
  • Linear-to-linear copies
  • Tiled-to-linear and linear-to-tiled conversions
  • Pitch linear memory layout handling
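The simplest of these cases, a linear-to-linear copy between pitch-linear surfaces, amounts to copying one row of `width` bytes at a time while honoring each surface's stride. An illustrative Python sketch (not Ryujinx code):

```python
# Sketch of a 2D pitch-linear copy: rows are contiguous, but consecutive
# rows are separated by the surface's pitch (stride), which may exceed
# the copied width.
def copy_pitch_linear(src, src_pitch, dst, dst_pitch, width, height):
    for y in range(height):
        src_off = y * src_pitch
        dst_off = y * dst_pitch
        dst[dst_off:dst_off + width] = src[src_off:src_off + width]

src = bytearray(range(16))  # a 4x4 surface with pitch 4
dst = bytearray(32)         # destination region inside a pitch-8 surface
copy_pitch_linear(src, 4, dst, 8, 4, 4)
```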

TwodClass

2D blit and fill operations for surfaces:
  • Surface copies with format conversion
  • Texture blitting
  • Solid color fills

Memory Management

The GPU memory subsystem provides virtual addressing and resource tracking.

MemoryManager

class MemoryManager
{
    private PhysicalMemory _physical;  // Backing physical memory
    
    // Translate GPU virtual address to physical
    public bool TryGetPhysicalAddress(ulong gpuVa, out ulong physicalAddress)
    {
        // Page table walk
        // Returns false (rather than throwing) when the mapping is invalid
    }
    
    // Read/write methods for various data types
    public T Read<T>(ulong gpuVa) where T : unmanaged;
    public void Write<T>(ulong gpuVa, T value) where T : unmanaged;
}
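A minimal Python model of that translation, assuming 64 KiB pages (the granularity Ryujinx uses for GPU mappings); the real MemoryManager performs a multi-level page-table walk rather than a flat dictionary lookup:

```python
# Hypothetical sketch of GPU virtual-to-physical translation: a mapping
# registers each virtual page's physical base, and translation splits an
# address into page number and page offset.
PAGE_SIZE = 0x10000  # 64 KiB
PAGE_MASK = PAGE_SIZE - 1
PAGE_BITS = 16

class SimpleMemoryManager:
    def __init__(self):
        self._pages = {}  # virtual page number -> physical page base

    def map(self, gpu_va, physical_address, size):
        for offset in range(0, size, PAGE_SIZE):
            self._pages[(gpu_va + offset) >> PAGE_BITS] = physical_address + offset

    def try_get_physical_address(self, gpu_va):
        base = self._pages.get(gpu_va >> PAGE_BITS)
        if base is None:
            return False, 0  # unmapped: report failure instead of throwing
        return True, base + (gpu_va & PAGE_MASK)
```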

Buffer Cache

The BufferCache (in PhysicalMemory) tracks all buffer objects:
  • Handles overlapping buffer ranges
  • Manages CPU modification tracking
  • Performs buffer migrations when needed
  • Implements copy-on-write semantics
class BufferCache
{
    // Get or create buffer for GPU access
    public MultiRangeBuffer GetBuffer(MultiRange range, bool write)
    {
        // Check cache for existing buffer
        // Create new buffer if needed
        // Track modifications for synchronization
    }
}
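The get-or-create pattern at the heart of the cache can be sketched in Python. This toy version keys buffers by exact (address, size) pairs; the real BufferCache additionally merges overlapping ranges into a single host buffer and tracks CPU-side modifications:

```python
# Hypothetical sketch of buffer get-or-create: a cache hit reuses the
# existing host buffer, a miss allocates and registers a new one.
class SimpleBufferCache:
    def __init__(self):
        self._buffers = {}
        self.creations = 0

    def get_buffer(self, address, size):
        key = (address, size)
        buf = self._buffers.get(key)
        if buf is None:
            buf = bytearray(size)  # stand-in for a host buffer allocation
            self._buffers[key] = buf
            self.creations += 1
        return buf
```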

Performance Optimizations

  • State Redundancy Check: Only propagate state changes to the host API using shadow RAM comparison
  • Fast Path Uploads: Batch inline-to-memory and uniform buffer updates
  • Deferred Actions: Queue resource operations to run on the render thread
  • Sequence Numbers: Track resource modifications efficiently for cache coherency
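The shadow-RAM redundancy check can be sketched in Python (names here are illustrative, not Ryujinx API): the emulator keeps its own copy of the register file and only forwards a write to the host API when the value actually changes.

```python
# Hypothetical sketch of shadow-state comparison: redundant register writes
# are filtered out before they reach the host graphics API.
class ShadowState:
    def __init__(self):
        self._shadow = {}
        self.host_calls = 0

    def write(self, register, value):
        if self._shadow.get(register) == value:
            return  # redundant write: skip the host state change
        self._shadow[register] = value
        self.host_calls += 1  # stand-in for propagating state to OpenGL/Vulkan
```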

References

Source Files

  • src/Ryujinx.Graphics.Gpu/GpuContext.cs
  • src/Ryujinx.Graphics.Gpu/GpuChannel.cs
  • src/Ryujinx.Graphics.Gpu/Engine/GPFifo/GPFifoProcessor.cs
  • src/Ryujinx.Graphics.Gpu/Engine/Threed/ThreedClass.cs