Overview
The scheduler is responsible for converting the high-level UOp graph into a linear sequence of executable kernels. It performs critical optimizations including kernel fusion, memory planning, and dependency tracking. The scheduler is implemented in tinygrad/engine/schedule.py.

Scheduler Responsibilities
The scheduler performs several key tasks:
- Graph Partitioning - Breaking large computation graphs into executable kernels
- Kernel Fusion - Combining operations into efficient fused kernels
- Memory Planning - Optimizing buffer allocation and reuse
- Dependency Tracking - Ensuring correct execution order
Scheduling Pipeline
ExecItem
The scheduler outputs a list of ExecItem objects, each representing one kernel to execute.
ExecItem Creation
From the schedule, an ExecItem is created for each kernel.
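A minimal pure-Python sketch of this lowering step. The field and type names here are illustrative stand-ins, not tinygrad's actual definitions:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for tinygrad's types; field names are illustrative.
@dataclass
class ScheduleItem:
    ast: str            # the kernel's computation (a UOp graph in tinygrad)
    bufs: list[str]     # buffers the kernel reads and writes

@dataclass
class ExecItem:
    prg: str            # a compiled program/runner for the kernel
    bufs: list[str]     # concrete buffers to bind at launch

def lower(schedule: list[ScheduleItem]) -> list[ExecItem]:
    # One ExecItem per ScheduleItem: compile the AST, keep the buffer list.
    return [ExecItem(prg=f"compiled({si.ast})", bufs=si.bufs) for si in schedule]
```

The key point is the one-to-one mapping: each schedule entry becomes one launchable unit carrying its program and its buffer bindings.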
Kernel Fusion
Kernel fusion is one of tinygrad’s most powerful optimizations, combining multiple operations into single kernels.
Fusion Benefits
- Reduced memory traffic - Intermediate results stay in registers
- Fewer kernel launches - Lower overhead
- Better cache utilization - Improved memory locality
- More optimization opportunities - Larger scope for compiler
Fusion Example
Consider this sequence of operations.

Fusion Constraints
Not all operations can be fused:
- Reduction boundaries - Reductions often require separate kernels
- Memory dependencies - Can’t fuse operations that need intermediate materialization
- Device limitations - Kernels must fit in device resources
- Multi-device ops - Cross-device operations need separate kernels
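The memory-traffic benefit and the reduction boundary can both be illustrated with a pure-Python sketch. This models the idea only, not tinygrad's codegen:

```python
# Unfused vs. fused elementwise work, modeled with plain lists.

def unfused(x: list[float]) -> list[float]:
    tmp = [v * 2 for v in x]        # kernel 1: materializes an intermediate buffer
    return [v + 1 for v in tmp]     # kernel 2: reads the intermediate back

def fused(x: list[float]) -> list[float]:
    # One kernel: the intermediate (v * 2) never leaves "registers".
    return [v * 2 + 1 for v in x]

def fused_with_reduce(x: list[float]) -> float:
    # A reduction can end the fused kernel: elementwise work folds into the
    # sum, but further elementwise ops on the result need a new kernel.
    return sum(v * 2 + 1 for v in x)
```

Both versions compute the same values, but the fused one makes a single pass with no intermediate buffer, which is exactly what fusion buys on a real device.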
Dependency Tracking
The scheduler builds a dependency graph to ensure correct execution order.

Dependency Types
- Read-after-write (RAW) - A kernel must read a buffer only after the previous write completes
- Write-after-read (WAR) - A kernel must write a buffer only after the previous read completes
- Write-after-write (WAW) - Writes to the same buffer must occur in order
Dependency Graph Construction
The scheduler:
- Identifies all kernel operations (CALL, END UOps)
- Extracts buffer dependencies from each kernel
- Builds producer-consumer edges
- Computes in-degree for topological sort
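A minimal sketch of these steps, finishing with a Kahn-style topological sort over the computed in-degrees. Kernel and buffer names are hypothetical, and this is a conceptual model rather than tinygrad's code:

```python
from collections import defaultdict, deque

def build_and_sort(kernels: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """kernels maps name -> (reads, writes). Producer-consumer (RAW) edges go
    from the kernel that writes a buffer to every kernel that reads it."""
    writers = {}
    for name, (_, writes) in kernels.items():
        for buf in writes:
            writers[buf] = name
    edges = defaultdict(set)
    indegree = {name: 0 for name in kernels}
    for name, (reads, _) in kernels.items():
        for buf in reads:
            producer = writers.get(buf)
            if producer is not None and producer != name and name not in edges[producer]:
                edges[producer].add(name)
                indegree[name] += 1
    # Kahn's algorithm linearizes the dependency graph.
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in edges[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    assert len(order) == len(kernels), "cycle detected"
    return order
```

Any kernel with in-degree zero is ready to run; removing it may unlock its consumers, so a valid linear order always respects producer-before-consumer.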
Topological Sort
The scheduler performs a topological sort to linearize the schedule.

Memory Planning
After scheduling, memory planning optimizes buffer allocation.

Memory Planning Goals
- Buffer reuse - Reuse allocations when buffers are no longer needed
- Memory minimization - Reduce peak memory usage
- Allocation efficiency - Batch allocations when possible
Memory Planning Algorithm
The planner:
- Tracks buffer lifetimes through the schedule
- Identifies opportunities for reuse
- Assigns physical allocations to logical buffers
- Inserts allocation/deallocation operations
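A greedy sketch of lifetime-based reuse: a freed allocation of sufficient size is recycled instead of allocating fresh memory. This is illustrative only, not tinygrad's actual planner:

```python
def plan(buffers: list[tuple[str, int, int, int]]) -> dict[str, int]:
    """buffers: (name, size, first_use, last_use); returns name -> slot id,
    where each slot is one physical allocation shared by non-overlapping
    logical buffers."""
    assignment, free_slots, slot_sizes = {}, [], []
    expiry = []                                       # (last_use, slot)
    for name, size, first, last in sorted(buffers, key=lambda b: b[2]):
        # Release slots whose buffer's lifetime ended before this one starts.
        for end, slot in list(expiry):
            if end < first:
                expiry.remove((end, slot))
                free_slots.append(slot)
        # Reuse a free slot that is big enough, else allocate a new one.
        slot = next((s for s in free_slots if slot_sizes[s] >= size), None)
        if slot is not None:
            free_slots.remove(slot)
        else:
            slot = len(slot_sizes)
            slot_sizes.append(size)
        assignment[name] = slot
        expiry.append((last, slot))
    return assignment
```

With three same-sized buffers where the first dies before the second begins, the planner needs only two allocations instead of three, lowering peak memory.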
Multi-Device Scheduling
For multi-GPU operations, the scheduler handles device coordination.

Multi-Device UOps
- MSELECT - Select a buffer based on device ID
- MSTACK - Stack of multi-device buffers
Device Synchronization
The scheduler inserts synchronization when needed:
- All-reduce - Collective operations across devices
- Device barriers - Ensure operation completion
- Data transfers - Move data between devices
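The semantics of an all-reduce can be sketched over plain Python lists standing in for per-device buffers; after the collective, every device holds the elementwise sum. This is the operation's meaning, not a real communication implementation:

```python
def all_reduce(device_buffers: list[list[float]]) -> list[list[float]]:
    # Elementwise sum across devices, then broadcast the result to all of them.
    total = [sum(vals) for vals in zip(*device_buffers)]
    return [list(total) for _ in device_buffers]
```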
Schedule Visualization
Visualizing the schedule shows:
- Kernel operations
- Buffer dependencies
- Execution order
- Fusion decisions
Rangeify
The rangeify pass in tinygrad/schedule/rangeify.py handles:
- Loop construction - Creating iteration ranges
- Index calculation - Computing buffer indices
- WAR dependency insertion - Adding write-after-read dependencies
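The index-calculation part can be sketched in a few lines: mapping a multi-dimensional loop index to a flat buffer offset via row-major strides. A conceptual model, not rangeify's actual code:

```python
def strides_for(shape: list[int]) -> list[int]:
    # Row-major (C-order) strides: the last dimension is contiguous.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return strides[::-1]

def flat_index(idx: list[int], strides: list[int]) -> int:
    # Dot product of the loop indices with the strides gives the buffer offset.
    return sum(i * s for i, s in zip(idx, strides))
```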
Debugging the Scheduler
View Schedule
Trace Fusion Decisions
Verify Memory Planning
Check Dependencies
Use process replay to verify correctness.

Optimization Tips
Schedule Caching
tinygrad caches schedules to avoid recomputation.

Advanced Topics
Custom Scheduling
For advanced use cases, you can customize scheduling behavior with environment variables.

JIT Integration
The scheduler integrates with TinyJit to capture and replay schedules.
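A pure-Python sketch of the capture-and-replay idea: the first call records the sequence of kernels, and later calls replay it without re-running the scheduler. This models the concept only, not TinyJit's implementation:

```python
class CaptureJit:
    def __init__(self, schedule_fn):
        self.schedule_fn = schedule_fn   # expensive: builds the kernel list
        self.captured = None

    def __call__(self, *args):
        if self.captured is None:
            # First call: run the scheduler and capture the kernel sequence.
            self.captured = self.schedule_fn(*args)
        # Every call: replay the captured kernels on the new arguments.
        return [kernel(*args) for kernel in self.captured]
```

Because scheduling runs only once, repeated calls with same-shaped inputs skip straight to kernel launches, which is where the JIT's speedup comes from.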