Both a and b live in unified memory, directly accessible to the CPU and the GPU alike.
Operations on Any Device
In MLX, rather than moving arrays to devices, you specify the device when you run the operation. Any device can perform any operation on a and b without moving them from one memory location to another. For example:
See the Streams documentation for more information on the semantics of streams in MLX.
Automatic Dependency Management
In the above add example, there are no dependencies between the operations, so there is no possibility of a race condition. When dependencies do exist, the MLX scheduler manages them automatically. For example:
The second add runs on the GPU, but it depends on the output of the first add, which runs on the CPU. MLX automatically inserts a dependency between the two streams so that the second add starts executing only after the first is complete and c is available.
A Simple Example
Here is a more interesting (albeit slightly contrived) example of how unified memory can be helpful. Suppose the computation has two parts: a single matmul followed by a long sequence of very small element-wise operations. The matmul operation is a good fit for the GPU, since it is more compute dense. The sequence of small operations is a better fit for the CPU, since each one is tiny and would probably be overhead bound on the GPU.
By leveraging unified memory, MLX allows you to optimize performance by running different parts of your computation on the most appropriate device, all without explicit memory transfers.