CUDA, designed as an extension to C++, preserves that language's familiar abstractions. However, unlike CPU programming, where compilers and runtime systems abstract away most hardware concerns, writing CUDA code requires developers to manually map computations onto the GPU's parallel execution and memory hierarchy, while respecting the fundamental constraints of the hardware. In this talk, I'll discuss where this model breaks down and why an alternative low-level language is needed for productive, compositional GPU programming.
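To give a concrete sense of the manual mapping in question, here is a minimal block-level sum reduction; this is an illustrative sketch (the kernel name and launch parameters are my own, not drawn from the talk), showing how the programmer, rather than the compiler, chooses the thread-to-data assignment, stages data through shared memory, and inserts the barriers the hardware requires:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Block-level sum reduction: the programmer, not the compiler, decides
// how elements map to threads and when data moves into shared memory.
__global__ void blockSum(const float* in, float* out, int n) {
    extern __shared__ float tile[];          // explicit on-chip staging
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;   // manual thread-to-data mapping

    tile[tid] = (i < n) ? in[i] : 0.0f;      // global -> shared, hand-written
    __syncthreads();                         // hardware constraint: barrier

    // Tree reduction in shared memory; the stride pattern is chosen by hand
    // and assumes blockDim.x is a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0]; // one partial sum per block
}

int main() {
    const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, threads, threads * sizeof(float)>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;                      // finish the reduction on the host
    for (int b = 0; b < blocks; ++b) total += out[b];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in); cudaFree(out);
    return 0;
}
```

Even in this small example, the grid shape, the shared-memory tile size, and the synchronization points are all decisions the developer must make and keep consistent by hand; none composes automatically with other kernels.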
