CUDA course
Split my 12 hr CUDA course into sections. We'll cover:
1) the deep learning ecosystem
2) CUDA setup/installation
3) gentle intro to GPUs
4) writing your first kernels
5) kernel- and system-level profiling, atomics, and the CUDA programming model
6) how and when to use cuBLAS/cuDNN
7) optimizing matrix multiplication
8) comparing CUDA to Triton
9) PyTorch extensions
10) implementing an MLP trainer in PyTorch, then NumPy, then naive C, then naive CUDA (MNIST dataset)