v1cc0 with ♥️ © 2025

CUDA course

I split my 12-hour CUDA course into sections. We'll cover:
1) the deep learning ecosystem
2) CUDA setup/installation
3) a gentle intro to GPUs
4) writing your first kernels
5) kernel- and system-level profiling, atomics, and the CUDA programming model
6) how and when to use cuBLAS/cuDNN
7) optimizing matrix multiplication
8) comparing CUDA to Triton
9) PyTorch extensions
10) implementing an MLP trainer in PyTorch, then NumPy, then naive C, then naive CUDA (MNIST dataset)

