image_ops.py
Implement softmax kernels via warp reduce · 654febe3
Summary: This commit adds additional CUDA softmax kernels that use warp reduce. Warp reduce performs better when the softmax dimension is <= 256, a size range that recent vision transformers favor.
Ting PAN committed
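To make the warp-reduce idea concrete, here is a minimal pure-Python emulation of how such a kernel is typically organized. It is a sketch under stated assumptions, not the CUDA code from this commit. The assumptions are that one 32-lane warp handles one row, that lane `i` owns the strided elements `i, i + 32, i + 64, ...`, and that both the row max and the row sum are computed with a butterfly all-reduce that emulates `__shfl_xor_sync`. The names `_warp_allreduce` and `warp_softmax_row` are hypothetical. With a dimension of at most 256, each lane holds at most 8 values, so a real kernel can keep them in registers and reduce entirely with shuffles, with no shared memory and no block-wide synchronization.

```python
import math
from typing import Callable, List

WARP_SIZE = 32  # lanes per CUDA warp


def _warp_allreduce(vals: List[float], op: Callable[[float, float], float]) -> List[float]:
    """Hypothetical emulation of a butterfly all-reduce via __shfl_xor_sync.

    After log2(32) = 5 rounds, every lane holds the reduction over all 32 lanes.
    """
    vals = list(vals)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Each lane combines its value with the one held by lane (lane ^ offset),
        # which is what a single XOR shuffle round exchanges on the GPU.
        vals = [op(vals[lane], vals[lane ^ offset]) for lane in range(WARP_SIZE)]
        offset //= 2
    return vals


def warp_softmax_row(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row, assuming one warp per row and len(row) <= 256."""
    d = len(row)
    assert 0 < d <= 256, "this sketch assumes the warp-reduce regime (dimension <= 256)"

    # Strided ownership: lane `lane` holds row[lane], row[lane + 32], ... (at most 8 values).
    owned = [[row[i] for i in range(lane, d, WARP_SIZE)] for lane in range(WARP_SIZE)]

    # 1) Per-lane max (lanes with no elements contribute -inf), then a warp-wide max.
    local_max = [max(vals, default=-math.inf) for vals in owned]
    row_max = _warp_allreduce(local_max, max)[0]

    # 2) Exponentiate relative to the row max for numerical stability.
    exps = [[math.exp(v - row_max) for v in vals] for vals in owned]

    # 3) Per-lane partial sums, then a warp-wide sum.
    local_sum = [sum(vals) for vals in exps]
    row_sum = _warp_allreduce(local_sum, lambda a, b: a + b)[0]

    # 4) Normalize and scatter back to the original strided positions.
    out = [0.0] * d
    for lane in range(WARP_SIZE):
        for k, e in enumerate(exps[lane]):
            out[lane + k * WARP_SIZE] = e / row_sum
    return out
```

For example, `warp_softmax_row([1.0, 2.0, 3.0])` produces three positive values that sum to 1. Lanes 3 through 31 own no elements and contribute `-inf` to the max and `0.0` to the sum, which leaves both reductions unaffected.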