Implement softmax kernels via warp reduce
Summary: This commit adds CUDA softmax kernels that use warp-level reduction. Warp-level reduction performs better when the softmax dimension is <= 256, a size that is common in recent vision transformers.
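For readers unfamiliar with the technique, the following is a minimal sketch of a warp-reduce softmax, not the kernels added by this commit. It assumes one warp per row, float32 data, a row-major layout, and a block size that is a multiple of 32; the names softmax_warp_kernel, warp_reduce_max, warp_reduce_sum, and softmax_rows are hypothetical and chosen only for illustration. Error checking on the CUDA runtime calls is left out for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;
constexpr unsigned kFullMask = 0xffffffffu;

// Butterfly reduction: after the loop, every lane holds the warp-wide maximum.
__inline__ __device__ float warp_reduce_max(float v) {
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2)
    v = fmaxf(v, __shfl_xor_sync(kFullMask, v, offset));
  return v;
}

// Same pattern for the sum.
__inline__ __device__ float warp_reduce_sum(float v) {
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2)
    v += __shfl_xor_sync(kFullMask, v, offset);
  return v;
}

// Hypothetical kernel: one warp computes the softmax of one row.
// Assumes blockDim.x is a multiple of 32, so `row` is uniform within a warp.
__global__ void softmax_warp_kernel(const float* in, float* out,
                                    int rows, int dim) {
  const int lane = threadIdx.x % kWarpSize;
  const int row = (blockIdx.x * blockDim.x + threadIdx.x) / kWarpSize;
  if (row >= rows) return;  // the whole warp exits together

  const float* x = in + static_cast<size_t>(row) * dim;
  float* y = out + static_cast<size_t>(row) * dim;

  // Pass 1: row maximum, subtracted later for numerical stability.
  float m = -INFINITY;
  for (int i = lane; i < dim; i += kWarpSize) m = fmaxf(m, x[i]);
  m = warp_reduce_max(m);

  // Pass 2: sum of exponentials.
  float s = 0.0f;
  for (int i = lane; i < dim; i += kWarpSize) s += expf(x[i] - m);
  s = warp_reduce_sum(s);

  // Pass 3: normalize.
  for (int i = lane; i < dim; i += kWarpSize) y[i] = expf(x[i] - m) / s;
}

// Hypothetical host helper: copies data to the device, launches the kernel
// with 4 warps per block, and copies the result back.
std::vector<float> softmax_rows(const std::vector<float>& h_in,
                                int rows, int dim) {
  const size_t bytes = h_in.size() * sizeof(float);
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_out, bytes);
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  const int threads = 128;
  const int warps_per_block = threads / kWarpSize;
  const int blocks = (rows + warps_per_block - 1) / warps_per_block;
  softmax_warp_kernel<<<blocks, threads>>>(d_in, d_out, rows, dim);

  std::vector<float> h_out(h_in.size());
  cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return h_out;
}
```

Because the reduction stays inside a single warp, it needs neither shared memory nor __syncthreads(), which is one plausible reason this approach helps for short rows. With dim <= 256, each lane handles at most 8 elements per pass.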
535 additions and 231 deletions
dali/core/ops/math_ops.py
0 → 100644
docs/api/python/dali/ops/Normalize.rst
0 → 100644