functional.py
45.6 KB
Use FP32 accumulator for FP16 ReduceSum · d56e67d1
Summary: This commit adds a fallback that uses an FP32 accumulator for FP16 ReduceSum, so that fewer small values are lost to rounding during accumulation. In addition, FP16 kernels for arch < 530 are now mostly available.
Ting PAN committed
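
The effect described in the summary can be illustrated with a minimal pure-Python sketch. It is not the code from this commit: the helper names `to_fp16`, `to_fp32`, `reduce_sum_fp16_acc` and `reduce_sum_fp32_acc` are hypothetical, and FP16/FP32 rounding is simulated with the standard `struct` module rather than with a GPU kernel. It only shows why a running sum kept in FP16 can drop small addends that an FP32 accumulator retains.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_fp16_acc(values):
    """Sum FP16 inputs while keeping the running total in FP16 (hypothetical helper)."""
    acc = 0.0
    for v in values:
        # The running total is rounded back to FP16 after every addition.
        acc = to_fp16(acc + to_fp16(v))
    return acc


def reduce_sum_fp32_acc(values):
    """Sum FP16 inputs with an FP32 running total, rounding to FP16 once at the end."""
    acc = 0.0
    for v in values:
        # Inputs are still FP16, but the running total keeps FP32 precision.
        acc = to_fp32(acc + to_fp16(v))
    return to_fp16(acc)


if __name__ == "__main__":
    # One large value followed by many small ones. Near 1.0 the FP16 spacing
    # is about 0.001, so each 1e-4 addend is rounded away by the FP16 total.
    data = [1.0] + [1e-4] * 10000
    print("FP16 accumulator:", reduce_sum_fp16_acc(data))  # stays at 1.0
    print("FP32 accumulator:", reduce_sum_fp32_acc(data))  # close to 2.0
```

In this sketch the only difference between the two functions is the precision of the running total. The inputs and the final result are FP16 in both cases.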