test_nn.py
27.9 KB
Use FP32 accumulator for FP16 ReduceSum · d56e67d1
Summary: This commit adds a fallback that uses an FP32 accumulator for FP16 ReduceSum, so that small values are not lost to rounding during accumulation. In addition, FP16 kernels for arch < 530 are now mostly available.
Ting PAN committed
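
The effect described in the summary can be illustrated with a minimal pure-Python sketch. This is not the kernel from the commit: it only emulates half- and single-precision rounding with the standard `struct` module, and the helper names (`to_fp16`, `to_fp32`, `reduce_sum_fp16_acc`, `reduce_sum_fp32_acc`) are hypothetical, chosen for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_fp16_acc(values):
    """Sum FP16 inputs while keeping the running total in FP16 (emulated)."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))  # round after every addition
    return acc


def reduce_sum_fp32_acc(values):
    """Sum FP16 inputs with an FP32 running total, casting to FP16 once at the end."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp16(v))  # accumulate in higher precision
    return to_fp16(acc)


# Illustrative input: one large value followed by many small ones.
values = [1.0] + [1e-4] * 4096

# Near 1.0 the FP16 spacing is 2**-10, so each 1e-4 addend is rounded away
# and the FP16 accumulator stays at 1.0.
print(reduce_sum_fp16_acc(values))

# The FP32 accumulator keeps the small contributions.
print(reduce_sum_fp32_acc(values))
```

Once the running total reaches 1.0, an FP16 accumulator cannot represent an increment of 1e-4, so every later addend is dropped. Accumulating in FP32 and converting to FP16 only at the end preserves them.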