Use FP32 accumulator for FP16 ReduceSum
Summary: This commit adds a fallback that uses an FP32 accumulator for FP16 ReduceSum, so that small values are no longer lost to FP16 rounding when they are added to a large running sum. In addition, nearly all FP16 kernels are now available for arch < 530.
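The effect of the accumulator precision can be reproduced on the CPU. The following is a minimal sketch, not Dragon's kernel code. It emulates FP16 and FP32 rounding with Python's `struct` half (`e`) and single (`f`) formats, and the helpers `to_fp16`, `to_fp32`, `reduce_sum_fp16_acc` and `reduce_sum_fp32_acc` are hypothetical names used only for illustration. With an FP16 running sum, each add is rounded to FP16. Once the sum reaches 2048, the FP16 spacing becomes 2.0, so adding 1.0 rounds straight back to 2048. With an FP32 running sum, the values are only rounded to FP16 once, at the end.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_fp16_acc(values):
    """Hypothetical naive FP16 ReduceSum: the running sum is rounded to FP16 after every add."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def reduce_sum_fp32_acc(values):
    """Hypothetical fallback: FP16 inputs, running sum rounded to FP32 after every add,
    and a single rounding back to FP16 for the output."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp16(v))
    return to_fp16(acc)


if __name__ == "__main__":
    ones = [1.0] * 4096
    print(reduce_sum_fp16_acc(ones))  # 2048.0: later adds of 1.0 are dropped
    print(reduce_sum_fp32_acc(ones))  # 4096.0
```

The same pattern holds for any small addend that falls below half the FP16 spacing of the running sum. For example, adding 0.25 to 1024.0 leaves it unchanged in FP16 but not in FP32.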
Showing 846 additions and 782 deletions
dragon/utils/math/sort.h: deleted (mode 100644 → 0)