batch_norm_op_kernel.cu
15.2 KB
Use local workspace for Context · fdf26ef2
Summary: This commit uses a local (per-thread or per-stream) workspace for Context, which provides a cleaner way to dispatch kernels that require scratch memory. In addition, the TF32 math type is provided as a cuDNN option for Ampere devices.
Ting PAN committed