batchnorm.py
9.09 KB
Fix the stream issue with NCCL2 on CUDA 9.2 and later · 58708021
Summary: This commit enforces stream synchronization before dispatching NCCL collectives. Otherwise, data corruption will occur because the default value of ``NCCL_GROUP_CUDA_STREAM`` changed to 0 as of CUDA 9.2, i.e., there is no explicit event wait for unfinished kernels.
Ting PAN committed