Fix the stream issue with NCCL2 on CUDA 9.2 and later
Summary: This commit enforces stream synchronization before dispatching NCCL collectives. Without it, data corruption can occur: since CUDA 9.2 the default value of ``NCCL_GROUP_CUDA_STREAM`` is 0, so NCCL no longer inserts an explicit event wait on kernels that are still unfinished when the collective is launched.
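To show why the ordering matters, here is a toy Python model. It is not CUDA and not the code in this commit; `Stream`, `Event`, `run` and `allreduce_once` are hypothetical names. Each stream is an in-order queue, and a scheduler is free to run the NCCL stream first when nothing links the two streams. The comments map each step to CUDA runtime calls one might use for the synchronization (`cudaEventRecord` on the compute stream, then `cudaStreamWaitEvent` on the NCCL stream). This is an assumption about the mechanism: the summary does not say whether the fix uses events or a full stream synchronization.

```python
from collections import deque


class Event:
    """Toy event: becomes 'done' once its stream reaches the record point."""
    def __init__(self):
        self.done = False


class Stream:
    """Toy in-order queue: operations on one stream run in FIFO order."""
    def __init__(self):
        self.ops = deque()

    def launch(self, fn):
        self.ops.append(("run", fn))

    def record(self, event):
        self.ops.append(("record", event))

    def wait(self, event):
        self.ops.append(("wait", event))


def run(streams, priority):
    """Drain all streams, always advancing the first runnable stream in
    `priority`. This models a legal but adversarial interleaving."""
    while any(s.ops for s in streams):
        for s in priority:
            if not s.ops:
                continue
            kind, arg = s.ops[0]
            if kind == "wait" and not arg.done:
                continue  # blocked until the other stream records the event
            s.ops.popleft()
            if kind == "run":
                arg()
            elif kind == "record":
                arg.done = True
            break
        else:
            raise RuntimeError("deadlock: every stream is blocked")


def allreduce_once(sync):
    buf = {"grad": 0.0}  # gradient buffer, stale until the compute kernel runs
    result = {}
    compute, nccl = Stream(), Stream()

    def backward():      # stands in for the kernel that writes the gradients
        buf["grad"] = 1.5

    def collective():    # stands in for the NCCL collective reading the buffer
        result["sent"] = buf["grad"]

    compute.launch(backward)
    if sync:
        ev = Event()
        compute.record(ev)  # ~ cudaEventRecord(ev, compute_stream)
        nccl.wait(ev)       # ~ cudaStreamWaitEvent(nccl_stream, ev, 0)
    nccl.launch(collective)
    # Schedule the NCCL stream eagerly, as nothing orders it after compute.
    run([compute, nccl], priority=[nccl, compute])
    return result["sent"]


print(allreduce_once(sync=False))  # 0.0 (stale data sent)
print(allreduce_once(sync=True))   # 1.5
```

Without the wait, the collective may read the buffer before the producing kernel has run. With the event wait, the NCCL stream cannot advance until the compute stream has passed the record point, whatever order the scheduler picks.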
571 additions and 591 deletions
docs/api/python/dali/ops/Brightness.rst (new file, mode 0 → 100644)
docs/api/python/dali/ops/Contrast.rst (new file, mode 0 → 100644)
dragon/utils/device/common_nccl.h (new file, mode 0 → 100644)