1. 02 Dec, 2020 1 commit
  2. 29 Nov, 2020 1 commit
  3. 05 Nov, 2020 1 commit
  4. 24 Oct, 2020 1 commit
  5. 20 Oct, 2020 1 commit
  6. 14 Oct, 2020 1 commit
  7. 13 Oct, 2020 1 commit
    • Add LinSpace Operator · e83c407a
      Summary:
      This commit adds the linspace op for dragon, torch and tensorflow.
      It also adds a workaround to range/linspace for truncated integer intervals (up to 2**57).
      Ting PAN committed
  8. 08 Oct, 2020 1 commit
  9. 07 Oct, 2020 1 commit
  10. 27 Sep, 2020 1 commit
    • Use local workspace for Context · fdf26ef2
      Summary:
      This commit uses a local (thread or stream) workspace for Context,
      which provides a cleaner way to dispatch kernels that require scratch memory.
      In addition, the TF32 math type is provided as a cuDNN option for Ampere devices.
      Ting PAN committed
  11. 10 Sep, 2020 1 commit
    • Add Unique Operator · 1dd8aeef
      Summary:
      This commit adds the unique op for dragon, torch, tensorflow and onnx.
      It also fixes a bug where the wrong workspace size was obtained in the cached cuDNN convolution.
      Ting PAN committed
  12. 05 Sep, 2020 1 commit
  13. 30 Aug, 2020 1 commit
  14. 23 Aug, 2020 1 commit
    • Fix the stream issue with NCCL2 on CUDA 9.2 and later · 58708021
      Summary:
      This commit enforces stream synchronization before dispatching NCCL collectives.
      Otherwise, data corruption can occur because the default value of ``NCCL_GROUP_CUDA_STREAM``
      has been 0 since CUDA 9.2, i.e., there is no explicit event wait for unfinished kernels.
      Ting PAN committed
  15. 12 Aug, 2020 1 commit
  16. 07 Aug, 2020 1 commit
  17. 05 Aug, 2020 1 commit
  18. 03 Aug, 2020 1 commit
  19. 30 Jul, 2020 1 commit
  20. 25 Jul, 2020 1 commit
  21. 24 Jul, 2020 1 commit
  22. 16 Jul, 2020 1 commit
  23. 15 Jul, 2020 1 commit
  24. 14 Jul, 2020 1 commit
  25. 13 Jul, 2020 2 commits
  26. 11 Jul, 2020 1 commit
  27. 06 Jul, 2020 1 commit
  28. 22 Jun, 2020 2 commits
    • Simplify the operation executor · df172cc8
      Summary:
      This commit removes the redundant workspace reference
      when executing a tensor operation.
      Ting PAN committed
    • Fix the bug on dropout cuda kernel · b37d4e5e
      Summary:
      The kernel did not handle the in-place case, in which the random
      elements were generated on the output (i.e., the input).
      
      This commit also restores the omitted `RunOnDevice` for cuDNN activations,
      so that the implementation is dispatched correctly.
      Ting PAN committed
  29. 17 Jun, 2020 1 commit
  30. 31 May, 2019 1 commit
  31. 26 May, 2019 1 commit
  32. 21 May, 2019 1 commit
  33. 15 May, 2019 1 commit
  34. 14 May, 2019 2 commits
  35. 16 Apr, 2019 1 commit
  36. 11 Apr, 2019 1 commit
  37. 04 Apr, 2019 1 commit
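
For the dropout fix in b37d4e5e, the following is a minimal Python sketch of one way an in-place bug of this kind can arise and be avoided. It is not the actual CUDA kernel: the function names, the two-stage structure, and the separate scratch buffer are hypothetical and chosen only for illustration.

```python
import random


def dropout_buggy(x, y, ratio, rng):
    """Hypothetical two-stage dropout that stages random values in the output."""
    scale = 1.0 / (1.0 - ratio)
    # Stage 1: random values are written into the output buffer.
    for i in range(len(y)):
        y[i] = rng.random()
    # Stage 2: the input is read back. If y is x (in-place), x[i] now
    # holds a random value and the original activation is lost.
    for i in range(len(y)):
        keep = y[i] > ratio
        y[i] = x[i] * scale if keep else 0.0
    return y


def dropout_fixed(x, y, ratio, rng):
    """Hypothetical fix: random values live in a scratch buffer, never in y."""
    scale = 1.0 / (1.0 - ratio)
    scratch = [rng.random() for _ in range(len(x))]
    for i in range(len(x)):
        # x[i] is read before y[i] is written, so y may alias x safely.
        y[i] = x[i] * scale if scratch[i] > ratio else 0.0
    return y


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    dropout_fixed(a, a, 0.5, random.Random(0))  # in-place: y is x
    print(a)
```

In the fixed variant every output is either zero or the scaled original activation, even when the input and output are the same buffer. The buggy variant returns scaled random numbers in that case.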