1. 16 Jan, 2021 1 commit
  2. 29 Dec, 2020 1 commit
  3. 23 Dec, 2020 1 commit
  4. 15 Dec, 2020 1 commit
  5. 11 Dec, 2020 1 commit
  6. 10 Dec, 2020 1 commit
  7. 09 Dec, 2020 1 commit
    • Refactor ONNX frontends and backends · b93bde0d
      Summary:
      This commit redesigns ``vm.onnx`` to follow the official ONNX repository.
      Frontends and backends now share an identical API across dragon, torch, and tensorrt.
      Ting PAN committed
  8. 03 Dec, 2020 1 commit
  9. 02 Dec, 2020 1 commit
  10. 29 Nov, 2020 1 commit
  11. 05 Nov, 2020 1 commit
    • Use FP32 accumulator for FP16 ReduceSum · d56e67d1
      Summary:
      This commit adds a fallback that accumulates in FP32
      for FP16 ReduceSum, to avoid losing too many small values during summation.
      In addition, most FP16 kernels are now available for compute architectures below 530.
      Ting PAN committed
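A minimal sketch (using NumPy rather than dragon, purely to illustrate the motivation above) of why an FP32 accumulator matters for an FP16 ReduceSum: once the running FP16 sum grows large, its spacing exceeds the addend and small values are silently dropped.

```python
import numpy as np

# 10000 copies of 0.1 stored as FP16; the exact sum is about 1000.
x = np.full(10000, 0.1, dtype=np.float16)

# Naive FP16 accumulation: once the partial sum reaches ~256, the FP16
# spacing (0.25) exceeds twice the addend, so adding 0.1 changes nothing.
fp16_sum = np.float16(0)
for v in x:
    fp16_sum = np.float16(fp16_sum + v)

# Fallback: accumulate in FP32, then cast the result back to FP16.
fp32_sum = np.float16(x.astype(np.float32).sum())

print(float(fp16_sum))  # stalls far below the true sum
print(float(fp32_sum))  # stays near 1000
```

The fallback trades one upcast/downcast per reduction for a result that is accurate to FP16 precision.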
  12. 24 Oct, 2020 1 commit
  13. 20 Oct, 2020 1 commit
  14. 14 Oct, 2020 1 commit
  15. 13 Oct, 2020 1 commit
    • Add LinSpace Operator · e83c407a
      Summary:
      This commit adds the linspace op for dragon, torch, and tensorflow.
      It also adds a workaround to range/linspace for truncated integer intervals (up to 2**57).
      Ting PAN committed
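A minimal sketch (hypothetical, not dragon's implementation) of the truncation problem: routing integer endpoints through float64 loses precision above 2**53, since float64 has a 53-bit significand. One workaround is to keep the endpoints and step in integer arithmetic.

```python
# float64 cannot represent 2**53 + 1 exactly; it rounds back to 2**53.
print(int(float(2**53 + 1)) == 2**53)  # True

# Illustrative integer-only linspace: endpoints and step never pass
# through floating point, so large intervals are not truncated.
def int_linspace(start, stop, num):
    step = (stop - start) // (num - 1)
    return [start + i * step for i in range(num)]

print(int_linspace(2**53, 2**53 + 8, 5))
```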
  16. 08 Oct, 2020 1 commit
  17. 07 Oct, 2020 1 commit
    • Add Sort Operator · b4019faa
      Summary:
      This commit adds the sort op for dragon, torch, and tensorflow.
      In addition, a CUDA implementation of the topk op is now available.
      Ting PAN committed
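A minimal sketch (NumPy-based; the function name is illustrative, not dragon's API) of the usual sort-op contract: return the sorted values together with the indices that produced them, so downstream gathers or gradients can be routed back to the input positions.

```python
import numpy as np

def sort_with_indices(x, descending=False):
    # Stable argsort gives a deterministic order for equal keys.
    index = np.argsort(x, kind="stable")
    if descending:
        index = index[::-1]
    return x[index], index

values, index = sort_with_indices(np.array([3, 1, 2]))
print(values)  # [1 2 3]
print(index)   # [1 2 0]
```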
  18. 27 Sep, 2020 1 commit
    • Use local workspace for Context · fdf26ef2
      Summary:
      This commit uses a local (per-thread or per-stream) workspace for Context,
      which provides a more elegant way to dispatch kernels that require scratch memory.
      In addition, the TF32 math type is exposed as a cuDNN option for Ampere devices.
      Ting PAN committed
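A minimal Python sketch of the thread-local workspace pattern described above (dragon's actual Context lives in C++; all names here are illustrative): each thread lazily owns a workspace whose scratch buffers are reused and grown rather than reallocated per kernel dispatch.

```python
import threading

class Workspace:
    """Illustrative scratch-buffer registry owned by one thread."""

    def __init__(self):
        self._buffers = {}

    def scratch(self, name, size):
        # Reuse the cached buffer when it is large enough; grow otherwise.
        buf = self._buffers.get(name)
        if buf is None or len(buf) < size:
            buf = bytearray(size)
            self._buffers[name] = buf
        return buf

_local = threading.local()

def get_workspace():
    # Each thread sees its own Workspace, so concurrent dispatches
    # never race on the same scratch memory.
    ws = getattr(_local, "workspace", None)
    if ws is None:
        ws = _local.workspace = Workspace()
    return ws
```

A per-stream variant would key the registry by stream handle instead of by thread.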
  19. 10 Sep, 2020 1 commit
    • Add Unique Operator · 1dd8aeef
      Summary:
      This commit adds the unique op for dragon, torch, tensorflow, and onnx.
      It also fixes a bug that computed the wrong workspace size for cached cuDNN convolutions.
      Ting PAN committed
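A minimal sketch (NumPy-based, illustrating the typical outputs of such an op rather than dragon's exact API): unique values plus the inverse index that reconstructs the input, the same pair that ONNX's Unique operator exposes.

```python
import numpy as np

x = np.array([2, 1, 2, 3, 1])
values, inverse = np.unique(x, return_inverse=True)
print(values)           # [1 2 3]
print(values[inverse])  # reconstructs x: [2 1 2 3 1]
```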
  20. 05 Sep, 2020 1 commit
    • Use sequential sampling as the default shuffle policy · 80267d8f
      Summary:
      This commit reimplements the data reader's default shuffle policy with
      sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet approach).
      Sequential sampling is tuned by the ``initial_fill`` argument only and works well on both HDDs and SSDs.
      Ting PAN committed
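A minimal sketch of the sequential-sampling idea (illustrative; not the reader's actual implementation): read the dataset strictly in order, keep a buffer of ``initial_fill`` examples, and emit a randomly chosen buffered example as each new one arrives. Disk reads stay sequential, which suits HDDs, while the output order is randomized within a sliding window.

```python
import random

def buffered_shuffle(stream, initial_fill, seed=0):
    rng = random.Random(seed)
    buffer = []
    for example in stream:
        buffer.append(example)
        if len(buffer) >= initial_fill:
            # Emit a random buffered example; the read itself was sequential.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain whatever remains once the stream is exhausted.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

print(list(buffered_shuffle(range(10), initial_fill=4)))
```

A larger ``initial_fill`` widens the window and strengthens the shuffle at the cost of memory, which matches the commit's claim that this single argument tunes the policy.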
  21. 30 Aug, 2020 1 commit
  22. 23 Aug, 2020 1 commit
    • Fix the stream issue with NCCL2 on CUDA 9.2 and later · 58708021
      Summary:
      This commit enforces stream synchronization before dispatching NCCL collectives.
      Otherwise, data corruption occurs because the default value of ``NCCL_GROUP_CUDA_STREAM``
      changed to 0 as of CUDA 9.2, i.e., there is no longer an explicit event wait on unfinished kernels.
      Ting PAN committed
  23. 12 Aug, 2020 1 commit
  24. 07 Aug, 2020 1 commit
  25. 05 Aug, 2020 1 commit
    • Remove the deprecated DALI API · 218796ed
      Summary:
      This commit removes API deprecated as of DALI 0.24.
      Variable-length keyword arguments are also accepted for forward compatibility.
      Ting PAN committed
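A minimal sketch (all names hypothetical) of the forward-compatibility pattern mentioned above: accepting ``**kwargs`` lets callers written against a newer API pass extra arguments without breaking an older wrapper.

```python
def image_decoder(device="cpu", output_type="rgb", **kwargs):
    # Keyword arguments unknown to this version are collected here and
    # ignored (or warned about) instead of raising a TypeError.
    return {"device": device, "output_type": output_type, "ignored": sorted(kwargs)}

# A newer caller passing an argument this version does not understand:
print(image_decoder(device="gpu", hw_decoder_load=0.65))
```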
  26. 03 Aug, 2020 1 commit
  27. 30 Jul, 2020 1 commit
  28. 25 Jul, 2020 1 commit
  29. 24 Jul, 2020 1 commit
  30. 16 Jul, 2020 1 commit
  31. 15 Jul, 2020 1 commit
  32. 14 Jul, 2020 1 commit
  33. 13 Jul, 2020 2 commits
  34. 11 Jul, 2020 1 commit
  35. 06 Jul, 2020 1 commit
    • Add native ops test · adb6fa64
      Summary:
      This commit tests the execution of native ops and verifies their results.
      Several bugs were found and fixed based on these tests.
      Ting PAN committed
  36. 22 Jun, 2020 2 commits
  37. 17 Jun, 2020 1 commit
  38. 31 May, 2019 1 commit