Commits · 936c351bd954714a79241507664f20e4c91be4ba · SeetaResearch / Dragon

08 Jun, 2021 1 commit

Summary:
This commit allows transpose to compute in-place by leveraging a buffer.
We also add the CRD mode for the space-depth transpose (i.e., pixel shuffle).

committed Jun 09, 2021

936c351b
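The two space-depth orderings differ only in how input channels map to output pixels. The pure-Python sketch below works on a single (C, H, W) nested list. Its index formulas follow the DCR and CRD definitions of the ONNX DepthToSpace operator. CRD is the ordering used by pixel shuffle (e.g. PyTorch's `pixel_shuffle`). This is not Dragon's transpose kernel, and `depth_to_space` is a hypothetical helper name.

```python
def depth_to_space(x, block, mode="DCR"):
    """Rearrange a (C, H, W) nested list into (C // block**2, H * block, W * block)."""
    if mode not in ("DCR", "CRD"):
        raise ValueError("mode must be 'DCR' or 'CRD'")
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if channels % (block * block) != 0:
        raise ValueError("channels must be divisible by block**2")
    out_channels = channels // (block * block)
    y = [[[None] * (width * block) for _ in range(height * block)]
         for _ in range(out_channels)]
    for oc in range(out_channels):
        for h in range(height):
            for w in range(width):
                for i in range(block):
                    for j in range(block):
                        if mode == "DCR":
                            # Depth-column-row: the block offset (i, j) is the outer factor.
                            c = (i * block + j) * out_channels + oc
                        else:
                            # Column-row-depth: each output channel reads a contiguous
                            # group of block**2 input channels (pixel shuffle order).
                            c = oc * block * block + i * block + j
                        y[oc][h * block + i][w * block + j] = x[c][h][w]
    return y


x = [[[c]] for c in range(8)]  # 8 channels of size 1x1; channel c holds the value c
print(depth_to_space(x, 2, "DCR"))  # [[[0, 2], [4, 6]], [[1, 3], [5, 7]]]
print(depth_to_space(x, 2, "CRD"))  # [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```

Both modes produce the same output shape. Only the channel interleaving changes, which is why a model converted from a pixel-shuffle framework needs CRD.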

31 May, 2021 1 commit

Fix cuBLAS fp32 downcast issue on Ampere devices · ac051717

Summary:
This commit removes the default cuBLAS tensor core math mode
on Ampere devices with CUDA >= 11.0, to avoid downcasting FP32 math.

committed May 31, 2021

ac051717

13 May, 2021 1 commit

Add Im2col operator · b7e2298f

Summary:
This commit adds the im2col operator to unfold the input into the depth dimension.

Ting PAN committed May 13, 2021

b7e2298f

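The im2col operation copies every sliding kernel window into a column, so that a convolution becomes one matrix multiplication. The sketch below makes several assumptions that are not stated in the commit: a single (C, H, W) image, no dilation, zero padding, and an output layout of (C * kh * kw, out_h * out_w). It does not reproduce Dragon's operator arguments.

```python
def im2col(x, kernel, stride=1, pad=0):
    """Unfold a (C, H, W) nested list into rows of shape (C * kh * kw, out_h * out_w)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = kernel
    out_h = (height + 2 * pad - kh) // stride + 1
    out_w = (width + 2 * pad - kw) // stride + 1
    cols = []
    for c in range(channels):
        for ki in range(kh):
            for kj in range(kw):
                # One row per (channel, kernel offset); one column per output position.
                row = []
                for oh in range(out_h):
                    for ow in range(out_w):
                        ih = oh * stride - pad + ki
                        iw = ow * stride - pad + kj
                        inside = 0 <= ih < height and 0 <= iw < width
                        row.append(x[c][ih][iw] if inside else 0)  # zero padding
                cols.append(row)
    return cols


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
print(im2col(image, (2, 2)))  # [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
```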
07 May, 2021 1 commit

Rename Triangular operator · 9f583556

Summary:
This commit renames the triangular operator to trilu, following ONNX.

Ting PAN committed May 07, 2021

9f583556

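ONNX Trilu keeps the upper or lower triangle of a matrix, with a diagonal offset `k`, and zeroes everything else. The sketch below mirrors those semantics for a single 2D nested list. The names `upper` and `k` are taken from the ONNX operator, and it is an assumption that Dragon's `trilu` exposes the same arguments.

```python
def trilu(matrix, k=0, upper=True):
    """Zero out elements outside the triangle selected by `upper` and diagonal offset `k`."""
    result = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, value in enumerate(row):
            # Upper keeps j - i >= k; lower keeps j - i <= k.
            keep = (j - i >= k) if upper else (j - i <= k)
            new_row.append(value if keep else 0)
        result.append(new_row)
    return result


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(trilu(m))               # [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
print(trilu(m, upper=False))  # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
```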
01 May, 2021 1 commit

Remove HardSwish arguments · b7e959e9

Summary:
This commit fixes alpha and beta of hardswish to 1/6 and 0.5,
matching the behavior of the ONNX specification.

committed May 01, 2021

b7e959e9
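With alpha and beta fixed, HardSwish reduces to the ONNX formula y = x * max(0, min(1, alpha * x + beta)). The scalar Python sketch below evaluates that formula. It is not Dragon's kernel.

```python
def hardswish(x, alpha=1.0 / 6.0, beta=0.5):
    """HardSwish as defined by ONNX: x * clip(alpha * x + beta, 0, 1)."""
    return x * max(0.0, min(1.0, alpha * x + beta))


print(hardswish(-4.0), hardswish(3.0))  # -0.0 3.0
```

With these constants the expression equals x * ReLU6(x + 3) / 6, because clip(x / 6 + 0.5, 0, 1) = clip(x + 3, 0, 6) / 6.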

28 Apr, 2021 1 commit

Add Reverse operator · 094c8c32

Summary:
This commit adds the reverse (i.e., flip) operator.

Ting PAN committed Apr 28, 2021

094c8c32

21 Apr, 2021 1 commit

Add GELU operator · bdf4e10f

Summary:
This commit adds the GELU activation, which computes the output
in either approximate or naive mode.

committed Apr 22, 2021

bdf4e10f
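The commit does not spell out the two formulas. The sketch below assumes that "naive" is the exact form 0.5 * x * (1 + erf(x / sqrt(2))) and that "approximate" is the widely used tanh approximation. Both are shown as scalar Python functions, not as Dragon's kernels.

```python
import math


def gelu(x, approximate=False):
    """GELU: exact erf form by default, tanh approximation when `approximate` is True."""
    if approximate:
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


print(gelu(1.0), gelu(1.0, approximate=True))
```

The approximation avoids `erf` at the cost of a small error, about 1.5e-4 at x = 1.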

14 Apr, 2021 1 commit

Add scatter-gather elements operator · 43a82e77

Summary:
This commit adds the scatter and gather elements operators
to remap elements along a given dimension according to the indices.

committed Apr 14, 2021

43a82e77
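Element-wise gather reads one source element per index, and element-wise scatter writes one update per index, both along the chosen axis. The sketch below follows the ONNX GatherElements/ScatterElements semantics. To keep it short, it assumes 2D nested lists, non-negative indices and no scatter reduction.

```python
import copy


def gather_elements(data, indices, axis):
    """out[i][j] = data[indices[i][j]][j] (axis=0) or data[i][indices[i][j]] (axis=1)."""
    out = []
    for i, row in enumerate(indices):
        out_row = []
        for j, idx in enumerate(row):
            out_row.append(data[idx][j] if axis == 0 else data[i][idx])
        out.append(out_row)
    return out


def scatter_elements(data, indices, updates, axis):
    """Copy `data`, then write updates[i][j] at the position selected by indices[i][j]."""
    out = copy.deepcopy(data)
    for i, row in enumerate(indices):
        for j, idx in enumerate(row):
            if axis == 0:
                out[idx][j] = updates[i][j]
            else:
                out[i][idx] = updates[i][j]
    return out


print(gather_elements([[1, 2], [3, 4]], [[0, 0], [1, 0]], axis=1))  # [[1, 1], [4, 3]]
print(scatter_elements([[1, 2, 3, 4, 5]], [[1, 3]], [[1.1, 2.1]], axis=1))  # [[1, 1.1, 3, 2.1, 5]]
```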

08 Apr, 2021 1 commit

Update with the new frontend API · f431756f

Summary:
The new frontend unifies the two execution modes, starting from
a single tensor class. Besides, it dispatches operator execution through
a common path that works for both dragon and torch.

committed Apr 09, 2021

f431756f

04 Feb, 2021 1 commit

Reimplement the general matrix multiplication · 6bfe3e73

Summary:
This commit generalizes the fully-connected operation into GEMM,
and enhances the matmul operation via batched Dot, GEMV and GEMM.
New representations and attributes are consistent with ONNX.

committed Feb 05, 2021

6bfe3e73
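The ONNX Gemm operator computes Y = alpha * A' * B' + beta * C, where A' and B' are optionally transposed. The naive Python sketch below follows that definition. To stay short, it assumes C already has the full output shape, so ONNX's unidirectional broadcasting of C is omitted. It is not Dragon's batched Dot/GEMV/GEMM dispatch.

```python
def transpose(m):
    """Return the transpose of a 2D nested list."""
    return [list(col) for col in zip(*m)]


def gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Y = alpha * op(A) @ op(B) + beta * C, following the ONNX Gemm attributes."""
    a = transpose(a) if trans_a else a
    b = transpose(b) if trans_b else b
    rows, inner, cols = len(a), len(b), len(b[0])
    if len(a[0]) != inner:
        raise ValueError("inner dimensions do not match")
    y = []
    for i in range(rows):
        row = []
        for j in range(cols):
            value = alpha * sum(a[i][k] * b[k][j] for k in range(inner))
            if c is not None:
                value += beta * c[i][j]
            row.append(value)
        y.append(row)
    return y


print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```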

25 Jan, 2021 1 commit

Remove support for CUDNN v6 · 73ed1b96

Summary:
For consistency in getting CUDNN convolution algorithms,
CUDNN v6 (mainly relied on by CUDA 8.0) is now dropped.

committed Jan 26, 2021

73ed1b96

20 Jan, 2021 1 commit

Add sysconfig module · bbfecf22

Summary:
This commit adds the sysconfig module to query the build information,
which is helpful for selecting tests or reporting issues.

committed Jan 20, 2021

bbfecf22

16 Jan, 2021 1 commit

Refactor vision operators · 2c90589f

Summary:
This commit adds 1D and 3D support for vision operators
via a generalized ND implementation.

committed Jan 17, 2021

2c90589f

29 Dec, 2020 1 commit

Fix the extension compiling issues on win32 · 60e5d25a

Summary:
This commit fixes two compiling issues on win32:
1. Remove the constexpr when generating protobuf headers.
2. Always enforce the "/MT" flag to override "/MD".

committed Dec 29, 2020

60e5d25a

23 Dec, 2020 1 commit

Add tests of operator spec for AutoGraph · 1ad360e9

Summary:
This commit tests the correctness of the shape and data type inference
blended by the autograph module.

committed Dec 24, 2020

1ad360e9

15 Dec, 2020 1 commit

Fix a numerical issue by breaking the union of CUB storages · 1bd78a3c

Summary:
We found the results unstable when defining CUB storages in a union.
Further investigation is needed to understand this issue.

committed Dec 16, 2020

1bd78a3c

11 Dec, 2020 1 commit

Simplify the parsing of tensor arguments · ad83f4e4

Summary:
This commit moves the parser into ArgHelper, which was previously
designed to add the descriptors.

committed Dec 12, 2020

ad83f4e4

10 Dec, 2020 1 commit

Simplify the feeding to repeated tensor arguments · 9fc5249b

Summary:
This commit feeds the repeated tensor arguments with
the entire array instead of the piecewise scalars.

committed Dec 11, 2020

9fc5249b

09 Dec, 2020 1 commit

Refactor ONNX frontends and backends · b93bde0d

Summary:
This commit redesigns ``vm.onnx`` with reference to the official repository.
Frontends and backends are aligned with an identical API for dragon, torch and tensorrt.

committed Dec 09, 2020

b93bde0d

03 Dec, 2020 1 commit

Fix the circular reference of GradientTape · e82d2ba4

Summary:
This commit removes an unused reference in GradientTape to
avoid the circular reference issue.

Ting PAN committed Dec 03, 2020

e82d2ba4

02 Dec, 2020 1 commit

Fix the bug of missing defs on blending assign operators · 9de0f1a3

Summary:
This commit attaches the input and output together in assign operators,
which fixes the missing input defs caused by the identity from input to output.

committed Dec 02, 2020

9de0f1a3

29 Nov, 2020 1 commit

Add FP16 support for DepthwiseConv2d && SyncBN Operator · 746f2cbb

Summary:
This commit adds pseudo FP16 kernels with FP32 conversions
for the DepthwiseConv2d and SyncBN operators.

Ting PAN committed Nov 30, 2020

746f2cbb

05 Nov, 2020 1 commit

Use FP32 accumulator for FP16 ReduceSum · d56e67d1

Summary:
This commit adds a fallback with an FP32 accumulator
for FP16 ReduceSum to avoid dropping too many small values.
Besides, most FP16 kernels are now available for arch < 530.

committed Nov 05, 2020

d56e67d1
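Once an FP16 partial sum grows large, adding a small value can round back to the same partial sum, so the value is lost. The sketch below emulates this sequentially with Python's `struct` half-precision (`'e'`) and single-precision (`'f'`) formats. Real CUDA reductions use parallel trees, so the exact error will differ. `reduce_sum_fp16` is a hypothetical name, not Dragon's API.

```python
import struct


def round_to(fmt, value):
    """Round a Python float through a packed IEEE 754 format ('e' = FP16, 'f' = FP32)."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]


def reduce_sum_fp16(values, fp32_accumulator=True):
    """Sum FP16 inputs, rounding every partial sum to the accumulator type."""
    acc_fmt = "f" if fp32_accumulator else "e"
    acc = 0.0
    for v in values:
        x = round_to("e", v)  # the inputs are FP16
        acc = round_to(acc_fmt, acc + x)  # the partial sum is stored in the accumulator type
    return round_to("e", acc)  # the output is cast back to FP16


ones = [1.0] * 4096
print(reduce_sum_fp16(ones, fp32_accumulator=False))  # 2048.0
print(reduce_sum_fp16(ones, fp32_accumulator=True))   # 4096.0
```

With an FP16 accumulator, the sum stalls at 2048 because 2049 is not representable in FP16 and rounds back to 2048. With an FP32 accumulator, the exact result is kept and then cast to FP16 once.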

24 Oct, 2020 1 commit

Add HardSigmoid && HardSwish && Swish Operator · 9ca4b60f

Summary:
This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels,
which are widely used in MobileNetV3 and EfficientNet.

committed Oct 25, 2020

9ca4b60f
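The three activations are closely related. HardSigmoid clips a linear function to [0, 1], Swish is x * sigmoid(x), and HardSwish replaces the sigmoid in Swish with a HardSigmoid whose alpha is 1/6. The scalar sketch below uses the ONNX HardSigmoid defaults (alpha = 0.2, beta = 0.5), which are assumptions here. Dragon's own defaults are not stated in this commit.

```python
import math


def hardsigmoid(x, alpha=0.2, beta=0.5):
    """Piecewise-linear sigmoid: clip(alpha * x + beta, 0, 1)."""
    return max(0.0, min(1.0, alpha * x + beta))


def sigmoid(x):
    """Numerically stable logistic function (avoids overflow in exp for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x):
    """Swish (a.k.a. SiLU): x * sigmoid(x)."""
    return x * sigmoid(x)


def hardswish(x):
    """HardSwish: x * hardsigmoid(x) with alpha = 1/6 and beta = 0.5."""
    return x * hardsigmoid(x, alpha=1.0 / 6.0, beta=0.5)


print(hardsigmoid(0.0), swish(0.0), hardswish(3.0))
```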

20 Oct, 2020 1 commit

Optimize Concat && Split Operator · f76c693e

Summary:
This commit uses CopyMatrix to implement concat and split generically
instead of specialized kernels.

committed Oct 20, 2020

f76c693e
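A row-major tensor viewed around `axis` is a matrix of shape (outer, dim * inner). Concatenating or splitting along `axis` therefore reduces to copying one sub-matrix per input or output at a column offset. The sketch below illustrates this generic approach on flat Python lists. `copy_matrix` is a hypothetical stand-in for Dragon's CopyMatrix, and its argument order is assumed.

```python
import math


def copy_matrix(rows, cols, src, src_ld, src_offset, dst, dst_ld, dst_offset):
    """Copy a rows x cols block between flat row-major buffers with leading dimensions."""
    for r in range(rows):
        for c in range(cols):
            dst[dst_offset + r * dst_ld + c] = src[src_offset + r * src_ld + c]


def concat(inputs, shapes, axis):
    """Concatenate flat row-major tensors along `axis`, one matrix copy per input."""
    outer = math.prod(shapes[0][:axis])
    inner = math.prod(shapes[0][axis + 1:])
    out_cols = sum(shape[axis] for shape in shapes) * inner
    out = [0] * (outer * out_cols)
    col_offset = 0
    for data, shape in zip(inputs, shapes):
        cols = shape[axis] * inner
        copy_matrix(outer, cols, data, cols, 0, out, out_cols, col_offset)
        col_offset += cols
    return out


def split(data, shape, axis, sizes):
    """Split a flat row-major tensor along `axis` into pieces of the given sizes."""
    outer = math.prod(shape[:axis])
    inner = math.prod(shape[axis + 1:])
    in_cols = shape[axis] * inner
    pieces, col_offset = [], 0
    for size in sizes:
        cols = size * inner
        piece = [0] * (outer * cols)
        copy_matrix(outer, cols, data, in_cols, col_offset, piece, cols, 0)
        pieces.append(piece)
        col_offset += cols
    return pieces


print(concat([[1, 2, 3, 4], [5, 6]], [(2, 2), (2, 1)], axis=1))  # [1, 2, 5, 3, 4, 6]
print(split([1, 2, 5, 3, 4, 6], (2, 3), axis=1, sizes=[2, 1]))  # [[1, 2, 3, 4], [5, 6]]
```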

14 Oct, 2020 1 commit

Fix bug of sharing the corrupted workspace data · 77dcd71d

Summary:
This commit uses unique tensors to provide workspace data,
avoiding corruption when the data is shared between an operator and its kernels.

committed Oct 15, 2020

77dcd71d

13 Oct, 2020 1 commit

Add LinSpace Operator · e83c407a

Summary:
This commit adds the linspace op for dragon, torch and tensorflow.
Besides, a workaround for truncated integer intervals is applied to range/linspace (up to 2**57).

committed Oct 13, 2020

e83c407a

08 Oct, 2020 1 commit

Use block reduction for ArgMax and ArgMin Operator · 5cbbef4b

Summary:
This commit reimplements the CUDA argmax/argmin via BlockReduce
instead of the naive reduction in a kernel loop.

committed Oct 08, 2020

5cbbef4b
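In a block reduction, each thread first scans a strided slice of the input. The block then combines the per-thread (value, index) pairs in a tree, halving the number of active threads at each step. The sequential Python simulation below illustrates that pattern, not the CUDA code. It assumes ties resolve to the smallest index, as in NumPy's `argmax`, and this tie-breaking rule is an assumption about Dragon.

```python
def combine(a, b):
    """Keep the pair with the larger value; break ties by the smaller index."""
    if b[0] > a[0] or (b[0] == a[0] and b[1] < a[1]):
        return b
    return a


def block_argmax(values, block_size=8):
    """Simulate a thread-strided scan followed by a shared-memory tree reduction."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    # Phase 1: thread `tid` scans indices tid, tid + block_size, ...
    partial = []
    for tid in range(block_size):
        best = (float("-inf"), len(values))  # identity element for `combine`
        for i in range(tid, len(values), block_size):
            best = combine(best, (values[i], i))
        partial.append(best)
    # Phase 2: tree reduction; thread tid merges the pair held by tid + stride.
    stride = block_size // 2
    while stride > 0:
        for tid in range(stride):
            partial[tid] = combine(partial[tid], partial[tid + stride])
        stride //= 2
    return partial[0][1]


print(block_argmax([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))  # 5
```

The combine step is associative, so the tree order gives the same answer as a sequential scan, while needing only log2(block_size) synchronized steps.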

07 Oct, 2020 1 commit

Add Sort Operator · b4019faa

Summary:
This commit adds the sort op for dragon, torch and tensorflow.
Besides, a CUDA implementation of the topk op is now available.

committed Oct 07, 2020

b4019faa

27 Sep, 2020 1 commit

Use local workspace for Context · fdf26ef2

Summary:
This commit uses a local (thread or stream) workspace for Context,
which provides a more elegant way to dispatch kernels that require scratch memory.
Besides, the TF32 math type is provided as a cuDNN option for Ampere devices.

committed Sep 27, 2020

fdf26ef2

10 Sep, 2020 1 commit

Add Unique Operator · 1dd8aeef

Summary:
This commit adds the unique op for dragon, torch, tensorflow and onnx.
Besides, it fixes a bug that got the wrong workspace size in cached cuDNN convolution.

committed Sep 11, 2020

1dd8aeef
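A unique op typically returns the distinct values and, optionally, the inverse indices that map every input element back to its unique value, plus per-value counts. The sketch below follows the style of `torch.unique(sorted=True, return_inverse=..., return_counts=...)`. It is an assumption that Dragon's op sorts its output and offers these optional outputs.

```python
def unique(values, return_inverse=False, return_counts=False):
    """Sorted unique values, with optional inverse indices and occurrence counts."""
    uniques = sorted(set(values))
    position = {v: i for i, v in enumerate(uniques)}  # value -> index in `uniques`
    outputs = [uniques]
    if return_inverse:
        outputs.append([position[v] for v in values])
    if return_counts:
        counts = [0] * len(uniques)
        for v in values:
            counts[position[v]] += 1
        outputs.append(counts)
    return outputs[0] if len(outputs) == 1 else tuple(outputs)


print(unique([3, 1, 3, 2, 1], return_inverse=True, return_counts=True))
# ([1, 2, 3], [2, 0, 2, 1, 0], [2, 1, 2])
```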

05 Sep, 2020 1 commit

Use sequential sampling as the default shuffle policy · 80267d8f

Summary:
This commit reimplements the default shuffle policy of the data reader with
sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution).
Sequential sampling is tuned only by the argument ``initial_fill``, and works for both HDD and SSD.

committed Sep 05, 2020

80267d8f
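Sequential sampling in the DALI style can be pictured as a shuffle buffer. The reader fills a buffer with `initial_fill` records in storage order. It then repeatedly emits a random buffered record and refills that slot with the next sequential record. The sketch below assumes Dragon's reader follows this pattern, based only on the stated consistency with DALI. `sequential_shuffle` is a hypothetical name.

```python
import random


def sequential_shuffle(stream, initial_fill, seed=None):
    """Yield records from `stream` in a locally shuffled order using a bounded buffer."""
    if initial_fill < 1:
        raise ValueError("initial_fill must be >= 1")
    rng = random.Random(seed)
    iterator = iter(stream)
    buffer = []
    for item in iterator:  # fill the buffer sequentially
        buffer.append(item)
        if len(buffer) >= initial_fill:
            break
    for item in iterator:  # emit a random slot, then refill it with the next record
        slot = rng.randrange(len(buffer))
        yield buffer[slot]
        buffer[slot] = item
    rng.shuffle(buffer)  # drain whatever is left once the stream ends
    yield from buffer


print(list(sequential_shuffle(range(10), initial_fill=4, seed=0)))
```

Reads stay strictly sequential, which suits HDDs, while `initial_fill` alone bounds how far a record can move. The record emitted at position p was always read within the first p + initial_fill records.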

30 Aug, 2020 1 commit

Normalize the getter of operator argument · cca00c0d

Summary:
This commit renames the operator argument getter to ``GetArgument``
whether an argument is single or repeated.

committed Aug 30, 2020

cca00c0d

23 Aug, 2020 1 commit

Fix the stream issue with NCCL2 on CUDA 9.2 and later · 58708021

Summary:
This commit enforces stream synchronization before dispatching NCCL collectives.
Otherwise, data corruption will happen because the default value of ``NCCL_GROUP_CUDA_STREAM``
changed to 0 since CUDA 9.2, i.e., there is no explicit event waiting for unfinished kernels.

committed Aug 24, 2020

58708021

12 Aug, 2020 1 commit

Add support to query the size for the allocated memory · 58c5371e

Summary:
This commit adds the ``memory_allocated`` API for ``dragon.Workspace``
to query the size of allocated memory (optionally on a specified device).

committed Aug 13, 2020

58c5371e

07 Aug, 2020 1 commit

Fix the skipped algorithm finding in cached CUDNN convolution · a7a7e4fc

Summary:
This commit enforces the algorithm finding even if the backward of filter or data
will not be executed. Otherwise, an empty algorithm will be encountered between
two cached operations with the same arguments and input shape.

committed Aug 08, 2020

a7a7e4fc

05 Aug, 2020 1 commit

Remove the deprecated DALI API · 218796ed

Summary:
This commit removes the deprecated API for DALI 0.24.
Besides, variable-length keyword arguments are added for forward compatibility.

committed Aug 06, 2020

218796ed

03 Aug, 2020 1 commit

Add support for building Ampere GPU, CUDA11 and CUDNN8 · c40eaf7b

Summary:
This commit fixes the issues when building with CUDA11 and CUDNN8.
Besides, C++14 is enabled by default instead of C++11 to support CUB 1.9+,
and for this reason, the compiler is required to be gcc5/clang6/msvc141 or higher.

committed Aug 03, 2020

c40eaf7b

30 Jul, 2020 1 commit

Init sphinx documentation for C++ API · d8f612c8

Summary:
This commit uses sphinx to generate C++ API documentation
whose style and theme are consistent with the Python API.

committed Jul 31, 2020

d8f612c8

25 Jul, 2020 1 commit

Fix the device inference in eager execution · 8dbb73a7

Summary:
This commit computes the correct device for an existing output tensor.

Ting PAN committed Jul 26, 2020

8dbb73a7