- 27 Mar, 2022 1 commit
Summary: This commit correctly uses the distribution commands to collect the Python package and the compiled dynamic libraries.
Ting PAN committed
- 31 Dec, 2021 1 commit
Summary: This commit fuses the weight decay and mixed-precision conversion into the update kernels to reduce training latency.
Ting PAN committed
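For context on the fusion, here is a minimal NumPy sketch (not Dragon's actual kernel; the function and argument names are illustrative): weight decay, the FP32 master-weight update, and the FP16 copy are produced in a single pass over the parameters rather than in separate passes.

```python
import numpy as np

def fused_sgd_update(param_fp32, grad_fp32, lr=0.01, weight_decay=1e-4):
    """One fused pass over the parameters."""
    grad = grad_fp32 + weight_decay * param_fp32  # weight decay fused into the update
    param_fp32 -= lr * grad                       # in-place master-weight update
    return param_fp32.astype(np.float16)          # mixed-precision copy fused in as well

w = np.random.randn(1024).astype(np.float32)
g = np.random.randn(1024).astype(np.float32)
w_fp16 = fused_sgd_update(w, g)
```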
- 20 Dec, 2021 1 commit
Ting PAN committed
- 29 Jun, 2021 1 commit
Summary: This commit adds a feature to map memory and load data with an offset. A tensor that takes mapped memory is read-only and drops the mapping on the next mutation.
Ting PAN committed
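The behavior is loosely analogous to NumPy's memory mapping, which the sketch below uses only to illustrate mapping with an offset and copying before mutation; it is not the framework's implementation.

```python
import numpy as np

# A small binary file so the example is self-contained.
np.arange(16, dtype=np.float32).tofile("blob.bin")

# Map the file read-only, skipping the first four floats via a byte offset.
mapped = np.memmap("blob.bin", dtype=np.float32, mode="r", offset=4 * 4)
print(mapped[:4])         # reads go straight through the mapping

# The mapping is read-only, so the first mutation must materialize a copy
# (conceptually, the tensor "drops" the mapped memory at this point).
owned = np.array(mapped)  # copy into regular, writable memory
owned[0] = -1.0
```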
- 25 Jun, 2021 1 commit
Summary: This commit adds extra CUDA softmax kernels using warp reduction. Warp reduction gives better performance when the reduced dimension is <= 256, which suits recent vision transformers.
Ting PAN committed
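For context, the per-row math such a kernel computes is the usual numerically stable softmax; the NumPy sketch below shows only that math (the warp-shuffle reduction itself is a CUDA detail not reproduced here).

```python
import numpy as np

def softmax_rows(x):
    """Numerically stable softmax over the last dimension.
    On the GPU, a row of <= 256 elements can be handled by a single warp,
    so both reductions below map onto warp shuffles."""
    x_max = x.max(axis=-1, keepdims=True)      # first reduction: row max
    e = np.exp(x - x_max)
    return e / e.sum(axis=-1, keepdims=True)   # second reduction: row sum

attn = np.random.randn(8, 197, 197).astype(np.float32)  # ViT-like attention logits
probs = softmax_rows(attn)
```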
- 22 Jun, 2021 1 commit
Summary: This commit uses "math::Transpose" instead of "kernels::Transpose" to allow more optimized routines in the future.
Ting PAN committed
- 19 Jun, 2021 1 commit
Summary: This commit instantiates CUDA kernels with constant dimensions to enable compile-time optimization.
Ting PAN committed
- 08 Jun, 2021 1 commit
Summary: This commit allows transpose to compute in place by leveraging a buffer. It also adds the CRD mode for space-depth transpose (i.e., pixel shuffle).
Ting PAN committed
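The CRD and DCR orderings correspond to the ONNX DepthToSpace modes; below is a NumPy sketch of both for an NCHW tensor, given purely as reference math rather than the framework's kernel.

```python
import numpy as np

def depth_to_space(x, block, mode="DCR"):
    """Rearrange depth into spatial blocks (pixel shuffle) for NCHW input."""
    n, c, h, w = x.shape
    c_out = c // (block * block)
    if mode == "DCR":   # depth-column-row ordering
        y = x.reshape(n, block, block, c_out, h, w).transpose(0, 3, 4, 1, 5, 2)
    else:               # "CRD": column-row-depth ordering
        y = x.reshape(n, c_out, block, block, h, w).transpose(0, 1, 4, 2, 5, 3)
    return y.reshape(n, c_out, h * block, w * block)

x = np.random.randn(1, 8, 2, 3).astype(np.float32)
print(depth_to_space(x, 2, "CRD").shape)  # (1, 2, 4, 6)
```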
- 31 May, 2021 1 commit
Summary: This commit removes the default cuBLAS tensor core math mode when CUDA >= 11.0 on Ampere devices, to avoid the FP32 downcast math.
Ting PAN committed
- 13 May, 2021 1 commit
Summary: This commit adds the im2col operator to unfold the input into depth.
Ting PAN committed
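A minimal NumPy sketch of what im2col does: every sliding window is unfolded into a column, so the convolution becomes a plain matrix multiply (single channel, stride 1, no padding here for brevity).

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold (H, W) into (kh * kw, out_h * out_w): one column per window."""
    h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            patch = x[i:i + out_h, j:j + out_w]  # the (i, j) element of every window
            cols[i * kw + j] = patch.reshape(-1)
    return cols

x = np.arange(16, dtype=np.float32).reshape(4, 4)
kernel = np.ones((3, 3), dtype=np.float32)
cols = im2col(x, 3, 3)                # (9, 4)
y = kernel.reshape(1, -1) @ cols      # the convolution, expressed as a GEMM
print(y.reshape(2, 2))
```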
- 07 May, 2021 1 commit
Summary: This commit renames the triangular operator to trilu, following ONNX.
Ting PAN committed
- 01 May, 2021 1 commit
Summary: This commit fixes alpha and beta to 1/6 and 0.5 for hardswish, matching the ONNX scheme.
Ting PAN committed
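With alpha = 1/6 and beta = 0.5 this matches the ONNX HardSwish definition; a short NumPy reference for what the operator computes.

```python
import numpy as np

def hardswish(x, alpha=1.0 / 6.0, beta=0.5):
    # ONNX scheme: x * max(0, min(1, alpha * x + beta))
    return x * np.clip(alpha * x + beta, 0.0, 1.0)

x = np.linspace(-4, 4, 9, dtype=np.float32)
print(hardswish(x))  # 0 for x <= -3, x for x >= 3, a quadratic ramp in between
```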
- 28 Apr, 2021 1 commit
Summary: This commit adds the reverse (flip) operator.
Ting PAN committed
- 21 Apr, 2021 1 commit
Summary: This commit adds the GELU activation, computing the output via either the approximate or the naive (exact) mode.
Ting PAN committed
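For reference, the two usual GELU formulations are the exact (erf-based) form and the tanh approximation; the sketch below is plain NumPy/Python rather than the framework's kernels, and "naive" in the summary is read here as the exact form.

```python
import math
import numpy as np

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def gelu_approx(x):
    # Tanh approximation.
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x ** 3)))

x = np.linspace(-3, 3, 7)
print(np.max(np.abs(gelu_exact(x) - gelu_approx(x))))  # the two modes differ only slightly
```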
- 14 Apr, 2021 1 commit
Summary: This commit adds the scatter and gather operators to remap elements along a given dimension according to indices.
Ting PAN committed
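NumPy already expresses this remapping via take_along_axis / put_along_axis, so the example below is only meant to make the semantics concrete; it is not the framework's API.

```python
import numpy as np

x = np.array([[10., 20., 30.],
              [40., 50., 60.]])
index = np.array([[2, 0],
                  [1, 2]])

# Gather: out[i, j] = x[i, index[i, j]] along axis=1.
print(np.take_along_axis(x, index, axis=1))   # [[30. 10.] [50. 60.]]

# Scatter: y[i, index[i, j]] = updates[i, j] along axis=1.
y = np.zeros_like(x)
updates = np.array([[1., 2.], [3., 4.]])
np.put_along_axis(y, index, updates, axis=1)
print(y)                                      # [[2. 0. 1.] [0. 3. 4.]]
```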
- 08 Apr, 2021 1 commit
Summary: The new frontend unifies the two execution modes while starting from a single tensor class. Besides, it emits operator execution through a common path that works for both dragon and torch.
Ting PAN committed
- 04 Feb, 2021 1 commit
Summary: This commit generalizes the fully-connected operation into GEMM and enhances the matmul operation via batched Dot, GEMV and GEMM. The new representations and attributes are consistent with ONNX.
Ting PAN committed
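The enhancement can be pictured as picking a BLAS-style routine by operand rank; the NumPy sketch below mirrors that reasoning and is an illustration of the idea, not the actual dispatcher.

```python
import numpy as np

def matmul_dispatch(a, b):
    """Name the BLAS-style routine a matmul of these ranks falls into."""
    if a.ndim == 1 and b.ndim == 1:
        return "Dot", np.dot(a, b)         # vector . vector
    if a.ndim == 2 and b.ndim == 1:
        return "GEMV", a @ b               # matrix . vector
    if a.ndim == 2 and b.ndim == 2:
        return "GEMM", a @ b               # matrix . matrix
    return "BatchedGEMM", a @ b            # leading dimensions are batched

print(matmul_dispatch(np.ones(3), np.ones(3))[0])                  # Dot
print(matmul_dispatch(np.ones((4, 3)), np.ones(3))[0])             # GEMV
print(matmul_dispatch(np.ones((4, 3)), np.ones((3, 2)))[0])        # GEMM
print(matmul_dispatch(np.ones((8, 4, 3)), np.ones((8, 3, 2)))[0])  # BatchedGEMM
```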
- 25 Jan, 2021 1 commit
Summary: For consistency in selecting cuDNN convolution algorithms, cuDNN v6 (mainly relied on by CUDA 8.0) is now dropped.
Ting PAN committed
- 20 Jan, 2021 1 commit
Summary: This commit adds the sysconfig module to get the build information. Build information is helpful for selecting tests or reporting issues.
Ting PAN committed
- 16 Jan, 2021 1 commit
Summary: This commit adds 1D and 3D support for vision operators via a generalized ND implementation.
Ting PAN committed
- 29 Dec, 2020 1 commit
Summary: This commit fixes two compilation issues on win32: 1. Remove the constexpr when generating protobuf headers. 2. Always enforce the "/MT" flag to overwrite "/MD".
Ting PAN committed
- 23 Dec, 2020 1 commit
Summary: This commit tests the correctness of the shape inference and data type blending performed by the autograph module.
Ting PAN committed
- 15 Dec, 2020 1 commit
Summary: We found it unstable to define CUB storage in a union. More investigation is needed to understand this issue.
Ting PAN committed
- 11 Dec, 2020 1 commit
Summary: This commit moves the parser into ArgHelper, which was previously designed to add the descriptors.
Ting PAN committed
- 10 Dec, 2020 1 commit
Summary: This commit feeds repeated tensor arguments with the entire array instead of piecewise scalars.
Ting PAN committed
- 09 Dec, 2020 1 commit
Summary: This commit redesigns ``vm.onnx`` by referring to the official repository. Frontends and backends are aligned with an identical API for dragon, torch and tensorrt.
Ting PAN committed
- 03 Dec, 2020 1 commit
Summary: This commit removes the unused reference to GradientTape to avoid the circular reference issue.
Ting PAN committed
- 02 Dec, 2020 1 commit
Summary: This commit attaches input and output together in assign operators, which fixes the missing input defs caused by the identity from input to output.
Ting PAN committed
- 29 Nov, 2020 1 commit
Summary: This commit adds pseudo FP16 kernels with FP32 conversions for the DepthwiseConv2d and SyncBN operators.
Ting PAN committed
- 05 Nov, 2020 1 commit
Summary: This commit adds a fallback with an FP32 accumulator for FP16 ReduceSum to avoid dropping too many small values. Besides, FP16 kernels for arch < 530 are now almost all available.
Ting PAN committed
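The precision issue is easy to reproduce: once an FP16 running sum grows large, small addends fall below its rounding step and vanish, which an FP32 accumulator avoids. A self-contained demonstration (not the kernel itself):

```python
import numpy as np

x = np.full(4096, 0.25, dtype=np.float16)  # every value exact in FP16; true sum is 1024

acc16 = np.float16(0.0)
for v in x:                    # naive FP16 accumulator
    acc16 = np.float16(acc16 + v)

acc32 = np.float32(0.0)
for v in x:                    # FP32 accumulator
    acc32 += np.float32(v)

print(acc16)                   # 512.0: further additions stop having any effect
print(np.float16(acc32))       # 1024.0: cast back only at the end
```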
- 24 Oct, 2020 1 commit
Summary: This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels; they are widely used in MobileNetV3 and EfficientNet.
Ting PAN committed
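For reference, the usual definitions of hardsigmoid and swish (hardswish, also added here, is the hard-gated variant sketched under the 01 May, 2021 entry above); plain NumPy rather than the specialized kernels, and note that hardsigmoid coefficients vary between conventions.

```python
import numpy as np

def hardsigmoid(x, alpha=1.0 / 6.0, beta=0.5):
    # Piecewise-linear gate; MobileNetV3 uses alpha=1/6, beta=0.5
    # (the ONNX default alpha is 0.2).
    return np.clip(alpha * x + beta, 0.0, 1.0)

def swish(x):
    # swish(x) = x * sigmoid(x), also known as SiLU.
    return x / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9, dtype=np.float32)
print(hardsigmoid(x))
print(swish(x))  # hardswish(x) = x * hardsigmoid(x) with the same coefficients
```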
- 20 Oct, 2020 1 commit
Summary: This commit uses CopyMatrix to implement concat and split generically, instead of using specialized kernels.
Ting PAN committed
- 14 Oct, 2020 1 commit
Summary: This commit uses unique tensors to provide workspace data, avoiding corruption between operators and kernels.
Ting PAN committed
- 13 Oct, 2020 1 commit
Summary: This commit adds the linspace op for dragon, torch and tensorflow. Besides, a workaround for the truncated integer interval (up to 2**57) is applied to range/linspace.
Ting PAN committed
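The truncation comes from routing integer endpoints through float64, whose 53-bit significand cannot hold every large integer exactly; below is a small demonstration plus one possible integer-stepped fallback (the commit's actual workaround may differ).

```python
import numpy as np

hi = 2**57 + 1
print(float(hi) == hi)                    # False: not representable in float64

naive = np.linspace(0, hi, num=2).astype(np.int64)
print(naive[-1] == hi)                    # False: the endpoint got truncated

def int_linspace(start, stop, num):
    """Integer-arithmetic fallback that keeps exact endpoints."""
    step, rem = divmod(stop - start, num - 1)
    return np.array([start + i * step + min(i, rem) for i in range(num)],
                    dtype=np.int64)

print(int_linspace(0, hi, 2)[-1] == hi)   # True
```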
- 08 Oct, 2020 1 commit
Summary: This commit reimplements the CUDA argmax/argmin via BlockReduce instead of the naive reduction in the kernel loop.
Ting PAN committed
- 07 Oct, 2020 1 commit
Summary: This commit adds the sort op for dragon, torch and tensorflow. Besides, a CUDA implementation of the topk op is now available.
Ting PAN committed
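As a quick reminder of how the two relate, top-k only needs a partial ordering, which is why it can be cheaper than a full sort; the NumPy comparison below is an illustration only, not the CUDA implementation.

```python
import numpy as np

x = np.random.randn(10000).astype(np.float32)
k = 5

full = np.sort(x)[::-1][:k]         # full O(n log n) sort, then take the head

idx = np.argpartition(-x, k)[:k]    # partial selection of the k largest
topk = np.sort(x[idx])[::-1]        # sort only those k candidates

print(np.allclose(full, topk))      # True
```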
- 27 Sep, 2020 1 commit
Summary: This commit uses a local (thread or stream) workspace for Context, which provides a more elegant way to dispatch kernels requiring scratch memory. Besides, the TF32 math type is provided as a cuDNN option for Ampere devices.
Ting PAN committed
- 10 Sep, 2020 1 commit
Summary: This commit adds the unique op for dragon, torch, tensorflow and onnx. Besides, it fixes a bug that got the wrong workspace size in cached cuDNN convolutions.
Ting PAN committed
- 05 Sep, 2020 1 commit
Summary: This commit reimplements the default shuffle policy of the data reader with sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution). Sequential sampling is tuned by the ``initial_fill`` argument only and works for both HDD and SSD.
Ting PAN committed
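The policy can be pictured as a DALI-style shuffle buffer: samples are read strictly in file order, a buffer of roughly ``initial_fill`` items is kept, and a random buffered item is emitted as each new sample arrives. A small Python sketch of that idea with illustrative names:

```python
import random

def sequential_shuffle(samples, initial_fill=1024, seed=0):
    """Yield samples read sequentially, shuffled through a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:              # sequential reads: friendly to HDD and SSD alike
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()          # emit a random buffered sample
    rng.shuffle(buffer)                 # drain whatever remains at the end
    yield from buffer

print(list(sequential_shuffle(range(10), initial_fill=4)))
```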
- 30 Aug, 2020 1 commit
Summary: This commit renames the operator argument getter to ``GetArgument``, regardless of whether an argument is single or repeated.
Ting PAN committed