Commits · 1bd78a3cef49640103b4a7becccbffbc2753c170 · SeetaResearch / Dragon

15 Dec, 2020 1 commit

Fix a numerical issue by breaking the union of CUB storages · 1bd78a3c

Summary:
We found the results numerically unstable when CUB storages are defined in a union.
Further investigation is needed to understand this issue.

committed Dec 16, 2020

1bd78a3c

11 Dec, 2020 1 commit
- Simplify the parsing of tensor arguments · ad83f4e4
```
Summary:
This commit moves the parser into ArgHelper, which was previously
designed only to add the descriptors.
```
  Ting PAN committed Dec 12, 2020
  ad83f4e4
10 Dec, 2020 1 commit

Simplify the feeding to repeated tensor arguments · 9fc5249b

Summary:
This commit feeds the repeated tensor arguments with
the entire array instead of the piecewise scalars.

committed Dec 11, 2020

9fc5249b

09 Dec, 2020 1 commit

Refactor ONNX frontends and backends · b93bde0d

Summary:
This commit redesigns ``vm.onnx`` by referring to the official repository.
Frontends and backends are aligned with an identical API for dragon, torch and tensorrt.

committed Dec 09, 2020

b93bde0d

03 Dec, 2020 1 commit

Fix the circular reference of GradientTape · e82d2ba4

Summary:
This commit removes the unused reference of GradientTape to
avoid the circular reference issue.

committed Dec 03, 2020

e82d2ba4
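The log does not show how GradientTape held the reference, so the following is only a minimal sketch of the general problem, using hypothetical `Tape` and `Tensor` classes. It shows that a back-reference from a recorded object to its tape keeps the tape alive after `del` until the cycle collector runs, while dropping the back-reference lets reference counting free it at once.

```python
import gc
import weakref


class Tensor:
    """Hypothetical tensor that may remember the tape which recorded it."""

    def __init__(self, tape=None):
        self.tape = tape


class Tape:
    """Hypothetical tape that keeps the tensors it watches."""

    def __init__(self):
        self.watched = []

    def watch(self, tensor):
        self.watched.append(tensor)


def tape_survives_del(link_back):
    """Return True if the tape is still alive right after ``del``."""
    was_enabled = gc.isenabled()
    gc.disable()  # Observe reference counting alone, without the cycle collector.
    try:
        tape = Tape()
        tensor = Tensor(tape if link_back else None)
        tape.watch(tensor)
        probe = weakref.ref(tape)  # Does not keep the tape alive.
        del tape, tensor
        return probe() is not None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # Clean up the cycle created by the link_back case.


print(tape_survives_del(link_back=True))
print(tape_survives_del(link_back=False))
```

On CPython, the first call reports that the tape outlives `del` because of the cycle; the second reports that it is released immediately.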

02 Dec, 2020 1 commit

Fix the bug of missing defs on blending assign operators · 9de0f1a3

Summary:

This commit attaches input and output together in assign operators,
which fixes the missing input defs due to identity from input to output.

committed Dec 02, 2020

9de0f1a3

29 Nov, 2020 1 commit

Add FP16 support for DepthwiseConv2d && SyncBN Operator · 746f2cbb

Summary:
This commit adds pseudo FP16 kernels with FP32 conversions
for the DepthwiseConv2d and SyncBN operators.

committed Nov 30, 2020

746f2cbb

05 Nov, 2020 1 commit

Use FP32 accumulator for FP16 ReduceSum · d56e67d1

Summary:
This commit adds a fallback with FP32 accumulator
for FP16 ReduceSum to avoid dropping too many small values.
Besides, FP16 kernels for arch < 530 are now mostly available.

committed Nov 05, 2020

d56e67d1
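Why an FP16 accumulator drops small values can be illustrated without a GPU. The sketch below is not the Dragon kernel; it emulates half and single precision rounding with the standard `struct` module (the helper names are made up for illustration) and sums the same FP16 inputs with both accumulator types.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_sum(values, round_acc):
    """Sequentially sum ``values``, rounding the accumulator after each add."""
    acc = 0.0
    for v in values:
        acc = round_acc(acc + v)
    return acc


# 10000 copies of the FP16 value closest to 0.001; the true sum is about 10.
values = [to_fp16(0.001)] * 10000

# FP16 accumulator: once the sum is large enough, each addend is smaller
# than half a unit in the last place and is rounded away.
sum_fp16_acc = reduce_sum(values, to_fp16)

# FP32 accumulator, converted to FP16 only once at the end.
sum_fp32_acc = to_fp16(reduce_sum(values, to_fp32))

print(sum_fp16_acc, sum_fp32_acc)
```

The FP16 accumulator stalls well below the true sum, while the FP32 accumulator lands within FP16 rounding distance of it.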

24 Oct, 2020 1 commit

Add HardSigmoid && HardSwish && Swish Operator · 9ca4b60f

Summary:
This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels,
which are widely used in MobileNetV3 and EfficientNet.

committed Oct 25, 2020

9ca4b60f
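For reference, the three activations can be written as scalar functions. This sketch assumes the MobileNetV3 definitions (hard sigmoid as `relu6(x + 3) / 6`); the slope and offset used by the Dragon kernels are not stated in the log and may differ.

```python
import math


def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_sigmoid(x):
    """Piecewise-linear approximation of sigmoid (MobileNetV3 form, assumed)."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    """x * hard_sigmoid(x)."""
    return x * hard_sigmoid(x)


def swish(x):
    """x * sigmoid(x)."""
    return x * sigmoid(x)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, hard_sigmoid(x), hard_swish(x), swish(x))
```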

20 Oct, 2020 1 commit

Optimize Concat && Split Operator · f76c693e

Summary:
This commit uses CopyMatrix to implement concat and split generically
instead of specialized kernels.

committed Oct 20, 2020

f76c693e
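The idea of expressing concat as generic matrix copies can be sketched as follows. Each input is viewed as a row-major matrix of shape (outer, axis_dim * inner) and copied into a column slice of the wider output matrix. The `copy_matrix` helper below is a hypothetical stand-in whose signature is an assumption, not the actual CopyMatrix routine.

```python
import math


def copy_matrix(rows, cols, src, src_stride, dst, dst_stride, dst_offset):
    """Copy a rows x cols matrix between flat buffers with row strides."""
    for r in range(rows):
        s = r * src_stride
        d = dst_offset + r * dst_stride
        dst[d:d + cols] = src[s:s + cols]


def concat(inputs, shapes, axis):
    """Concatenate flat row-major buffers along ``axis``."""
    outer = math.prod(shapes[0][:axis])       # product of dims before axis
    inner = math.prod(shapes[0][axis + 1:])   # product of dims after axis
    out_cols = sum(shape[axis] for shape in shapes) * inner
    out = [None] * (outer * out_cols)
    offset = 0
    for data, shape in zip(inputs, shapes):
        cols = shape[axis] * inner
        copy_matrix(outer, cols, data, cols, out, out_cols, offset)
        offset += cols  # the next input starts after this column slice
    return out


a = [1, 2, 3, 4]  # shape (2, 2)
b = [5, 6]        # shape (2, 1)
print(concat([a, b], [(2, 2), (2, 1)], axis=1))
```

Split is the mirror image: the same strided copy with the source and destination roles swapped.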

14 Oct, 2020 1 commit

Fix bug of sharing the corrupted workspace data · 77dcd71d

Summary:
This commit uses unique tensors to provide workspace data
to avoid the corruption between operator and kernel.

committed Oct 15, 2020

77dcd71d

13 Oct, 2020 1 commit

Add LinSpace Operator · e83c407a

Summary:
This commit adds the linspace op for dragon, torch and tensorflow.
In addition, a workaround for the truncated int interval is applied to range/linspace (up to 2**57).

committed Oct 13, 2020

e83c407a

08 Oct, 2020 1 commit

Use block reduction for ArgMax and ArgMin Operator · 5cbbef4b

Summary:
This commit reimplements the cuda argmax/argmin via BlockReduce,
instead of the naive reduction in kernel loop.

committed Oct 08, 2020

5cbbef4b
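The difference between a naive loop and a block reduction can be shown with a CPU simulation. This is a sketch only: it models a block of "threads" that each scan a strided slice and then combine (value, index) pairs in a tree, which is one common way a block reduction is organized; it is not the CUB BlockReduce implementation, and the function names are hypothetical.

```python
def pick_max(a, b):
    """Combine two (value, index) pairs; ties go to the smaller index."""
    if a is None:
        return b
    if b is None:
        return a
    if b[0] > a[0] or (b[0] == a[0] and b[1] < a[1]):
        return b
    return a


def block_reduce_argmax(values, block_size=8):
    """Simulate an argmax over one block; block_size must be a power of two."""
    if not values:
        raise ValueError("values must be non-empty")
    if block_size < 1 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    # Phase 1: each simulated thread reduces its own strided slice.
    partial = []
    for tid in range(block_size):
        best = None
        for i in range(tid, len(values), block_size):
            best = pick_max(best, (values[i], i))
        partial.append(best)
    # Phase 2: tree reduction, halving the number of active threads each step.
    stride = block_size // 2
    while stride > 0:
        for tid in range(stride):
            partial[tid] = pick_max(partial[tid], partial[tid + stride])
        stride //= 2
    return partial[0][1]


print(block_reduce_argmax([3, 9, 1, 9, 4], block_size=4))
```

Argmin follows by flipping the comparison in `pick_max`.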

07 Oct, 2020 1 commit

Add Sort Operator · b4019faa

Summary:
This commit adds the sort op for dragon, torch and tensorflow.
Besides, a CUDA implementation of the topk op is now available.

committed Oct 07, 2020

b4019faa

27 Sep, 2020 1 commit

Use local workspace for Context · fdf26ef2

Summary:
This commit uses local (thread or stream) workspace for Context,
which provides a more elegant way to dispatch kernels requiring scratch.
Besides, TF32 math type is provided as a cuDNN option for Ampere device.

committed Sep 27, 2020

fdf26ef2

10 Sep, 2020 1 commit

Add Unique Operator · 1dd8aeef

Summary:
This commit adds the unique op for dragon, torch, tensorflow and onnx.
Besides, it fixes the bug of getting the wrong workspace size in cached cuDNN convolution.

committed Sep 11, 2020

1dd8aeef

05 Sep, 2020 1 commit

Use sequential sampling as the default shuffle policy · 80267d8f

Summary:
This commit reimplements the default shuffle policy of the data reader with
sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution).
Sequential sampling is tuned by the ``initial_fill`` argument only, and works for both HDD and SSD.

committed Sep 05, 2020

80267d8f
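A buffer-based shuffle over a sequential read can be sketched as below. This assumes the policy works the way a DALI-style shuffle buffer is usually described: fill a buffer with `initial_fill` records read in order, then repeatedly emit a random buffered record and replace it with the next one read. The function name and the `seed` parameter are hypothetical, and the actual reader in Dragon is not shown in this log.

```python
import random


def sequential_shuffle(records, initial_fill, seed=None):
    """Yield ``records`` in a locally shuffled order using a fixed-size buffer."""
    if initial_fill < 1:
        raise ValueError("initial_fill must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for record in records:  # the source is only ever read sequentially
        if len(buffer) < initial_fill:
            buffer.append(record)
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = record  # refill the slot with the newly read record
    # Drain what is left once the source is exhausted.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


print(list(sequential_shuffle(range(20), initial_fill=5, seed=0)))
```

Because the source is read strictly in order, the access pattern stays sequential on disk, and `initial_fill` alone controls how far a record can move ahead of its original position.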

30 Aug, 2020 1 commit

Normalize the getter of operator argument · cca00c0d

Summary:
This commit renames the operator argument getter to ``GetArgument``
whether an argument is single or repeated.

committed Aug 30, 2020

cca00c0d

23 Aug, 2020 1 commit

Fix the stream issue with NCCL2 on CUDA 9.2 and later · 58708021

Summary:
This commit enforces stream synchronization before dispatching NCCL collectives.
Otherwise, data corruption will occur because the default value of ``NCCL_GROUP_CUDA_STREAM``
was changed to 0 in CUDA 9.2, i.e., there is no explicit event wait for unfinished kernels.

committed Aug 24, 2020

58708021

12 Aug, 2020 1 commit

Add support to query the size for the allocated memory · 58c5371e

Summary:
This commit adds the ``memory_allocated`` API for ``dragon.Workspace``
to query the size of allocated memory (and optionally on a specified device).

committed Aug 13, 2020

58c5371e

07 Aug, 2020 1 commit

Fix the skipped algorithm finding in cached CUDNN convolution · a7a7e4fc

Summary:
This commit enforces the algorithm finding even if the backward of filter or data
will not be executed. Otherwise, an empty algorithm will be encountered between
two cached operations with the same arguments and input shape.

committed Aug 08, 2020

a7a7e4fc

05 Aug, 2020 1 commit

Remove the deprecated DALI API · 218796ed

Summary:
This commit removes the deprecated API for DALI 0.24.
Besides, variable length keyword arguments are added for forward compatibility.

committed Aug 06, 2020

218796ed

03 Aug, 2020 1 commit

Add support for building Ampere GPU, CUDA11 and CUDNN8 · c40eaf7b

Summary:
This commit fixes the issue on building with CUDA11 and CUDNN8.
Besides, C++14 is enabled by default instead of C++11 to support CUB 1.9+,
and for this reason, the compiler is required to be gcc5/clang6/msvc141 or higher.

committed Aug 03, 2020

c40eaf7b

30 Jul, 2020 1 commit

Init sphinx documentation for C++ API · d8f612c8

Summary:
This commit uses sphinx to generate C++ API documentation
whose style and theme are consistent with the Python API.

committed Jul 31, 2020

d8f612c8

25 Jul, 2020 1 commit
- Fix the device inference in eager execution · 8dbb73a7
```
Summary:
This commit computes the correct device for an existing output tensor.
```
  Ting PAN committed Jul 26, 2020
  8dbb73a7
24 Jul, 2020 1 commit

Normalize the API view of vm packages · c0c43218

Summary:
This commit requires all the vm packages to adopt the api/core structure
to accommodate more complex future development.

committed Jul 24, 2020

c0c43218

16 Jul, 2020 1 commit
- Fix the missing extra arguments in compiled function · ae4d6834
```
Summary:
This commit correctly passes the extra arguments
when executing a compiled graph function.
```
  Ting PAN committed Jul 17, 2020
  ae4d6834
15 Jul, 2020 1 commit

Make the consistency for shape and data type · aa2ec8c3

Summary:
This commit requires the shape and data type to inherit from the same metaclass,
which ensures consistency between different styles.

committed Jul 16, 2020

aa2ec8c3

14 Jul, 2020 1 commit
- Normalize the math notations in docstring · 0ab14f30
```
Summary:
This commit normalizes the inconsistent math notations in docstring.
```
  Ting PAN committed Jul 15, 2020
  0ab14f30
13 Jul, 2020 2 commits
- Align the tensor abstraction · 2598f4dc
```
Summary:
This commit aligns the properties and methods of tensor class.
```
  Ting PAN committed Jul 14, 2020
  2598f4dc
- Add the explicit clear method for workspace · 413dbad0
```
Summary:
This commit adds the clear method to free resources manually
if the workspace instance is referenced circularly.
```
  Ting PAN committed Jul 13, 2020
  413dbad0
11 Jul, 2020 1 commit

Remove the duplicate workspace singletons · 02ad90d5

Summary:
This commit moves the workspace api into the current workspace instance.
For this reason, the namespace ``dragon.workspace`` is removed for simplicity.

committed Jul 12, 2020

02ad90d5

06 Jul, 2020 1 commit

Add native ops test · adb6fa64

Summary:
This commit tests the execution of native ops and verifies the results.
Several bugs are found and fixed according to these tests.

committed Jul 07, 2020

adb6fa64

22 Jun, 2020 2 commits

Simplify the operation executor · df172cc8

Summary:
This commit removes the redundant workspace reference
when executing a tensor operation.

committed Jun 22, 2020

df172cc8

Fix the bug on dropout cuda kernel · b37d4e5e

Summary:
We forgot to handle the inplace case, in which the random elements
are generated on the output (i.e., the input).

Besides, this commit also fixes the omitted `RunOnDevice` for cudnn activations,
which now correctly dispatches the implementation.

committed Jun 22, 2020

b37d4e5e

17 Jun, 2020 1 commit
- Merge internal commits · c1b8f912
  Ting PAN committed Jun 18, 2020
  
  c1b8f912
31 May, 2019 1 commit
- Add ChannelShuffleOp · ae11a987
  Ting PAN committed May 31, 2019
  
  ae11a987
26 May, 2019 1 commit
- Fix the missing alias in SimulateGC · 6b82cb26
  Ting PAN committed May 26, 2019
  
  6b82cb26
21 May, 2019 1 commit
- Unify the boolean operators · 4eab1c68
  Ting PAN committed May 21, 2019
  
  4eab1c68
15 May, 2019 1 commit
- Apply the dispatcher to RunImpl · d1f714ea
  Ting PAN committed May 15, 2019
  
  d1f714ea