Commits · 43a82e771e6ce6d86c39122bb2c9bc28a1675c92 · SeetaResearch / Dragon

14 Apr, 2021 1 commit

Add scatter-gather elements operator · 43a82e77

Summary:
This commit adds the scatter and gather operators,
which remap elements along a given dimension according to the indices.

committed Apr 14, 2021

43a82e77
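As an illustration of the element remapping described above, here is a minimal NumPy sketch. It assumes the operators follow the usual gather-elements and scatter-elements semantics; the function names `gather_elements` and `scatter_elements` are hypothetical and are not Dragon's API.

```python
import numpy as np


def gather_elements(x, index, axis):
    """Pick values from `x`; for axis=1, out[i][j] = x[i][index[i][j]]."""
    return np.take_along_axis(x, index, axis=axis)


def scatter_elements(x, index, updates, axis):
    """Write `updates` into a copy of `x`; for axis=1, out[i][index[i][j]] = updates[i][j]."""
    out = x.copy()  # keep the input untouched
    np.put_along_axis(out, index, updates, axis=axis)
    return out


x = np.array([[1, 2], [3, 4]])
gathered = gather_elements(x, np.array([[0, 0], [1, 0]]), axis=1)

canvas = np.zeros((2, 3), dtype=np.int64)
scattered = scatter_elements(
    canvas,
    np.array([[2, 0], [1, 2]]),
    np.array([[10, 20], [30, 40]]),
    axis=1,
)
```

Here `gathered` is `[[1, 1], [4, 3]]` and `scattered` is `[[20, 0, 10], [0, 30, 40]]`.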

08 Apr, 2021 1 commit

Update with the new frontend API · f431756f

Summary:
The new frontend unifies the two execution modes, starting from
a single tensor class. It also dispatches operator execution through
a common path that works for both dragon and torch.

committed Apr 09, 2021

f431756f

04 Feb, 2021 1 commit

Reimplement the general matrix multiplication · 6bfe3e73

Summary:
This commit generalizes the fully-connected operation into GEMM,
and enhances the matmul operation via batched Dot, GEMV and GEMM.
The new representations and attributes are consistent with ONNX.

committed Feb 05, 2021

6bfe3e73
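To make the generalization concrete, here is a minimal NumPy sketch of GEMM as defined by the ONNX `Gemm` operator, `Y = alpha * A' * B' + beta * C`. The Python function and argument names are illustrative assumptions, not Dragon's signature.

```python
import numpy as np


def gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Compute alpha * op(a) @ op(b) + beta * c, where op optionally transposes."""
    a = a.T if trans_a else a
    b = b.T if trans_b else b
    y = alpha * (a @ b)
    if c is not None:
        y = y + beta * c  # c is broadcast against the product
    return y


# A fully-connected layer is the special case y = x @ W^T + bias.
x = np.arange(6, dtype=np.float32).reshape(2, 3)
weight = np.ones((4, 3), dtype=np.float32)  # (out_features, in_features)
bias = np.full((4,), 0.5, dtype=np.float32)
y = gemm(x, weight, bias, trans_b=True)
```

With these inputs `y` has shape `(2, 4)`, and each row holds the sum of the matching row of `x` plus the bias.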

25 Jan, 2021 1 commit

Remove support for CUDNN v6 · 73ed1b96

Summary:
To get CUDNN convolution algorithms in a consistent way,
support for CUDNN v6 (mainly relied on by CUDA 8.0) is now dropped.

committed Jan 26, 2021

73ed1b96

20 Jan, 2021 1 commit

Add sysconfig module · bbfecf22

Summary:
This commit adds the sysconfig module to get the build information.
Build information is helpful to select tests or report issues.

committed Jan 20, 2021

bbfecf22

16 Jan, 2021 1 commit

Refactor vision operators · 2c90589f

Summary:
This commit adds 1D and 3D support for vision operators
via a generalized ND implementation.

committed Jan 17, 2021

2c90589f

29 Dec, 2020 1 commit

Fix the extension compiling issues on win32 · 60e5d25a

Summary:
This commit fixes two compilation issues on win32:
1. Remove the constexpr when generating protobuf headers.
2. Always enforce the "/MT" flag so that it overrides "/MD".

committed Dec 29, 2020

60e5d25a

23 Dec, 2020 1 commit

Add tests of operator spec for AutoGraph · 1ad360e9

Summary:
This commit tests the correctness of the shape inference and data types
blended by the autograph module.

committed Dec 24, 2020

1ad360e9

15 Dec, 2020 1 commit

Fix a numerical issue by breaking the union of CUB storages · 1bd78a3c

Summary:
We found that defining CUB storages in a union is numerically unstable.
Further investigation is needed to understand this issue.

committed Dec 16, 2020

1bd78a3c

11 Dec, 2020 1 commit

Simplify the parsing of tensor arguments · ad83f4e4

Summary:
This commit moves the parser into ArgHelper, which was previously
designed to add the descriptors.

committed Dec 12, 2020

ad83f4e4

10 Dec, 2020 1 commit

Simplify the feeding to repeated tensor arguments · 9fc5249b

Summary:
This commit feeds the repeated tensor arguments with
the entire array instead of the piecewise scalars.

committed Dec 11, 2020

9fc5249b

09 Dec, 2020 1 commit

Refactor ONNX frontends and backends · b93bde0d

Summary:
This commit redesigns ``vm.onnx`` with reference to the official repository.
Frontends and backends are aligned to an identical API for dragon, torch and tensorrt.

committed Dec 09, 2020

b93bde0d

03 Dec, 2020 1 commit

Fix the circular reference of GradientTape · e82d2ba4

Summary:
This commit removes the unused reference of GradientTape to
avoid the circular reference issue.

Ting PAN committed Dec 03, 2020

e82d2ba4

02 Dec, 2020 1 commit

Fix the bug of missing defs on blending assign operators · 9de0f1a3

Summary:
This commit attaches input and output together in assign operators,
which fixes the missing input defs due to identity from input to output.

committed Dec 02, 2020

9de0f1a3

29 Nov, 2020 1 commit

Add FP16 support for DepthwiseConv2d && SyncBN Operator · 746f2cbb

Summary:
This commit adds pseudo FP16 kernels with FP32 conversions
for the DepthwiseConv2d and SyncBN operators.

committed Nov 30, 2020

746f2cbb

05 Nov, 2020 1 commit

Use FP32 accumulator for FP16 ReduceSum · d56e67d1

Summary:
This commit adds a fallback with FP32 accumulator
for FP16 ReduceSum to avoid dropping too many small values.
In addition, FP16 kernels for arch < 530 are now mostly available.

committed Nov 05, 2020

d56e67d1
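The effect of the accumulator type can be reproduced on the CPU. The sketch below is only an illustration in NumPy, not the CUDA kernel from this commit; `reduce_sum_fp16` is a hypothetical helper that adds float16 values one at a time and rounds the running sum to the chosen accumulator type after every step.

```python
import numpy as np


def reduce_sum_fp16(values, accumulate_in_fp32):
    """Sum float16 values sequentially with a float16 or float32 accumulator."""
    acc_type = np.float32 if accumulate_in_fp32 else np.float16
    acc = acc_type(0)
    for v in values:
        # Round the running sum back to the accumulator type after every add.
        acc = acc_type(acc + acc_type(v))
    return np.float16(acc)  # the result is stored as float16 either way


values = np.full(10000, 0.01, dtype=np.float16)  # true sum is about 100
sum_fp16_acc = reduce_sum_fp16(values, accumulate_in_fp32=False)
sum_fp32_acc = reduce_sum_fp16(values, accumulate_in_fp32=True)
```

With a float16 accumulator the running sum stops growing once each addend is smaller than half the spacing between neighbouring float16 values, so `sum_fp16_acc` ends up far below 100. The float32 accumulator keeps the small contributions, and only the final store is rounded to float16.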

24 Oct, 2020 1 commit

Add HardSigmoid && HardSwish && Swish Operator · 9ca4b60f

Summary:
This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels,
which are widely used in MobileNetV3 and EfficientNet.

committed Oct 25, 2020

9ca4b60f
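For reference, the three activations can be sketched in NumPy as follows. This assumes the MobileNetV3 formulation of hardsigmoid, `relu6(x + 3) / 6`. Other definitions use different slope and offset parameters, and the defaults chosen by this commit are not stated here.

```python
import numpy as np


def hardsigmoid(x):
    """Piecewise-linear approximation of sigmoid: relu6(x + 3) / 6 (assumed form)."""
    return np.clip(x + 3.0, 0.0, 6.0) / 6.0


def hardswish(x):
    """x * hardsigmoid(x), as used in MobileNetV3."""
    return x * hardsigmoid(x)


def swish(x):
    """x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))


x = np.array([-4.0, -3.0, 0.0, 3.0, 4.0])
hs = hardsigmoid(x)
hw = hardswish(x)
sw = swish(x)
```

Under this definition hardswish is exactly zero for `x <= -3` and equals `x` for `x >= 3`, which is what makes a specialized kernel cheap compared with evaluating an exponential.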

20 Oct, 2020 1 commit

Optimize Concat && Split Operator · f76c693e

Summary:
This commit uses CopyMatrix to implement concat and split generically
instead of specialized kernels.

committed Oct 20, 2020

f76c693e
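The idea of expressing concat as matrix copies can be sketched in NumPy. The sketch assumes that a matrix copy moves a rows-by-columns block between buffers with different row strides. The actual `CopyMatrix` signature is not shown in this log, so `concat_via_matrix_copy` below is a hypothetical stand-in.

```python
import numpy as np


def concat_via_matrix_copy(inputs, axis):
    """Concatenate along `axis` by copying each input as a 2D block."""
    first = inputs[0]
    outer = int(np.prod(first.shape[:axis]))       # rows shared by all inputs
    inner = int(np.prod(first.shape[axis + 1:]))   # elements per axis step
    out_axis_dim = sum(x.shape[axis] for x in inputs)
    out_shape = first.shape[:axis] + (out_axis_dim,) + first.shape[axis + 1:]
    # View the output as a matrix with `outer` rows.
    out = np.empty((outer, out_axis_dim * inner), dtype=first.dtype)
    offset = 0
    for x in inputs:
        cols = x.shape[axis] * inner
        # One block copy per input: same rows, shifted column offset.
        out[:, offset:offset + cols] = x.reshape(outer, cols)
        offset += cols
    return out.reshape(out_shape)


a = np.arange(12).reshape(2, 3, 2)
b = np.arange(100, 108).reshape(2, 2, 2)
merged = concat_via_matrix_copy([a, b], axis=1)
```

Split is the same copy in the opposite direction: each output reads its block of columns from the matrix view of the input.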

14 Oct, 2020 1 commit

Fix bug of sharing the corrupted workspace data · 77dcd71d

Summary:
This commit uses unique tensors to provide workspace data
to avoid the corruption between operator and kernel.

committed Oct 15, 2020

77dcd71d

13 Oct, 2020 1 commit

Add LinSpace Operator · e83c407a

Summary:
This commit adds the linspace op for dragon, torch and tensorflow.
In addition, a workaround for the truncated integer interval is applied to range/linspace (up to 2**57).

committed Oct 13, 2020

e83c407a

08 Oct, 2020 1 commit

Use block reduction for ArgMax and ArgMin Operator · 5cbbef4b

Summary:
This commit reimplements the cuda argmax/argmin via BlockReduce,
instead of the naive reduction in kernel loop.

committed Oct 08, 2020

5cbbef4b
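The reduction pattern behind this change can be illustrated in plain Python. The sketch below is a simplified tree reduction over (value, index) pairs and only mirrors the idea. It is not the CUB `BlockReduce` implementation, and `tree_argmax` and `combine` are hypothetical names.

```python
def combine(a, b):
    """Keep the pair with the larger value; on ties keep the smaller index."""
    if b[0] > a[0] or (b[0] == a[0] and b[1] < a[1]):
        return b
    return a


def tree_argmax(values):
    """Argmax by pairwise reduction, halving the active range each step."""
    if len(values) == 0:
        raise ValueError("argmax of an empty sequence")
    pairs = [(v, i) for i, v in enumerate(values)]
    size = 1
    while size < len(pairs):
        size *= 2
    # Pad to a power of two with pairs that can never win.
    pairs += [(float("-inf"), size)] * (size - len(pairs))
    stride = size // 2
    while stride > 0:
        for i in range(stride):  # each i plays the role of one thread
            pairs[i] = combine(pairs[i], pairs[i + stride])
        stride //= 2
    return pairs[0][1]


result = tree_argmax([3.0, 1.0, 7.0, 7.0, 2.0])
```

Because `combine` is associative and breaks ties by index, the result does not depend on the order in which pairs are merged. That is what allows the work to be spread across the threads of a block instead of running as a serial loop. Argmin follows by flipping the comparison.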

07 Oct, 2020 1 commit

Add Sort Operator · b4019faa

Summary:
This commit adds the sort op for dragon, torch and tensorflow.
In addition, a CUDA implementation of the topk op is now available.

committed Oct 07, 2020

b4019faa

27 Sep, 2020 1 commit

Use local workspace for Context · fdf26ef2

Summary:
This commit uses a local (thread or stream) workspace for Context,
which provides a more elegant way to dispatch kernels that require scratch space.
In addition, the TF32 math type is provided as a cuDNN option for Ampere devices.

committed Sep 27, 2020

fdf26ef2

10 Sep, 2020 1 commit

Add Unique Operator · 1dd8aeef

Summary:
This commit adds the unique op for dragon, torch, tensorflow and onnx.
It also fixes a bug that returned the wrong workspace size in cached cudnn convolution.

committed Sep 11, 2020

1dd8aeef

05 Sep, 2020 1 commit

Use sequential sampling as the default shuffle policy · 80267d8f

Summary:
This commit reimplements the default shuffle policy of the data reader with
sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution).
Sequential sampling is tuned by the ``initial_fill`` argument only, and works for both HDD and SSD.

committed Sep 05, 2020

80267d8f
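A buffer-based shuffle of this kind can be sketched as a Python generator. This is an illustration under the assumption that the policy works like a shuffle buffer of `initial_fill` records filled by sequential reads. The generator name and its `seed` parameter are hypothetical and are not the data reader's API.

```python
import random


def sequential_sampling(records, initial_fill, seed=None):
    """Yield records in shuffled order while reading the source sequentially."""
    rng = random.Random(seed)
    source = iter(records)
    buffer = []
    # Fill the buffer with the first `initial_fill` records, read in order.
    for record in source:
        buffer.append(record)
        if len(buffer) >= initial_fill:
            break
    # Emit a random buffered record and refill its slot with the next read.
    for record in source:
        slot = rng.randrange(len(buffer))
        yield buffer[slot]
        buffer[slot] = record
    # Drain what is left once the source is exhausted.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(sequential_sampling(range(20), initial_fill=8, seed=0))
in_order = list(sequential_sampling(range(20), initial_fill=1, seed=0))
```

The source is only ever read front to back, which is why one knob is enough for both HDD and SSD: a larger `initial_fill` gives a better shuffle at the cost of memory, and `initial_fill=1` degenerates to the original order.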

30 Aug, 2020 1 commit

Normalize the getter of operator argument · cca00c0d

Summary:
This commit renames the operator argument getter to ``GetArgument``
whether an argument is single or repeated.

committed Aug 30, 2020

cca00c0d

23 Aug, 2020 1 commit

Fix the stream issue with NCCL2 on CUDA 9.2 and later · 58708021

Summary:
This commit enforces the stream synchronization before dispatching NCCL collectives.
Otherwise, data corruption occurs because the default value of ``NCCL_GROUP_CUDA_STREAM``
has been 0 since CUDA 9.2, i.e., there is no explicit event waiting for unfinished kernels.

committed Aug 24, 2020

58708021

12 Aug, 2020 1 commit

Add support to query the size for the allocated memory · 58c5371e

Summary:
This commit adds the ``memory_allocated`` API for ``dragon.Workspace``
to query the size of allocated memory (and optionally on a specified device).

committed Aug 13, 2020

58c5371e

07 Aug, 2020 1 commit

Fix the skipped algorithm finding in cached CUDNN convolution · a7a7e4fc

Summary:
This commit enforces the algorithm finding even if the backward of filter or data
will not be executed. Otherwise, an empty algorithm will be encountered between
two cached operations with the same arguments and input shape.

committed Aug 08, 2020

a7a7e4fc

05 Aug, 2020 1 commit

Remove the deprecated DALI API · 218796ed

Summary:
This commit removes the deprecated API for DALI 0.24.
Besides, variable length keyword arguments are added for forward compatibility.

committed Aug 06, 2020

218796ed

03 Aug, 2020 1 commit

Add support for building Ampere GPU, CUDA11 and CUDNN8 · c40eaf7b

Summary:
This commit fixes the issue on building with CUDA11 and CUDNN8.
Besides, C++14 is enabled by default instead of C++11 to support CUB 1.9+,
and for this reason, the compiler is required to be gcc5/clang6/msvc141 or higher.

committed Aug 03, 2020

c40eaf7b

30 Jul, 2020 1 commit

Init sphinx documentation for C++ API · d8f612c8

Summary:
This commit uses sphinx to generate C++ API documentation
whose style and theme are consistent with the Python API.

committed Jul 31, 2020

d8f612c8

25 Jul, 2020 1 commit

Fix the device inference in eager execution · 8dbb73a7

Summary:
This commit computes the correct device for an existing output tensor.

Ting PAN committed Jul 26, 2020

8dbb73a7

24 Jul, 2020 1 commit

Normalize the API view of vm packages · c0c43218

Summary:
This commit requires all vm packages to adopt the api/core structure
in order to accommodate more complex future development.

committed Jul 24, 2020

c0c43218

16 Jul, 2020 1 commit

Fix the missing extra arguments in compiled function · ae4d6834

Summary:
This commit correctly passes the extra arguments
when executing a compiled graph function.

Ting PAN committed Jul 17, 2020

ae4d6834

15 Jul, 2020 1 commit

Make the consistency for shape and data type · aa2ec8c3

Summary:
This commit makes the shape and data type inherit from the same metaclass,
which ensures consistency between the different styles.

committed Jul 16, 2020

aa2ec8c3

14 Jul, 2020 1 commit

Normalize the math notations in docstring · 0ab14f30

Summary:
This commit normalizes the inconsistent math notations in docstring.

Ting PAN committed Jul 15, 2020

0ab14f30

13 Jul, 2020 2 commits

Align the tensor abstraction · 2598f4dc

Summary:
This commit aligns the properties and methods of tensor class.

Ting PAN committed Jul 14, 2020

2598f4dc

Add the explicit clear method for workspace · 413dbad0

Summary:
This commit adds the clear method to free resources manually
if the workspace instance is referenced circularly.

committed Jul 13, 2020

413dbad0
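Why an explicit clear method helps can be shown with a toy example. The classes below are hypothetical stand-ins rather than Dragon's workspace, and the behaviour relies on CPython's reference counting: an object caught in a reference cycle is not released when its last name is deleted, only when the cycle collector runs.

```python
import gc
import weakref


class Resource:
    """Stand-in for memory held by a workspace (hypothetical)."""


class ToyWorkspace:
    def __init__(self):
        self.resources = [Resource()]
        self.owner = self  # circular reference back to the instance

    def clear(self):
        """Release held resources and break the cycle manually."""
        self.resources.clear()
        self.owner = None


def resource_alive_after_del(call_clear):
    """Report whether the resource survives deleting the workspace name."""
    gc.disable()  # keep the cycle collector out of the experiment
    try:
        ws = ToyWorkspace()
        probe = weakref.ref(ws.resources[0])
        if call_clear:
            ws.clear()
        del ws
        return probe() is not None
    finally:
        gc.enable()
        gc.collect()


leaked = resource_alive_after_del(call_clear=False)
freed = not resource_alive_after_del(call_clear=True)
```

Without `clear()` the resource stays alive until a later garbage collection pass. With it, the resource is released immediately, which matters when the resource is device memory rather than a small Python object.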

11 Jul, 2020 1 commit

Remove the duplicate workspace singletons · 02ad90d5

Summary:
This commit moves the workspace api into the current workspace instance.
For this reason, the namespace ``dragon.workspace`` is removed for simplicity.

committed Jul 12, 2020

02ad90d5