- 29 Nov, 2020 1 commit
-
-
Summary: This commit adds pseudo FP16 kernels with FP32 conversions for the DepthwiseConv2d and SyncBN operators.
Ting PAN committed
-
- 05 Nov, 2020 1 commit
-
-
Summary: This commit adds a fallback with an FP32 accumulator for FP16 ReduceSum to avoid dropping too many small values. In addition, FP16 kernels for arch < 530 are almost available.
Ting PAN committed
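
The effect that the FP32-accumulator fallback guards against can be reproduced without a GPU. The following is a minimal sketch, not the committed kernel: it emulates FP16 and FP32 rounding with Python's `struct` module, and the helper names (`to_fp16`, `to_fp32`, `reduce_sum_fp16`, `reduce_sum_fp32_acc`) are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_sum_fp16(values):
    """Sum FP16 inputs while keeping the running total in FP16."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def reduce_sum_fp32_acc(values):
    """Sum FP16 inputs with an FP32 running total, casting to FP16 once at the end."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp16(v))
    return to_fp16(acc)


# One large value followed by many small ones: near 1.0 the FP16 spacing
# is 2**-10, so each 1e-4 addend is rounded away by the FP16 accumulator.
values = [1.0] + [1e-4] * 1000
naive = reduce_sum_fp16(values)         # stays at 1.0
fallback = reduce_sum_fp32_acc(values)  # close to 1.1
```

Here the FP16 accumulator never moves off 1.0, while the FP32 accumulator keeps the small addends and loses precision only in the final cast.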
-
- 24 Oct, 2020 1 commit
-
-
Summary: This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels, which are widely used in MobileNetV3 and EfficientNet.
Ting PAN committed
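
For reference, the three activations named in the 24 Oct, 2020 entry can be written as scalar functions. This is a sketch under stated assumptions: the constants `alpha=1/6` and `beta=0.5` follow the MobileNetV3 formulation and may differ from the defaults in the committed kernels.

```python
import math


def hardsigmoid(x, alpha=1.0 / 6.0, beta=0.5):
    """Piecewise-linear sigmoid approximation: clip(alpha * x + beta, 0, 1)."""
    # alpha and beta are assumed values (MobileNetV3 style), not confirmed defaults.
    return max(0.0, min(1.0, alpha * x + beta))


def hardswish(x, alpha=1.0 / 6.0, beta=0.5):
    """x * hardsigmoid(x): a cheap approximation of swish."""
    return x * hardsigmoid(x, alpha, beta)


def swish(x):
    """x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))
```

The hard variants replace the exponential with a clip, which is why they are attractive on mobile hardware.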
-
- 20 Oct, 2020 1 commit
-
-
Summary: This commit uses CopyMatrix to implement concat and split generically instead of specialized kernels.
Ting PAN committed
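
The idea of expressing concat as a generic matrix copy can be sketched as follows. Each input of shape (outer, axis_dim, inner) is viewed as a matrix with `outer` rows and `axis_dim * inner` columns, then copied into the output at a column offset. The function names and signatures below are hypothetical and do not mirror the library's `CopyMatrix` interface.

```python
def copy_matrix(rows, cols, src, src_ld, dst, dst_ld, dst_offset=0):
    """Copy a rows x cols matrix between flat buffers with leading dimensions."""
    for r in range(rows):
        s = r * src_ld
        d = dst_offset + r * dst_ld
        dst[d:d + cols] = src[s:s + cols]


def concat(inputs, outer, inner):
    """Concatenate flat row-major tensors along one axis.

    inputs: list of (flat_data, axis_dim) pairs, each of shape (outer, axis_dim, inner).
    """
    total = sum(dim for _, dim in inputs)
    out = [None] * (outer * total * inner)
    col = 0
    for data, dim in inputs:
        cols = dim * inner
        # Each input fills a contiguous column band of the output matrix.
        copy_matrix(outer, cols, data, cols, out, total * inner, dst_offset=col)
        col += cols
    return out


# Concatenate a 2x2 and a 2x1 matrix along axis 1.
result = concat([([1, 2, 3, 4], 2), ([5, 6], 1)], outer=2, inner=1)
```

Split is the same copy with source and destination roles swapped, so one routine can serve both ops.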
-
- 14 Oct, 2020 1 commit
-
-
Summary: This commit uses unique tensors to provide workspace data to avoid the corruption between operator and kernel.
Ting PAN committed
-
- 13 Oct, 2020 1 commit
-
-
Summary: This commit adds the linspace op for dragon, torch and tensorflow. In addition, range/linspace gain a workaround for truncated int intervals (up to 2**57).
Ting PAN committed
-
- 08 Oct, 2020 1 commit
-
-
Summary: This commit reimplements the CUDA argmax/argmin via BlockReduce, instead of a naive reduction in the kernel loop.
Ting PAN committed
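
The reduction pattern behind a block-level argmax can be emulated on the CPU. The sketch below is an illustration only, not the CUDA kernel: each simulated thread first scans a strided slice for its local best (value, index) pair, and the partial results are then combined pairwise in a tree. The names `block_argmax` and `_better` and the tie-breaking rule (smaller index wins) are assumptions.

```python
def _better(a, b):
    """Return True if candidate a = (value, index) beats candidate b."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])


def block_argmax(values, num_threads=4):
    """Emulate a block-wide argmax: strided local scan, then tree reduction."""
    if not values:
        raise ValueError("argmax of an empty sequence")
    # Step 1: each simulated thread reduces its strided elements.
    partial = []
    for t in range(num_threads):
        best = None
        for i in range(t, len(values), num_threads):
            cand = (values[i], i)
            if best is None or _better(cand, best):
                best = cand
        if best is not None:
            partial.append(best)
    # Step 2: pairwise tree reduction over the per-thread results.
    while len(partial) > 1:
        half = (len(partial) + 1) // 2
        for j in range(len(partial) - half):
            if _better(partial[j + half], partial[j]):
                partial[j] = partial[j + half]
        partial = partial[:half]
    return partial[0][1]
```

The tree step takes a logarithmic number of rounds in the thread count, which is the advantage over a single thread looping over all elements.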
-
- 07 Oct, 2020 1 commit
-
-
Summary: This commit adds the sort op for dragon, torch and tensorflow. In addition, a CUDA implementation of the topk op is now available.
Ting PAN committed
-
- 27 Sep, 2020 1 commit
-
-
Summary: This commit uses a local (thread or stream) workspace for Context, which provides a more elegant way to dispatch kernels requiring scratch. In addition, the TF32 math type is provided as a cuDNN option for Ampere devices.
Ting PAN committed
-
- 10 Sep, 2020 1 commit
-
-
Summary: This commit adds the unique op for dragon, torch, tensorflow and onnx. It also fixes a bug that returned the wrong workspace size in cached cuDNN convolution.
Ting PAN committed
-
- 05 Sep, 2020 1 commit
-
-
Summary: This commit reimplements the default shuffle policy of the data reader with sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution). Sequential sampling is tuned by the argument ``initial_fill`` only, and works for both HDD and SSD.
Ting PAN committed
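
Sequential sampling with a fill buffer can be sketched as below. This is a minimal illustration of the general technique, assuming the buffer is filled with ``initial_fill`` records read in order and that each draw is replaced by the next sequential record. The function name `sequential_shuffle` and the `seed` parameter are hypothetical and not part of the data reader's API.

```python
import random

_END = object()  # sentinel marking an exhausted source


def sequential_shuffle(records, initial_fill, seed=0):
    """Yield records in a locally shuffled order using a bounded buffer."""
    rng = random.Random(seed)
    source = iter(records)
    # Fill the buffer by reading the source strictly in order.
    buffer = []
    for rec in source:
        buffer.append(rec)
        if len(buffer) >= initial_fill:
            break
    while buffer:
        idx = rng.randrange(len(buffer))
        picked = buffer[idx]
        nxt = next(source, _END)
        if nxt is _END:
            # Source exhausted: shrink the buffer by swap-removal.
            buffer[idx] = buffer[-1]
            buffer.pop()
        else:
            # Refill the vacated slot with the next sequential record.
            buffer[idx] = nxt
        yield picked
```

Because the source is only ever read front to back, the access pattern suits spinning disks as well as SSDs. A larger ``initial_fill`` gives a more thorough shuffle at the cost of memory.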
-
- 30 Aug, 2020 1 commit
-
-
Summary: This commit renames the operator argument getter to ``GetArgument`` whether an argument is single or repeated.
Ting PAN committed
-
- 23 Aug, 2020 1 commit
-
-
Summary: This commit enforces stream synchronization before dispatching NCCL collectives. Otherwise, data corruption will occur because the default value of ``NCCL_GROUP_CUDA_STREAM`` has been 0 since CUDA 9.2, i.e., there is no explicit event wait for unfinished kernels.
Ting PAN committed
-
- 12 Aug, 2020 1 commit
-
-
Summary: This commit adds the ``memory_allocated`` API for ``dragon.Workspace`` to query the size of allocated memory (and optionally on a specified device).
Ting PAN committed
-
- 07 Aug, 2020 1 commit
-
-
Summary: This commit enforces algorithm finding even if the backward pass of the filter or data will not be executed. Otherwise, an empty algorithm will be encountered between two cached operations with the same arguments and input shape.
Ting PAN committed
-
- 05 Aug, 2020 1 commit
-
-
Summary: This commit removes the deprecated API for DALI 0.24. Besides, variable length keyword arguments are added for forward compatibility.
Ting PAN committed
-
- 03 Aug, 2020 1 commit
-
-
Summary: This commit fixes build issues with CUDA 11 and cuDNN 8. In addition, C++14 is enabled by default instead of C++11 to support CUB 1.9+; for this reason, the compiler must be gcc5/clang6/msvc141 or higher.
Ting PAN committed
-
- 30 Jul, 2020 1 commit
-
-
Summary: This commit uses sphinx to generate C++ API documentation whose style and theme are consistent with the Python API.
Ting PAN committed
-
- 25 Jul, 2020 1 commit
-
-
Summary: This commit computes the correct device for an existing output tensor.
Ting PAN committed
-
- 24 Jul, 2020 1 commit
-
-
Summary: This commit makes all the vm packages adopt the api/core structure to accommodate more complex future development.
Ting PAN committed
-
- 16 Jul, 2020 1 commit
-
-
Summary: This commit correctly passes the extra arguments when executing a compiled graph function.
Ting PAN committed
-
- 15 Jul, 2020 1 commit
-
-
Summary: This commit enforces that the shape and data type are inherited from the same metaclass, which ensures consistency between different styles.
Ting PAN committed
-
- 14 Jul, 2020 1 commit
-
-
Summary: This commit normalizes the inconsistent math notations in docstring.
Ting PAN committed
-
- 13 Jul, 2020 2 commits
- 11 Jul, 2020 1 commit
-
-
Summary: This commit moves the workspace api into the current workspace instance. For this reason, the namespace ``dragon.workspace`` is removed for simplicity.
Ting PAN committed
-
- 06 Jul, 2020 1 commit
-
-
Summary: This commit tests the execution of native ops and verifies the results. Several bugs were found and fixed as a result of these tests.
Ting PAN committed
-
- 22 Jun, 2020 2 commits
-
-
Summary: This commit removes the redundant workspace reference when executing a tensor operation.
Ting PAN committed
-
Summary: This commit handles the previously missed in-place case in which the random elements are generated on the output (i.e., the input). It also fixes the omitted `RunOnDevice` for cuDNN activations, which now correctly dispatches the implementation.
Ting PAN committed
-
- 17 Jun, 2020 1 commit
-
-
Ting PAN committed
-
- 31 May, 2019 1 commit
-
-
Ting PAN committed
-
- 26 May, 2019 1 commit
-
-
Ting PAN committed
-
- 21 May, 2019 1 commit
-
-
Ting PAN committed
-
- 15 May, 2019 1 commit
-
-
Ting PAN committed
-
- 14 May, 2019 2 commits
- 16 Apr, 2019 1 commit
-
-
Ting PAN committed
-
- 11 Apr, 2019 1 commit
-
-
Ting PAN committed
-
- 04 Apr, 2019 1 commit
-
-
Ting PAN committed
-
- 20 Mar, 2019 1 commit
-
-
Ting PAN committed
-