- 29 Jun, 2021 1 commit
-
-
Summary: This commit adds a feature to map memory and load data with an offset. A tensor that takes mapped memory is read-only and drops the mapping on its next mutation.
Ting PAN committed
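The read-only mapping with an offset described above can be sketched with Python's standard ``mmap`` module. This is only an illustration of the general technique, not the framework's own API: the file layout, the ``offset`` value and the copy-on-mutation step are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical file layout: a padding header followed by four float32 values.
# mmap offsets must be a multiple of the allocation granularity.
offset = mmap.ALLOCATIONGRANULARITY
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * offset)
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
    path = f.name

with open(path, "rb") as f:
    # Map only the 16 payload bytes, starting at the given offset, read-only.
    mapped = mmap.mmap(f.fileno(), length=16, offset=offset,
                       access=mmap.ACCESS_READ)
    loaded = struct.unpack("<4f", mapped[:16])

    # A read-only mapping rejects in-place writes.
    try:
        mapped[0:1] = b"\x01"
        write_failed = False
    except TypeError:
        write_failed = True

    # Assumed "drop on mutation" behavior: copy into private memory,
    # release the mapping, then mutate the private copy.
    private = bytearray(mapped[:16])
    mapped.close()

private[0:4] = struct.pack("<f", 9.0)
mutated = struct.unpack("<4f", private)
os.remove(path)
```

Here ``loaded`` holds the values read through the mapping, while ``mutated`` lives in ordinary memory after the mapping has been released.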
-
- 25 Jun, 2021 1 commit
-
-
Summary: This commit adds extra CUDA softmax kernels using warp reduction. Warp reduction gives better performance when the dimension is <= 256, a size favored by recent vision transformers.
Ting PAN committed
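To illustrate the idea of a warp reduction, the sketch below simulates a 32-lane butterfly (XOR-shuffle) reduction in plain Python and uses it to compute a softmax over one row. It is a host-side simulation under assumed names (``warp_all_reduce``, ``warp_softmax``), not the CUDA kernel from this commit.

```python
import math

WARP_SIZE = 32  # number of lanes in a CUDA warp


def warp_all_reduce(lanes, op):
    """Simulate a butterfly reduction: each step, lane i combines its value
    with lane (i XOR offset). After log2(32) steps all lanes agree."""
    lanes = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        lanes = [op(lanes[i], lanes[i ^ offset]) for i in range(WARP_SIZE)]
        offset //= 2
    return lanes


def warp_softmax(row):
    """Softmax of one row, with lane i handling elements i, i+32, i+64, ..."""
    # Per-lane partial maximum, then a warp-wide maximum.
    local_max = [max(row[i::WARP_SIZE], default=-math.inf)
                 for i in range(WARP_SIZE)]
    row_max = warp_all_reduce(local_max, max)[0]
    # Per-lane partial sum of exponentials, then a warp-wide sum.
    exps = [math.exp(v - row_max) for v in row]
    local_sum = [sum(exps[i::WARP_SIZE]) for i in range(WARP_SIZE)]
    row_sum = warp_all_reduce(local_sum, lambda a, b: a + b)[0]
    return [e / row_sum for e in exps]


probs = warp_softmax([0.1 * i for i in range(197)])
```

The whole row is reduced inside one warp without shared memory, which is why the approach suits short rows.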
-
- 22 Jun, 2021 1 commit
-
-
Summary: This commit uses "math::Transpose" instead of "kernels::Transpose" to allow for more optimized routines in the future.
Ting PAN committed
-
- 19 Jun, 2021 1 commit
-
-
Summary: This commit instantiates CUDA kernels with constant dimensions to enable compile-time optimization.
Ting PAN committed
-
- 08 Jun, 2021 1 commit
-
-
Summary: This commit allows transpose to compute in-place by using a buffer. It also adds CRD mode for space-depth transpose (i.e., pixel shuffle).
Ting PAN committed
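For reference, the difference between the CRD and DCR channel orderings can be sketched as follows. The function follows the ONNX DepthToSpace definition for a single image stored as nested lists ``[C][H][W]``; the name ``depth_to_space`` and its signature are hypothetical.

```python
def depth_to_space(x, bs, mode="CRD"):
    """Rearrange channels into bs x bs spatial blocks.

    CRD (column-row-depth) reads input channel c*bs*bs + i*bs + j,
    DCR (depth-column-row) reads input channel (i*bs + j)*c_out + c.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out = c_in // (bs * bs)
    out = [[[0] * (w * bs) for _ in range(h * bs)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(bs):
            for j in range(bs):
                if mode == "CRD":
                    src = x[c * bs * bs + i * bs + j]
                else:
                    src = x[(i * bs + j) * c_out + c]
                for row in range(h):
                    for col in range(w):
                        out[c][row * bs + i][col * bs + j] = src[row][col]
    return out


# Eight 1x1 channels whose value equals the channel index.
x = [[[k]] for k in range(8)]
crd = depth_to_space(x, 2, "CRD")
dcr = depth_to_space(x, 2, "DCR")
```

With this input, CRD keeps consecutive input channels together in one output block, which is the pixel shuffle ordering.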
-
- 31 May, 2021 1 commit
-
-
Summary: This commit removes the default cuBLAS tensor core math mode when CUDA >= 11.0 on Ampere devices, to avoid FP32 math being downcast.
Ting PAN committed
-
- 13 May, 2021 1 commit
-
-
Summary: This commit adds the im2col operator, which unfolds the input into the depth dimension.
Ting PAN committed
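A minimal sketch of the im2col idea, assuming stride 1, no padding and a single image stored as nested lists ``[C][H][W]``. The function name and layout are chosen for illustration and are not taken from the operator itself.

```python
def im2col(x, kh, kw):
    """Unfold every kh x kw patch so that each (channel, ki, kj) offset
    becomes one row and each output position becomes one column."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for ch in range(c):
        for ki in range(kh):
            for kj in range(kw):
                cols.append([x[ch][row + ki][col + kj]
                             for row in range(out_h)
                             for col in range(out_w)])
    return cols  # shape: [C * kh * kw][out_h * out_w]


image = [[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]
columns = im2col(image, 2, 2)
```

Once the input is unfolded this way, a convolution reduces to a matrix product between the flattened filters and ``columns``.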
-
- 07 May, 2021 1 commit
-
-
Summary: This commit renames the triangular operator to trilu, following ONNX.
Ting PAN committed
-
- 01 May, 2021 1 commit
-
-
Summary: This commit fixes alpha and beta to 1/6 and 0.5 for hardswish, matching the behavior of the ONNX schema.
Ting PAN committed
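With these constants, hardswish follows the ONNX definition ``y = x * max(0, min(1, alpha * x + beta))``. A scalar sketch (the function name is illustrative):

```python
def hardswish(x, alpha=1.0 / 6.0, beta=0.5):
    """HardSwish as defined by ONNX, with alpha = 1/6 and beta = 0.5."""
    return x * max(0.0, min(1.0, alpha * x + beta))


samples = [hardswish(v) for v in (-4.0, -3.0, 0.0, 1.0, 3.0, 4.0)]
```

The function is zero for inputs at or below -3 and equals the identity for inputs at or above 3.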
-
- 28 Apr, 2021 1 commit
-
-
Summary: This commit adds the reverse (flip) operator.
Ting PAN committed
-
- 21 Apr, 2021 1 commit
-
-
Summary: This commit adds the GELU activation, which computes its output in either approximate or naive mode.
Ting PAN committed
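The two modes can be sketched as below, assuming that "naive" refers to the exact erf-based formula and "approximate" to the common tanh approximation. That mapping is an assumption; the commit does not spell out the formulas.

```python
import math


def gelu_naive(x):
    """Exact form: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_approximate(x):
    """Tanh approximation of the same function."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest gap between the two modes on a coarse grid over [-4, 4].
max_gap = max(abs(gelu_naive(v / 10.0) - gelu_approximate(v / 10.0))
              for v in range(-40, 41))
```

The two forms agree closely over typical activation ranges, so the approximate mode mainly trades a small error for avoiding ``erf``.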
-
- 14 Apr, 2021 1 commit
-
-
Summary: This commit adds the scatter and gather operators, which remap elements along a given dimension according to indices.
Ting PAN committed
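As an illustration of the remapping, the sketch below implements gather and scatter for 2D nested lists along the last dimension, following the element-wise semantics used by torch and ONNX (GatherElements/ScatterElements). The function names are illustrative.

```python
def gather_last_dim(x, index):
    """out[i][j] = x[i][index[i][j]]"""
    return [[x[i][k] for k in row] for i, row in enumerate(index)]


def scatter_last_dim(base, index, src):
    """Copy of base where out[i][index[i][j]] = src[i][j]."""
    out = [list(row) for row in base]
    for i, row in enumerate(index):
        for j, k in enumerate(row):
            out[i][k] = src[i][j]
    return out


x = [[10, 11, 12],
     [20, 21, 22]]
gathered = gather_last_dim(x, [[2, 0], [1, 1]])
scattered = scatter_last_dim([[0, 0, 0], [0, 0, 0]],
                             [[2, 0], [1, 0]],
                             [[1, 2], [3, 4]])
```

Gather reads through the indices and scatter writes through them, so the two are inverse views of the same mapping when the indices do not repeat.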
-
- 08 Apr, 2021 1 commit
-
-
Summary: The new frontend unifies the two execution modes, starting from a single tensor class. It also emits operator execution through a common path that works for both dragon and torch.
Ting PAN committed
-
- 04 Feb, 2021 1 commit
-
-
Summary: This commit generalizes the fully-connected operation into GEMM, and enhances the matmul operation via batched Dot, GEMV and GEMM. The new representations and attributes are consistent with ONNX.
Ting PAN committed
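For context, the ONNX Gemm operator computes ``Y = alpha * A' * B' + beta * C``, where ``A'`` and ``B'`` are optionally transposed. A pure-Python sketch on nested lists follows; the lowercase argument names mirror the ONNX attributes, and ``C`` is assumed to already have the output shape (no broadcasting).

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Y = alpha * A' @ B' + beta * C, following the ONNX Gemm definition."""
    a = transpose(a) if trans_a else a
    b = transpose(b) if trans_b else b
    rows, inner, cols = len(a), len(b), len(b[0])
    y = [[alpha * sum(a[i][k] * b[k][j] for k in range(inner))
          for j in range(cols)] for i in range(rows)]
    if c is not None:
        y = [[y[i][j] + beta * c[i][j] for j in range(cols)]
             for i in range(rows)]
    return y


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
plain = gemm(A, B)
fully_connected = gemm(A, B, c=[[1, 1], [1, 1]], alpha=2.0, trans_b=True)
```

A fully-connected layer with weights stored as ``[out, in]`` corresponds to the ``trans_b=True`` case, with the bias playing the role of ``C``.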
-
- 25 Jan, 2021 1 commit
-
-
Summary: To keep the retrieval of cuDNN convolution algorithms consistent, cuDNN v6 (mainly relied on by CUDA 8.0) is now dropped.
Ting PAN committed
-
- 20 Jan, 2021 1 commit
-
-
Summary: This commit adds the sysconfig module to get the build information. Build information is helpful to select tests or report issues.
Ting PAN committed
-
- 16 Jan, 2021 1 commit
-
-
Summary: This commit adds 1D and 3D support for vision operators via a generalized ND implementation.
Ting PAN committed
-
- 29 Dec, 2020 1 commit
-
-
Summary: This commit fixes two compilation issues on win32: 1. Remove the constexpr when generating protobuf headers. 2. Always enforce the "/MT" flag to override "/MD".
Ting PAN committed
-
- 23 Dec, 2020 1 commit
-
-
Summary: This commit tests the correctness of the shape inference and data types produced by the autograph module.
Ting PAN committed
-
- 15 Dec, 2020 1 commit
-
-
Summary: We found that defining CUB storages in a union is unstable. Further investigation is needed to understand this issue.
Ting PAN committed
-
- 11 Dec, 2020 1 commit
-
-
Summary: This commit moves the parser into ArgHelper, which was previously designed to add the descriptors.
Ting PAN committed
-
- 10 Dec, 2020 1 commit
-
-
Summary: This commit feeds the repeated tensor arguments with the entire array instead of the piecewise scalars.
Ting PAN committed
-
- 09 Dec, 2020 1 commit
-
-
Summary: This commit redesigns ``vm.onnx`` with reference to the official repository. Frontends and backends are aligned under an identical API for dragon, torch and tensorrt.
Ting PAN committed
-
- 03 Dec, 2020 1 commit
-
-
Summary: This commit removes the unused reference of GradientTape to avoid the circular reference issue.
Ting PAN committed
-
- 02 Dec, 2020 1 commit
-
-
Summary: This commit attaches input and output together in assign operators, which fixes the missing input defs due to identity from input to output.
Ting PAN committed
-
- 29 Nov, 2020 1 commit
-
-
Summary: This commit adds pseudo FP16 kernels with FP32 conversions for the DepthwiseConv2d and SyncBN operators.
Ting PAN committed
-
- 05 Nov, 2020 1 commit
-
-
Summary: This commit adds a fallback with an FP32 accumulator for FP16 ReduceSum to avoid dropping too many small values. In addition, FP16 kernels for arch < 530 are now mostly available.
Ting PAN committed
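The effect of the accumulator type can be reproduced on the host with the half-precision format of Python's ``struct`` module. The sketch below rounds after every addition to emulate FP16 and FP32 accumulators; the helper names are illustrative and this is not the CUDA kernel itself.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_fp16_acc(values):
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))  # accumulator rounded to FP16
    return acc


def reduce_sum_fp32_acc(values):
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp16(v))  # accumulator kept in FP32
    return to_fp16(acc)  # cast to FP16 only once, at the end


ones = [1.0] * 4096
stalled = reduce_sum_fp16_acc(ones)
correct = reduce_sum_fp32_acc(ones)
```

With an FP16 accumulator the sum stalls at 2048, because 2049 is not representable in half precision and rounds back to 2048. The FP32 accumulator reaches 4096.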
-
- 24 Oct, 2020 1 commit
-
-
Summary: This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels; these are widely used in MobileNetV3 and EfficientNet.
Ting PAN committed
-
- 20 Oct, 2020 1 commit
-
-
Summary: This commit uses CopyMatrix to implement concat and split generically instead of specialized kernels.
Ting PAN committed
-
- 14 Oct, 2020 1 commit
-
-
Summary: This commit uses unique tensors to provide workspace data to avoid the corruption between operator and kernel.
Ting PAN committed
-
- 13 Oct, 2020 1 commit
-
-
Summary: This commit adds the linspace op for dragon, torch and tensorflow. A workaround for truncated integer intervals (up to 2**57) is also applied to range/linspace.
Ting PAN committed
-
- 08 Oct, 2020 1 commit
-
-
Summary: This commit reimplements the CUDA argmax/argmin via BlockReduce, instead of the naive reduction in a kernel loop.
Ting PAN committed
-
- 07 Oct, 2020 1 commit
-
-
Summary: This commit adds the sort op for dragon, torch and tensorflow. In addition, a CUDA implementation of the topk op is now available.
Ting PAN committed
-
- 27 Sep, 2020 1 commit
-
-
Summary: This commit uses a local (thread or stream) workspace for Context, which provides a more elegant way to dispatch kernels requiring scratch memory. In addition, the TF32 math type is provided as a cuDNN option for Ampere devices.
Ting PAN committed
-
- 10 Sep, 2020 1 commit
-
-
Summary: This commit adds the unique op for dragon, torch, tensorflow and onnx. It also fixes a bug that returned the wrong workspace size in cached cuDNN convolution.
Ting PAN committed
-
- 05 Sep, 2020 1 commit
-
-
Summary: This commit reimplements the default shuffle policy of the data reader with sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution). Sequential sampling is tuned by the ``initial_fill`` argument only, and works for both HDD and SSD.
Ting PAN committed
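One common way to realize this kind of policy is a shuffle buffer: the source is read strictly sequentially, and randomness comes only from which buffered sample is emitted next. The sketch below is an assumption about the general technique, modeled on how DALI describes its ``initial_fill`` buffer, and is not the reader's actual code.

```python
import random


def sequential_sampling(dataset, initial_fill, seed=0):
    """Yield every item once, reading the source in order and shuffling
    through a buffer of at most initial_fill items."""
    rng = random.Random(seed)
    source = iter(dataset)
    buffer = []
    # Pre-fill the buffer with the first initial_fill items.
    for item in source:
        buffer.append(item)
        if len(buffer) >= initial_fill:
            break
    # Emit a random buffered item and refill its slot with the next read.
    for item in source:
        slot = rng.randrange(len(buffer))
        yield buffer[slot]
        buffer[slot] = item
    # Drain what remains once the source is exhausted.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(sequential_sampling(range(100), initial_fill=16))
in_order = list(sequential_sampling(range(100), initial_fill=1))
```

A larger ``initial_fill`` gives a stronger shuffle at the cost of memory, while disk access stays sequential, which is what makes the scheme friendly to both HDD and SSD.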
-
- 30 Aug, 2020 1 commit
-
-
Summary: This commit renames the operator argument getter to ``GetArgument``, whether an argument is single or repeated.
Ting PAN committed
-
- 23 Aug, 2020 1 commit
-
-
Summary: This commit enforces stream synchronization before dispatching NCCL collectives. Otherwise, data corruption will happen because the default value of ``NCCL_GROUP_CUDA_STREAM`` has been 0 since CUDA 9.2, i.e., there is no explicit event waiting for unfinished kernels.
Ting PAN committed
-
- 12 Aug, 2020 1 commit
-
-
Summary: This commit adds the ``memory_allocated`` API for ``dragon.Workspace`` to query the size of allocated memory (and optionally on a specified device).
Ting PAN committed
-
- 07 Aug, 2020 1 commit
-
-
Summary: This commit enforces algorithm finding even if the backward pass for the filter or data will not be executed. Otherwise, an empty algorithm will be encountered between two cached operations with the same arguments and input shape.
Ting PAN committed
-