- 20 Dec, 2021 1 commit
Ting PAN committed
- 29 Jun, 2021 1 commit
Summary: This commit adds a feature to map memory and load data with an offset. A tensor that takes mapped memory is read-only and will drop the mapping on its next mutation.
Ting PAN committed
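The read-only, drop-on-mutation behaviour described in the 29 Jun, 2021 commit can be sketched with Python's standard `mmap` module. This is a minimal illustration under assumptions, not Dragon's implementation: the `MappedBuffer` class, its methods and the `demo` helper are hypothetical names. Because `mmap` requires offsets aligned to `mmap.ALLOCATIONGRANULARITY`, the sketch maps from an aligned start and slices from there.

```python
import mmap
import os
import struct
import tempfile


class MappedBuffer:
    """Hypothetical read-only buffer backed by a file mapping (not Dragon's API)."""

    def __init__(self, path, offset, nbytes):
        # mmap offsets must be multiples of ALLOCATIONGRANULARITY, so map from an
        # aligned start and remember where the requested data begins.
        aligned = offset - offset % mmap.ALLOCATIONGRANULARITY
        self._delta = offset - aligned
        self._nbytes = nbytes
        with open(path, "rb") as f:
            self._map = mmap.mmap(f.fileno(), self._delta + nbytes,
                                  access=mmap.ACCESS_READ, offset=aligned)
        self._owned = None  # private copy, created on the first mutation

    @property
    def is_mapped(self):
        return self._map is not None

    def read(self):
        if self._owned is not None:
            return bytes(self._owned)
        return self._map[self._delta:self._delta + self._nbytes]

    def write(self, pos, data):
        # First mutation: copy the mapped bytes, then drop the read-only mapping.
        if self._owned is None:
            self._owned = bytearray(self.read())
            self._map.close()
            self._map = None
        self._owned[pos:pos + len(data)] = data


def demo():
    # A 16-byte header followed by three little-endian float32 values.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"\x00" * 16 + struct.pack("<3f", 1.0, 2.0, 3.0))
        buf = MappedBuffer(path, offset=16, nbytes=12)
        before = (struct.unpack("<3f", buf.read()), buf.is_mapped)
        buf.write(0, struct.pack("<f", 5.0))  # triggers the copy and unmaps
        after = (struct.unpack("<3f", buf.read()), buf.is_mapped)
    return before, after
```

Copying on the first write keeps the file untouched and avoids holding a mapping that no longer reflects the tensor's contents.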
- 25 Jun, 2021 1 commit
Summary: This commit adds extra CUDA softmax kernels using warp reduce. Warp reduce performs better when the softmax dimension is <= 256, which suits recent vision transformers.
Ting PAN committed
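To show why a single warp suffices for rows of up to 256 elements, here is a pure-Python simulation of a warp-level softmax, not Dragon's CUDA kernel. It assumes each of the 32 lanes owns the strided columns `lane, lane + 32, ...` and emulates a `__shfl_xor_sync` butterfly, so after five rounds every lane holds the row max and then the row sum. The names `warp_allreduce` and `warp_softmax_row` are hypothetical.

```python
import math

WARP_SIZE = 32


def warp_allreduce(lane_vals, op):
    # Emulates a butterfly reduction with __shfl_xor_sync: after log2(32) = 5
    # rounds, every lane holds the reduction over all 32 lanes.
    vals = list(lane_vals)
    offset = WARP_SIZE // 2
    while offset > 0:
        vals = [op(vals[lane], vals[lane ^ offset]) for lane in range(WARP_SIZE)]
        offset //= 2
    return vals


def warp_softmax_row(row):
    """Softmax of one row as a single simulated warp would compute it."""
    assert 0 < len(row) <= 8 * WARP_SIZE  # the warp path targets dim <= 256
    # Each lane owns columns lane, lane + 32, lane + 64, ...
    cols = [range(lane, len(row), WARP_SIZE) for lane in range(WARP_SIZE)]
    # 1) per-lane max, then warp-wide max for numerical stability
    local_max = [max((row[c] for c in cs), default=-math.inf) for cs in cols]
    row_max = warp_allreduce(local_max, max)[0]
    # 2) per-lane sum of exp(x - max), then warp-wide sum
    local_sum = [sum(math.exp(row[c] - row_max) for c in cs) for cs in cols]
    row_sum = warp_allreduce(local_sum, lambda a, b: a + b)[0]
    # 3) every lane normalizes its own columns
    return [math.exp(x - row_max) / row_sum for x in row]
```

With at most 8 values per lane, the whole row stays in registers and no shared memory or block-level synchronization is needed, which is the usual argument for a warp path at small dimensions.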
- 22 Jun, 2021 1 commit
Summary: This commit uses "math::Transpose" instead of "kernels::Transpose" so that more optimized routines can be used in the future.
Ting PAN committed
- 19 Jun, 2021 1 commit
Summary: This commit instantiates CUDA kernels with constant dimensions to enable compile-time optimization.
Ting PAN committed
- 08 Jun, 2021 1 commit
Summary: This commit allows transpose to compute in-place by leveraging a buffer. We also add the CRD mode for the space-depth transpose (i.e., pixel shuffle).
Ting PAN committed
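The difference between the two depth-to-space orderings can be written as an index mapping. This sketch assumes the ONNX `DepthToSpace` definitions of `DCR` and `CRD`; the `CRD` order is the one used by PyTorch's `PixelShuffle`. The function name and the nested-list `[C][H][W]` layout (batch omitted) are illustrative assumptions, not Dragon's interface.

```python
def depth_to_space(x, bs, mode="DCR"):
    """x is a nested list [C][H][W]; returns [C // bs**2][H * bs][W * bs]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    oc = C // (bs * bs)
    out = [[[None] * (W * bs) for _ in range(H * bs)] for _ in range(oc)]
    for c in range(oc):
        for h in range(H):
            for w in range(W):
                for i in range(bs):
                    for j in range(bs):
                        if mode == "DCR":
                            # block offset (i, j) varies slowest in the input channel
                            ic = (i * bs + j) * oc + c
                        elif mode == "CRD":
                            # output channel varies slowest (PixelShuffle order)
                            ic = c * bs * bs + i * bs + j
                        else:
                            raise ValueError(mode)
                        out[c][h * bs + i][w * bs + j] = x[ic][h][w]
    return out
```

For eight input channels and a block size of 2, `CRD` places channels 0..3 into the first output channel, while `DCR` interleaves channels 0, 2, 4, 6.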
- 31 May, 2021 1 commit
Summary: This commit removes the default cuBLAS tensor core math mode when CUDA >= 11.0 on Ampere devices to avoid FP32 downcast math.
Ting PAN committed
- 13 May, 2021 1 commit
Summary: This commit adds the im2col operator to unfold the input into depth.
Ting PAN committed
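Unfolding "into depth" means each (channel, kernel row, kernel column) triple becomes one output channel, so a convolution reduces to a weighted sum over that depth axis. A minimal sketch with stride 1 allowed to vary, no padding and no batch axis; `im2col` and `conv2d_via_im2col` are illustrative names, not Dragon's operator signature.

```python
def im2col(x, kh, kw, stride=1):
    """x: nested list [C][H][W], no padding. Returns [C*kh*kw][out_h][out_w]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    cols = []
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                # One depth slice per (channel, kernel row, kernel column).
                cols.append([[x[c][oh * stride + i][ow * stride + j]
                              for ow in range(out_w)]
                             for oh in range(out_h)])
    return cols


def conv2d_via_im2col(x, weight):
    """weight: nested list [C][kh][kw]; one output channel, stride 1."""
    kh, kw = len(weight[0]), len(weight[0][0])
    cols = im2col(x, kh, kw)
    flat_w = [v for ch in weight for row in ch for v in row]  # same (c, i, j) order
    out_h, out_w = len(cols[0]), len(cols[0][0])
    return [[sum(w * col[oh][ow] for w, col in zip(flat_w, cols))
             for ow in range(out_w)] for oh in range(out_h)]
```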
- 07 May, 2021 1 commit
Summary: This commit renames the triangular operator to trilu, following ONNX.
Ting PAN committed
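Under the ONNX `Trilu` definition, `upper` keeps the elements on and above the k-th diagonal (`j - i >= k`), and the lower variant keeps those on and below it (`j - i <= k`). A 2-D sketch of that rule; the Python function and parameter names are assumptions.

```python
def trilu(mat, k=0, upper=True):
    """ONNX-style Trilu on a 2-D nested list; zeroes the other triangle."""
    return [[v if ((j - i >= k) if upper else (j - i <= k)) else 0
             for j, v in enumerate(row)]
            for i, row in enumerate(mat)]
```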
- 01 May, 2021 1 commit
Summary: This commit fixes the alpha and beta of hardswish to 1/6 and 0.5, matching the ONNX definition.
Ting PAN committed
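With alpha and beta fixed, hardswish follows the ONNX formula `y = x * max(0, min(1, alpha * x + beta))`, which for 1/6 and 0.5 is the familiar `x * relu6(x + 3) / 6`. A scalar sketch for illustration only.

```python
def hardswish(x, alpha=1.0 / 6, beta=0.5):
    # ONNX HardSwish: y = x * max(0, min(1, alpha * x + beta)).
    # With alpha = 1/6 and beta = 0.5 this equals x * relu6(x + 3) / 6.
    return x * max(0.0, min(1.0, alpha * x + beta))
```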
- 28 Apr, 2021 1 commit
Summary: This commit adds the reverse (flip) operator.
Ting PAN committed
- 21 Apr, 2021 1 commit
Summary: This commit adds the GELU activation, which computes the output in either approximate or naive mode.
Ting PAN committed
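The two GELU modes are commonly written as below. This sketch assumes "naive" refers to the exact form based on the Gaussian CDF (`erf`) and "approximate" to the usual `tanh` approximation; it is not taken from Dragon's kernels.

```python
import math


def gelu(x, approximate=False):
    if approximate:
        # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
        c = math.sqrt(2.0 / math.pi)
        return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
    # exact form: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

The approximation avoids `erf`, which can be slower on some hardware, at the cost of a small absolute error.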
- 14 Apr, 2021 1 commit
Summary: This commit adds the scatter and gather operators to remap elements along a given dimension according to the indices.
Ting PAN committed
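Element-wise gather and scatter along a dimension can be sketched for the 2-D case, assuming the semantics of `torch.gather`/`torch.scatter` (equivalently ONNX `GatherElements`/`ScatterElements`): the index tensor replaces the coordinate on `dim` and keeps the other one. The function names below are illustrative.

```python
def gather(x, dim, index):
    """out[i][j] = x[index[i][j]][j] if dim == 0 else x[i][index[i][j]]."""
    return [[x[index[i][j]][j] if dim == 0 else x[i][index[i][j]]
             for j in range(len(index[i]))]
            for i in range(len(index))]


def scatter(x, dim, index, src):
    """Returns a copy of x where the position selected by index[i][j] gets src[i][j]."""
    out = [row[:] for row in x]
    for i in range(len(index)):
        for j in range(len(index[i])):
            if dim == 0:
                out[index[i][j]][j] = src[i][j]
            else:
                out[i][index[i][j]] = src[i][j]
    return out
```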
- 08 Apr, 2021 1 commit
Summary: The new frontend unifies the two execution modes, starting from a single tensor class. Besides, it dispatches operator execution through a common path that works for both dragon and torch.
Ting PAN committed
- 04 Feb, 2021 1 commit
Summary: This commit generalizes the fully-connected operation into GEMM, and enhances the matmul operation via batched Dot, GEMV and GEMM. New representations and attributes are consistent with ONNX.
Ting PAN committed
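ONNX `Gemm` computes `Y = alpha * A' * B' + beta * C`, where `A'` and `B'` are optionally transposed via `transA`/`transB`. The pure-Python sketch below mirrors those attributes with snake_case parameter names of its own and, for simplicity, requires `C` to have the full output shape instead of supporting broadcasting.

```python
def gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Y = alpha * op(A) @ op(B) + beta * C on nested lists (no broadcasting of C)."""
    def transpose(m):
        return [list(col) for col in zip(*m)]

    a = transpose(a) if trans_a else a
    b = transpose(b) if trans_b else b
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    y = [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    if c is not None:
        y = [[y[i][j] + beta * c[i][j] for j in range(n)] for i in range(m)]
    return y
```

In this form a fully-connected layer is a GEMM with `trans_b=True`, since its weight is commonly stored as `[out_features, in_features]`.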
- 25 Jan, 2021 1 commit
Summary: For consistency in getting cuDNN convolution algorithms, cuDNN v6 (mainly relied on by CUDA 8.0) is now dropped.
Ting PAN committed
- 20 Jan, 2021 1 commit
Summary: This commit adds the sysconfig module to get the build information, which is helpful for selecting tests or reporting issues.
Ting PAN committed
- 16 Jan, 2021 1 commit
Summary: This commit adds 1D and 3D support for vision operators via a generalized ND implementation.
Ting PAN committed
- 29 Dec, 2020 1 commit
Summary: This commit fixes two compilation issues on win32: 1. Remove the constexpr when generating protobuf headers. 2. Always enforce the "/MT" flag to override "/MD".
Ting PAN committed
- 23 Dec, 2020 1 commit
Summary: This commit tests the correctness of the shape and data type inference performed by the autograph module.
Ting PAN committed
- 15 Dec, 2020 1 commit
Summary: We found it unstable to define CUB storages in a union. More investigation is needed to understand this issue.
Ting PAN committed
- 11 Dec, 2020 1 commit
Summary: This commit moves the parser into ArgHelper, which was previously designed to add the descriptors.
Ting PAN committed
- 10 Dec, 2020 1 commit
Summary: This commit feeds repeated tensor arguments with the entire array instead of piecewise scalars.
Ting PAN committed
- 09 Dec, 2020 1 commit
Summary: This commit redesigns ``vm.onnx`` by referring to the official repository. Frontends and backends are aligned with an identical API for dragon, torch and tensorrt.
Ting PAN committed
- 03 Dec, 2020 1 commit
Summary: This commit removes the unused reference to GradientTape to avoid the circular reference issue.
Ting PAN committed
- 02 Dec, 2020 1 commit
Summary: This commit attaches the input and output together in assign operators, which fixes the missing input defs caused by the identity from input to output.
Ting PAN committed
- 29 Nov, 2020 1 commit
Summary: This commit adds pseudo FP16 kernels with FP32 conversions for the DepthwiseConv2d and SyncBN operators.
Ting PAN committed
- 05 Nov, 2020 1 commit
Summary: This commit adds a fallback with an FP32 accumulator for FP16 ReduceSum to avoid dropping too many small values. Besides, FP16 kernels are now almost all available for arch < 530.
Ting PAN committed
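The reason for an FP32 accumulator can be reproduced on the CPU by rounding after every addition. This sketch uses the standard `struct` half-precision format (`"e"`) to emulate FP16 storage; the helper names are hypothetical and it only illustrates the numerical effect, not Dragon's ReduceSum kernel.

```python
import struct


def to_fp16(x):
    # Round a Python float to the nearest IEEE half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    # Round a Python float to the nearest IEEE single-precision value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_fp16(values, fp32_accumulator):
    acc = 0.0
    for v in values:
        v = to_fp16(v)  # inputs are stored as FP16
        # Round after every add to emulate the accumulator's precision.
        acc = to_fp32(acc + v) if fp32_accumulator else to_fp16(acc + v)
    return to_fp16(acc)  # the result is written back as FP16
```

Near 1.0 the FP16 spacing is 2^-10 (about 9.8e-4), so adding 1e-4 to an FP16 accumulator that already holds 1.0 rounds straight back to 1.0. With `[1.0] + [1e-4] * 10000` the FP16 accumulator therefore returns 1.0, while the FP32 accumulator returns 2.0.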
- 24 Oct, 2020 1 commit
Summary: This commit adds the hardsigmoid, hardswish and swish ops with specialized kernels, which are widely used in MobileNetV3 and EfficientNet.
Ting PAN committed
- 20 Oct, 2020 1 commit
Summary: This commit uses CopyMatrix to implement concat and split generically instead of specialized kernels.
Ting PAN committed
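Concat can be expressed as one strided matrix copy per input: a row-major tensor viewed as `[outer, axis_dim * inner]` is a matrix, and each input fills a column range of the output matrix. The sketch below assumes flat row-major lists; `copy_matrix` is a hypothetical stand-in for the CopyMatrix routine mentioned in the 20 Oct, 2020 commit, not its actual signature.

```python
def copy_matrix(m, n, src, src_ld, dst, dst_ld, dst_col):
    # Copy an m x n row-major block from src (row stride src_ld)
    # into dst (row stride dst_ld), starting at column dst_col.
    for i in range(m):
        d = i * dst_ld + dst_col
        dst[d:d + n] = src[i * src_ld:i * src_ld + n]


def concat(inputs, shapes, axis):
    """Concatenate flat row-major tensors along `axis` with one copy per input."""
    outer = 1
    for d in shapes[0][:axis]:
        outer *= d
    inner = 1
    for d in shapes[0][axis + 1:]:
        inner *= d
    out_ld = sum(s[axis] for s in shapes) * inner
    out = [None] * (outer * out_ld)
    col = 0
    for x, s in zip(inputs, shapes):
        cols = s[axis] * inner
        copy_matrix(outer, cols, x, cols, out, out_ld, col)
        col += cols
    return out
```

Split is the reverse copy: each output reads its column range back out of the input matrix.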
- 14 Oct, 2020 1 commit
Summary: This commit uses unique tensors to provide workspace data to avoid the corruption between operator and kernel.
Ting PAN committed
- 13 Oct, 2020 1 commit
Summary: This commit adds the linspace op for dragon, torch and tensorflow. In addition, a workaround for truncated integer intervals is applied to range/linspace (up to 2**57).
Ting PAN committed
- 08 Oct, 2020 1 commit
Summary: This commit reimplements the CUDA argmax/argmin via BlockReduce instead of the naive reduction in the kernel loop.
Ting PAN committed
- 07 Oct, 2020 1 commit
Summary: This commit adds the sort op for dragon, torch and tensorflow. Besides, the CUDA implementation of the topk op is now available.
Ting PAN committed
- 27 Sep, 2020 1 commit
Summary: This commit uses a local (thread or stream) workspace for Context, which provides a more elegant way to dispatch kernels that require scratch space. Besides, the TF32 math type is provided as a cuDNN option for Ampere devices.
Ting PAN committed
- 10 Sep, 2020 1 commit
Summary: This commit adds the unique op for dragon, torch, tensorflow and onnx. It also fixes a bug that computed the wrong workspace size in the cached cuDNN convolution.
Ting PAN committed
- 05 Sep, 2020 1 commit
Summary: This commit reimplements the default shuffle policy of the data reader with sequential sampling (consistent with DALI) instead of chunk permutation (the MXNet solution). Sequential sampling is tuned by the ``initial_fill`` argument only, and works for both HDD and SSD.
Ting PAN committed
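Sequential sampling with an `initial_fill` buffer reads records strictly in order, which suits both HDD and SSD, and gets its randomness from a bounded shuffle buffer. This sketch assumes the buffer-based scheme behind DALI's `initial_fill` reader option: fill the buffer, emit a random buffered record, and put the next sequential record in its slot. `shuffled_reader` is a hypothetical name, not Dragon's reader API.

```python
import random


def shuffled_reader(records, initial_fill, seed=0):
    """Yield records in shuffled order using a buffer that is fed sequentially."""
    assert initial_fill >= 1
    rng = random.Random(seed)
    it = iter(records)
    buffer = []
    # Fill the buffer with the first `initial_fill` records, read in order.
    for rec in it:
        buffer.append(rec)
        if len(buffer) == initial_fill:
            break
    # Emit a random buffered record, then refill its slot with the next one.
    for rec in it:
        k = rng.randrange(len(buffer))
        yield buffer[k]
        buffer[k] = rec
    # Drain whatever is left once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer
```

A larger `initial_fill` gives a more thorough shuffle at the cost of memory, and `initial_fill=1` degenerates to plain sequential order.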
- 30 Aug, 2020 1 commit
Summary: This commit renames the operator argument getter to ``GetArgument``, whether an argument is single or repeated.
Ting PAN committed
- 23 Aug, 2020 1 commit
Summary: This commit enforces stream synchronization before dispatching NCCL collectives. Otherwise, data corruption will happen because the default value of ``NCCL_GROUP_CUDA_STREAM`` has been 0 since CUDA 9.2, i.e., there is no explicit event waiting for unfinished kernels.
Ting PAN committed
- 12 Aug, 2020 1 commit
Summary: This commit adds the ``memory_allocated`` API for ``dragon.Workspace`` to query the size of allocated memory (and optionally on a specified device).
Ting PAN committed