Optimize training update operators
Summary: This commit fuses weight decay and mixed-precision conversion into the update kernels to reduce training latency.
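To illustrate the idea behind the fusion (this is a hypothetical Python sketch, not the commit's actual C++/CUDA kernels): instead of three separate passes over the parameters — one for weight decay, one for the gradient step, one for the precision conversion — a fused update does all three in a single loop. The function names and the `struct`-based half-precision round trip below are illustrative assumptions.

```python
import struct

def to_half_and_back(x: float) -> float:
    # Round-trip through IEEE 754 half precision via struct's 'e' format,
    # standing in for the mixed-precision conversion step of the update.
    return struct.unpack('e', struct.pack('e', x))[0]

def fused_sgd_update(params, grads, lr=0.1, weight_decay=1e-2):
    """Single pass: apply weight decay, take the SGD step, and
    convert to low precision, instead of three separate loops."""
    out = []
    for p, g in zip(params, grads):
        g = g + weight_decay * p              # fused weight decay
        p = p - lr * g                        # SGD step
        out.append(to_half_and_back(p))       # fused precision conversion
    return out
```

In a real kernel the benefit is fewer memory round trips: each parameter is read and written once rather than once per stage.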
Showing with 2271 additions and 1959 deletions
docs/api/python/dragon/nn/channel_norm.rst (new file, 0 → 100644)
docs/api/python/dragon/nn/lp_norm.rst (new file, 0 → 100644)
docs/api/python/torch/backends/cuda.rst (new file, 0 → 100644)
File moved
File moved
dragon/operators/array/shuffle_op.cc (new file, 0 → 100644)
dragon/utils/math/transform.cc (new file, 0 → 100644)
dragon/utils/math/transform.cu (new file, 0 → 100644)
dragon/utils/math/transform.h (new file, 0 → 100644)
torch/core/backends/cuda.py (new file, 0 → 100644)