Optimize training update operators
Summary: This commit fuses weight decay and mixed-precision conversion into the update kernels to reduce training latency.
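The idea behind the fusion can be sketched in NumPy: instead of launching separate kernels for the weight-decay term, the parameter update, and the fp32-to-fp16 cast, a single pass does all three. This is a minimal illustration under the assumption of a plain SGD update; the function name and signature are hypothetical and do not reflect the actual Dragon kernels.

```python
import numpy as np

def fused_sgd_update(master_w, grad, lr=0.01, weight_decay=1e-4):
    """Hypothetical sketch of a fused update step.

    Folds weight decay into the gradient, applies the fp32 master-weight
    update, and emits the fp16 copy in one pass, rather than running
    three separate kernels over the same memory.
    """
    grad = grad + weight_decay * master_w   # fused weight decay
    master_w -= lr * grad                   # fp32 master-weight update
    return master_w.astype(np.float16)      # fused low-precision conversion

w = np.ones(4, dtype=np.float32)
g = np.full(4, 0.5, dtype=np.float32)
w_half = fused_sgd_update(w, g)
```

Fusing these steps saves two full reads and writes of the parameter buffer per step, which is where the latency reduction comes from on memory-bound update ops.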
Showing with 1659 additions and 1282 deletions.
docs/api/python/dragon/nn/channel_norm.rst (new file, mode 100644)
docs/api/python/dragon/nn/lp_norm.rst (new file, mode 100644)
docs/api/python/torch/backends/cuda.rst (new file, mode 100644)
File moved
File moved
dragon/operators/array/shuffle_op.cc (new file, mode 100644)
dragon/utils/math/transform.cc (new file, mode 100644)
dragon/utils/math/transform.cu (new file, mode 100644)
dragon/utils/math/transform.h (new file, mode 100644)
torch/core/backends/cuda.py (new file, mode 100644)