Instantiate dispatch template by value for crucial CUDA kernels
Summary: This commit instantiates CUDA kernels with constant dimensions so that the compiler can optimize them at compile time.
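The dispatch-by-value idea in the summary can be sketched as follows. This is a minimal host-side C++ illustration, not the code from this commit: the names `row_sum`, `row_sum_generic`, and `dispatch_row_sum`, and the specialized widths 16, 32, and 64, are all hypothetical. In real CUDA code the templated function would presumably be a `__global__` kernel launched from each `case`, but the pattern of mapping a runtime value onto a compile-time template argument is the same.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel body: sums each row of a row-major
// matrix whose row width Dim is a compile-time constant. Because Dim is
// known at compile time, the compiler is free to unroll the inner loop.
template <int Dim>
void row_sum(const float* in, float* out, std::size_t rows) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int d = 0; d < Dim; ++d) {
            acc += in[r * Dim + d];
        }
        out[r] = acc;
    }
}

// Generic fallback for widths that have no dedicated instantiation;
// here the row width is only known at run time.
void row_sum_generic(const float* in, float* out, std::size_t rows, int dim) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            acc += in[r * static_cast<std::size_t>(dim) + d];
        }
        out[r] = acc;
    }
}

// Dispatch by value: map the runtime width onto one of the pre-instantiated
// templates (assumed set: 16, 32, 64) and fall back to the generic path.
void dispatch_row_sum(const float* in, float* out, std::size_t rows, int dim) {
    switch (dim) {
        case 16: row_sum<16>(in, out, rows); break;
        case 32: row_sum<32>(in, out, rows); break;
        case 64: row_sum<64>(in, out, rows); break;
        default: row_sum_generic(in, out, rows, dim); break;
    }
}
```

Each `case` adds one instantiation to the binary, so a dispatcher like this trades code size and build time for faster specialized paths. That is a plausible reason to limit the approach to the most performance-critical kernels.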
1630 additions and 1245 deletions