Instantiate dispatch template by value for crucial CUDA kernels
Summary: This commit instantiates crucial CUDA kernels with constant dimensions passed as template arguments, so the compiler can optimize them at compile time.
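The collapsed diffs are not visible here, so the following is only a minimal host-side C++ sketch of what "dispatch template by value" could look like, not the commit's actual code. All names (`row_sum_fixed`, `row_sum_dynamic`, `row_sum`) and the chosen dimensions (32, 64, 128) are hypothetical, and a plain function stands in for a `__global__` kernel. The idea is that a runtime dimension is mapped onto a template instantiation whose loop bound is a compile-time constant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a kernel body. DIM is a compile-time constant,
// so the loop trip count is known and the compiler is free to unroll it.
template <int DIM>
float row_sum_fixed(const float* row) {
    float acc = 0.0f;
    for (int i = 0; i < DIM; ++i) {
        acc += row[i];
    }
    return acc;
}

// Generic fallback for dimensions that have no dedicated instantiation.
float row_sum_dynamic(const float* row, int dim) {
    float acc = 0.0f;
    for (int i = 0; i < dim; ++i) {
        acc += row[i];
    }
    return acc;
}

// Dispatch by value: select an instantiation from the runtime dimension.
// The set of specialized sizes below is an assumption for illustration.
float row_sum(const float* row, int dim) {
    assert(row != nullptr);
    switch (dim) {
        case 32:  return row_sum_fixed<32>(row);
        case 64:  return row_sum_fixed<64>(row);
        case 128: return row_sum_fixed<128>(row);
        default:  return row_sum_dynamic(row, dim);
    }
}
```

Calling `row_sum(data, 64)` takes the `row_sum_fixed<64>` path, while an unlisted size such as 5 falls back to the generic loop. In real CUDA code each `case` would launch the matching kernel instantiation instead of calling a function. The trade-off is that every listed dimension adds one more instantiation to compile.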
Showing with 667 additions and 343 deletions