Instantiate dispatch template by value for crucial CUDA kernels
Summary: This commit instantiates the crucial CUDA kernels with constant dimensions so that the compiler can optimize them at compile time.
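The diff itself is not visible here, so the following is only a minimal sketch of the general "dispatch by value" pattern the summary describes, not the commit's actual code. The names `row_sum_fixed`, `row_sum_dynamic`, and `row_sum`, and the set of specialized dimensions (2, 4, 8), are hypothetical. A plain host function template stands in for the kernel; in CUDA it would be a `__global__` function template launched with `<<<grid, block>>>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a kernel body (hypothetical). DIM is a template parameter,
// so the inner loop's trip count is a compile-time constant and the
// compiler is free to unroll it.
template <int DIM>
void row_sum_fixed(const float* in, float* out, std::size_t rows) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int d = 0; d < DIM; ++d) {
            acc += in[r * DIM + d];
        }
        out[r] = acc;
    }
}

// Generic fallback (hypothetical): the dimension is only known at run time.
void row_sum_dynamic(const float* in, float* out, std::size_t rows, int dim) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            acc += in[r * static_cast<std::size_t>(dim) + d];
        }
        out[r] = acc;
    }
}

// Dispatch by value: map the runtime dimension onto a template
// instantiation when one exists. Returns true if a specialized
// instantiation was used, false if the generic fallback ran.
bool row_sum(const std::vector<float>& in, std::vector<float>& out, int dim) {
    assert(dim > 0 && in.size() % static_cast<std::size_t>(dim) == 0);
    const std::size_t rows = in.size() / static_cast<std::size_t>(dim);
    out.assign(rows, 0.0f);
    switch (dim) {
        case 2: row_sum_fixed<2>(in.data(), out.data(), rows); return true;
        case 4: row_sum_fixed<4>(in.data(), out.data(), rows); return true;
        case 8: row_sum_fixed<8>(in.data(), out.data(), rows); return true;
        default:
            row_sum_dynamic(in.data(), out.data(), rows, dim);
            return false;
    }
}
```

Calling `row_sum(in, out, 4)` takes the `row_sum_fixed<4>` path, while a dimension outside the specialized set, such as 3, falls back to the runtime loop. The trade-off of this design is that every listed dimension adds one more instantiation to the binary.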
Showing changes with 629 additions and 355 deletions