Summary: This commit instantiates CUDA kernels by using constant dimensions to enable the optimization during compiler-time.