Summary: This commit stops enabling the cuBLAS tensor core math mode by default when CUDA >= 11.0 on Ampere devices, so that FP32 math is not downcast.