Summary: This commit implements concat and split generically on top of CopyMatrix instead of using specialized kernels.
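
To illustrate the idea, here is a minimal CPU-only sketch of how concat and split along the column axis can both reduce to a single strided 2-D copy. The names `CopyMatrixSketch`, `Matrix`, `ConcatColumns`, and `SplitColumns` are hypothetical, and the signature of the real CopyMatrix is not shown in this commit summary, so treat this as an assumption about its general shape rather than the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for CopyMatrix: copies a rows x cols block between two
// row-major buffers whose consecutive rows are src_stride / dst_stride elements apart.
void CopyMatrixSketch(const float* src, std::size_t src_stride,
                      float* dst, std::size_t dst_stride,
                      std::size_t rows, std::size_t cols) {
  if (cols == 0) return;  // nothing to copy; also avoids memcpy on empty buffers
  assert(cols <= src_stride && cols <= dst_stride);
  for (std::size_t r = 0; r < rows; ++r) {
    std::memcpy(dst + r * dst_stride, src + r * src_stride, cols * sizeof(float));
  }
}

// Hypothetical dense row-major matrix used only for this sketch.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;
};

// Concat: each input is copied into the output at a growing column offset.
Matrix ConcatColumns(const std::vector<Matrix>& inputs) {
  assert(!inputs.empty());
  const std::size_t rows = inputs[0].rows;
  std::size_t total_cols = 0;
  for (const Matrix& m : inputs) {
    assert(m.rows == rows);
    total_cols += m.cols;
  }
  Matrix out{rows, total_cols, std::vector<float>(rows * total_cols)};
  std::size_t offset = 0;
  for (const Matrix& m : inputs) {
    CopyMatrixSketch(m.data.data(), m.cols,
                     out.data.data() + offset, total_cols, rows, m.cols);
    offset += m.cols;
  }
  return out;
}

// Split: the same copy with source and destination roles swapped.
std::vector<Matrix> SplitColumns(const Matrix& in,
                                 const std::vector<std::size_t>& widths) {
  std::vector<Matrix> outs;
  std::size_t offset = 0;
  for (std::size_t w : widths) {
    assert(offset + w <= in.cols);
    Matrix part{in.rows, w, std::vector<float>(in.rows * w)};
    CopyMatrixSketch(in.data.data() + offset, in.cols,
                     part.data.data(), w, in.rows, w);
    outs.push_back(std::move(part));
    offset += w;
  }
  return outs;
}
```

Under these assumptions, the only operation-specific logic left is the offset and stride bookkeeping, which is why a single generic copy routine can serve both operations.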