It seems to me that reduction is faster along xdims than along ydims. As I increase ydims, the run time grows faster than linearly. Can anybody explain this?
Never mind, I've found the answer. The host loop caused this.