I have a rectangular data array (width x height) in host memory and want to copy a region (left, top) - (right, bottom) from that array to GPU memory, and vice versa.
In CUDA I used cuMemcpy2D for this; it was only 25% slower than copying a sequential memory block. A loop that copies each row to the GPU separately is very inefficient: it is more than 5 times slower. It is also possible to first copy the rectangular region into a sequential memory block on the CPU and then send that block to the GPU with a regular copy function, but this costs both extra CPU memory and extra time. So cuMemcpy2D is the best way to solve the task.
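To make the staging alternative concrete, here is a minimal host-side sketch in plain C. The helper name `pack_region`, the `float` element type, the row-major layout, and the half-open bounds (`right` and `bottom` excluded) are all assumptions made for illustration; they are not part of any CUDA or OpenCL API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a CUDA/OpenCL call): packs the region
 * [left, right) x [top, bottom) of a row-major array with `width`
 * elements per row into the contiguous buffer `dst`.
 * `dst` must hold (right - left) * (bottom - top) floats. */
static void pack_region(const float *src, size_t width,
                        size_t left, size_t top,
                        size_t right, size_t bottom,
                        float *dst)
{
    /* Precondition: the region is well-formed and fits in a row. */
    assert(left <= right && right <= width && top <= bottom);

    size_t region_width = right - left;

    /* One memcpy per row: each source row is strided by `width`,
     * each destination row is tightly packed. */
    for (size_t row = top; row < bottom; ++row) {
        memcpy(dst + (row - top) * region_width,
               src + row * width + left,
               region_width * sizeof(float));
    }
}
```

After packing, `dst` can be handed to a single regular host-to-device copy. The extra buffer and the extra pass over the data are exactly the costs described above.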
However, it seems that nothing similar to cuMemcpy2D exists in OpenCL. Is that really the case, or am I missing something in the OpenCL spec?