Is there any information on when we can expect the performance of the following functions to improve significantly?
clEnqueueReadBufferRect
clEnqueueWriteBufferRect
(clEnqueueCopyBufferRect) <- I didn't really test this one, so it may be OK.
A short test comparing a clEnqueueReadBuffer/clEnqueueWriteBuffer sequence with a clEnqueueReadBufferRect/clEnqueueWriteBufferRect sequence, transferring the same amount of memory, shows a significant difference in transfer speed.
4096*4096*8 bytes = 128 MB, uploaded and downloaded with the non-rect calls, takes about 75 ms. This includes copying the host memory to a linear host buffer, uploading the linear buffer to the device, and running a kernel to copy the linear device memory to the (strided) destination array.
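To make the non-rect path concrete, the host-side packing step could look roughly like the sketch below. This is a minimal illustration, not the exact code from the test: `pack_rows` and its parameter names are hypothetical, and it assumes the source array is stored row by row with a fixed pitch in bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of OpenCL): gather `rows` rows of
 * `row_bytes` bytes each from a pitched host array into a contiguous
 * staging buffer. `src_pitch` is the distance in bytes between the
 * starts of consecutive source rows (assumed >= row_bytes). */
static void pack_rows(void *dst, const void *src,
                      size_t rows, size_t row_bytes, size_t src_pitch)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t r = 0; r < rows; ++r) {
        /* One contiguous copy per row; the padding between rows is skipped. */
        memcpy(d + r * row_bytes, s + r * src_pitch, row_bytes);
    }
}
```

The packed buffer can then go to the device with a single plain clEnqueueWriteBuffer call, and a small kernel scatters it into the strided destination; the download is the same steps in reverse.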
The same array uploaded and downloaded with the rect calls takes about 0.48 seconds. That is about 6.5 times slower and, as it stands, takes longer than the kernel that operates on the data.