When using clEnqueueCopyBuffer to transfer data between two buffers located on the same device, how does this operation perform compared to a transfer from device memory to host memory? That is, is it almost as heavy as clEnqueueReadBuffer, or significantly lighter?
I know it should be much faster than an off-device copy; I only wanted to be sure. I've profiled an application that uses this API, and the on-device copy is considerably lighter.
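One way to make this comparison concrete is to enqueue both commands on a queue created with CL_QUEUE_PROFILING_ENABLE. You then read each event's CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END timestamps (nanosecond cl_ulong values) with clGetEventProfilingInfo and convert them to an effective bandwidth. Below is a minimal sketch of only the conversion step. It assumes the timestamps have already been retrieved. The helper name effective_gbps is hypothetical, and uint64_t stands in for cl_ulong so the snippet does not need the OpenCL headers.

```c
#include <stdint.h>

/* Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes) for a
 * transfer of `bytes` bytes. start_ns/end_ns are assumed to be the
 * CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END values returned by
 * clGetEventProfilingInfo (nanoseconds; uint64_t used in place of cl_ulong). */
double effective_gbps(uint64_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    /* Guard against an empty or inverted interval instead of dividing by 0. */
    if (end_ns <= start_ns)
        return 0.0;

    /* Bytes per nanosecond is numerically equal to GB/s when 1 GB = 1e9 B. */
    return (double)bytes / (double)(end_ns - start_ns);
}
```

Calling this once with the copy event's timestamps and once with the read event's timestamps, using the same buffer size, gives two directly comparable numbers. For clEnqueueCopyBuffer, keep in mind that device memory actually sees roughly twice the reported traffic, because each byte is both read from the source buffer and written to the destination buffer.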