When using clEnqueueCopyBuffer to transfer data between two buffers located on the same device, how does this operation perform compared to transfers from device memory to host memory? That is, is this operation almost as heavy as clEnqueueReadBuffer, or significantly lighter?
It's much faster than an off-device copy since, you know, it doesn't have to go off-device.
That is, it should run at a good fraction of the device's global memory bandwidth, not at the PCIe bandwidth.
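To check this on your own hardware, one option is to create the command queue with CL_QUEUE_PROFILING_ENABLE, enqueue both operations, and read each event's CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END timestamps (in nanoseconds) with clGetEventProfilingInfo. The sketch below only covers the arithmetic after that, and the helper names are hypothetical. It assumes that a same-device copy of n bytes generates roughly 2n bytes of global-memory traffic (one read plus one write), while clEnqueueReadBuffer moves n bytes across the host link.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a pair of profiling timestamps (nanoseconds,
 * e.g. CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END) into an
 * effective bandwidth in GB/s, using 1 GB = 1e9 bytes. */
double effective_bandwidth_gbps(uint64_t bytes_moved, uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns)
        return 0.0; /* guard against a zero-length or inverted interval */
    /* 1 byte per nanosecond == 1e9 bytes per second == 1 GB/s */
    return (double)bytes_moved / (double)(end_ns - start_ns);
}

/* Hypothetical helper (assumption): a device-to-device copy of n bytes
 * reads n bytes and writes n bytes of global memory. */
uint64_t copy_buffer_traffic(uint64_t n)
{
    return 2 * n;
}
```

For the copy, pass copy_buffer_traffic(size) and compare the result with the device's global memory bandwidth; for the read, pass the plain buffer size and compare against the PCIe (or other host link) bandwidth.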
I knew it should be much faster than an off-device copy; I only wanted to be sure. I've profiled an application that uses this API, and the copy is indeed considerably lighter.
Thanks for the reply 😃