What's faster? To map a CL buffer and read from it or to call clEnqueueReadBuffer explicitly?
thx
I have observed that mapping/unmapping is faster: around 2.5 GB/s, compared to about 1.5 GB/s for enqueue read/write.
Why is it not 6 GB/s?
Because the current implementation does not use DMA.
I have seen some posts where the OpenCL PCIeBandwidth test reports 1.5 GB/s while PCIeSpeedTest from CAL, which uses DMA, reports only 800 MB/s.
From the 800 MB/s figure I assume that user is hitting a DMA-related problem on some motherboards. But the OpenCL transfer was faster, which leads me to think that OpenCL currently does not use DMA.
Yeah, DMA can only operate on pinned memory, for which the transfer rate is close to the peak, i.e. ~6 GB/s. AMD doesn't have it enabled in the current release. Next release, probably.
I also heard about some X58 chipset motherboards which have issues with PCIe transfer rates.
Yep, map/unmap seems to be faster for me too.
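To compare figures like the ones quoted above on your own hardware, the transfer rate is just bytes moved divided by elapsed time. Below is a minimal sketch in plain C; the helper names gb_per_s and gbit_to_gbyte_per_s are made up for illustration, and it assumes decimal units (1 GB = 1e9 bytes). The second helper is there because bits and bytes are easy to mix up when quoting rates.

```c
#include <stddef.h>

/* Hypothetical helper: transfer rate in decimal gigabytes per second
 * (1 GB = 1e9 bytes). Returns 0.0 if the elapsed time is not positive. */
double gb_per_s(size_t bytes, double seconds) {
    if (seconds <= 0.0) {
        return 0.0;
    }
    return (double)bytes / seconds / 1e9;
}

/* Hypothetical helper: convert gigabits per second to gigabytes per
 * second (8 bits per byte). */
double gbit_to_gbyte_per_s(double gbit_per_s) {
    return gbit_per_s / 8.0;
}
```

For example, 256000000 bytes moved in 0.1 s works out to about 2.56 GB/s. The elapsed time has to cover the whole transfer, so time a blocking call or wait for the transfer's event before stopping the clock.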
mate, check out this page,
Yes, you're missing the CL_TRUE in the clEnqueueWriteBuffer call. Passing CL_TRUE makes the write operation blocking, which stalls the CPU while the copy is made. Using the host pointer, the OpenCL implementation can "optimize" the copy by making it asynchronous, so overall the performance is better....
If you make the copy blocking, then making the copy asynchronous would not help as you have to wait for the copy to finish anyway.
If you use clEnqueueRead/WriteBuffer with generic system memory as the host ptr, then the driver will have to pin that memory in order for the device to copy from/to it. If you use clEnqueueMapBuffer, the driver can optimize the operation by using pre-pinned memory.
Jeff
Thanks Jeff, didn't think about it that way though....