bubu
Adept II

What's faster: clEnqueueReadBuffer/clEnqueueMapBuffer ?

What's faster? To map a CL buffer and read from it or to call clEnqueueReadBuffer explicitly?

 

thx

0 Likes
8 Replies
n0thing
Journeyman III

I have observed that mapping/unmapping is faster: about 2.5 GB/s, compared to about 1.5 GB/s for enqueue read/write.

 

0 Likes

Why is it not 6 GB/s?

0 Likes

Because the current implementation does not use DMA.

I have seen some posts where the OpenCL PCIeBandwidth test reports 1.5 GB/s, while PCIeSpeedTest from CAL, which uses DMA, reports only 800 MB/s.

From the 800 MB/s figure I assume that user was hitting a DMA-related problem on some motherboards. But the OpenCL transfer was faster, which leads me to conclude that OpenCL currently does not use DMA.

0 Likes

Yeah, DMA can only operate on pinned memory, for which the transfer rate is close to the peak, i.e. ~6 GB/s. AMD doesn't have it enabled in the current release; it will probably come in the next one.

I also heard about some X58 chipset motherboards which have issues with PCIe transfer rates.
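
To put the figures quoted in this thread (1.5, 2.5 and ~6 GB/s) side by side, here is a minimal sketch of the arithmetic behind such numbers. The helper names `transfer_gb_per_s` and `transfer_ms` are made up for illustration, and 1 GB is assumed to mean 1e9 bytes; a real benchmark would wrap a timer around the actual transfer.

```c
#include <stddef.h>

/* Throughput in GB/s (assuming 1 GB = 1e9 bytes) for a transfer of
 * `bytes` bytes that took `seconds` seconds. Returns 0 for a
 * non-positive duration instead of dividing by zero. */
double transfer_gb_per_s(size_t bytes, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return (double)bytes / seconds / 1e9;
}

/* Expected time in milliseconds to move `bytes` bytes at a sustained
 * rate of `gb_per_s` GB/s. Returns 0 for a non-positive rate. */
double transfer_ms(size_t bytes, double gb_per_s) {
    if (gb_per_s <= 0.0) return 0.0;
    return (double)bytes / (gb_per_s * 1e9) * 1e3;
}
```

For example, a 300 MB buffer would need roughly 200 ms at 1.5 GB/s, 120 ms at 2.5 GB/s, and 50 ms at 6 GB/s.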

0 Likes

Yep, map/unmap seems to be faster for me too.

0 Likes
sambani
Journeyman III

Originally posted by: bubu What's faster? To map a CL buffer and read from it or to call clEnqueueReadBuffer explicitly?

 

 

 

thx

 

 

mate, check out this page,

 

click here

 

Yes, you're missing the CL_TRUE in the clEnqueueWriteBuffer call. This makes the write operation blocking, which stalls the CPU while the copy is made. Using the host pointer, the OpenCL implementation can "optimize" the copy by making it asynchronous, so overall the performance is better....

 

 

 


0 Likes

Originally posted by: sambani
Yes, you're missing the CL_TRUE in the clEnqueueWriteBuffer call. This makes the write operation blocking, which stalls the CPU while the copy is made. Using the host pointer, the OpenCL implementation can "optimize" the copy by making it asynchronous, so overall the performance is better....

 

 



If you make the copy blocking, then making the copy asynchronous would not help as you have to wait for the copy to finish anyway.

If you use clEnqueueRead/WriteBuffer with generic system memory as the host ptr, then the driver will have to pin that memory in order for the device to copy from/to it.  If you use clEnqueueMapBuffer, the driver can optimize the operation by using pre-pinned memory.

Jeff

0 Likes

Originally posted by: jeff_golds
If you make the copy blocking, then making the copy asynchronous would not help as you have to wait for the copy to finish anyway.

 

If you use clEnqueueRead/WriteBuffer with generic system memory as the host ptr, then the driver will have to pin that memory in order for the device to copy from/to it.  If you use clEnqueueMapBuffer, the driver can optimize the operation by using pre-pinned memory.

 

Jeff

 

 

Thanks Jeff, I didn't think about it that way, though....

 

 



0 Likes