Global memory bandwidth?

Discussion created by thomasco on Mar 10, 2011
Latest reply on Mar 12, 2011 by dmeiser

Hello everyone,

I played with OpenCL on a 5870 and got 118 GB/s of bandwidth doing a copy between 2 arrays in global memory.

118 GB/s was the best result, using float4; with scalar 32-bit floats it gave 98 GB/s.

The code is similar to the "float4 vs float1" code in the OpenCL programming guide, just moving one float4 per work item.
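For reference, this is roughly how a copy bandwidth figure like the ones above is usually derived. It is only a sketch: the function name, the array size and the 4.55 ms timing are made up for illustration, and it assumes 1 GB = 1e9 bytes and that each element is read once and written once.

```python
def copy_bandwidth_gbs(num_elements, bytes_per_element, seconds):
    """Effective bandwidth of an array copy in GB/s (1 GB = 1e9 bytes)."""
    # A copy reads every element once and writes it once,
    # so the traffic is twice the size of one array.
    bytes_moved = 2 * num_elements * bytes_per_element
    return bytes_moved / seconds / 1e9


# Hypothetical run: 16 Mi float4 elements (16 bytes each), kernel time 4.55 ms.
n = 16 * 1024 * 1024
bandwidth = copy_bandwidth_gbs(n, 16, 4.55e-3)
print(f"{bandwidth:.1f} GB/s")
```

If only the bytes read (or only the bytes written) are counted, the reported figure is half of this, so it is worth checking which convention a benchmark uses before comparing numbers.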


That's a bit low compared to the peak of 154 GB/s: it's only ~77%, and I would have hoped to see something closer to 130 GB/s. Is this number typical? I'm running on Linux. Does Windows give higher numbers?
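To make the comparison with the peak explicit, here is a small sketch of the arithmetic. The 256-bit bus width and 4.8 Gbit/s per pin are the commonly published memory specs for the 5870, not something measured here, so treat them as assumptions; the function names are just for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits, gbit_per_pin):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # bus width in bytes times the per-pin data rate in Gbit/s
    return bus_width_bits / 8 * gbit_per_pin


def fraction_of_peak(measured_gbs, peak_gbs):
    """Measured bandwidth as a fraction of the theoretical peak."""
    return measured_gbs / peak_gbs


# Assumed 5870 specs: 256-bit bus, 4.8 Gbit/s per pin.
peak = peak_bandwidth_gbs(256, 4.8)

# Measured figures from the post: float4 copy and scalar float copy.
for label, measured in (("float4", 118.0), ("float", 98.0)):
    print(f"{label}: {100 * fraction_of_peak(measured, peak):.1f}% of {peak:.1f} GB/s")
```

With these assumed specs the peak works out to 153.6 GB/s, which matches the rounded 154 GB/s figure quoted above.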

What can I expect with the latest cards, like the 6970?

Any idea how I can improve this number?