Archives Discussions

dertomas · 04-22-2013

Hi,

I have a bandwidth question.

I have an A10-5800K Trinity APU and use OpenCL on its integrated Radeon HD 7660D GPU.

The motherboard is an Asus F2 A85-M Pro with 4x 8 GB of 1866 MHz RAM. The OS is Windows 7.

I ran the AMD BufferBandwidth OpenCL example on my GPU.

With the default settings I got about 25 GB/s for reading. All fine.

Then I changed the BufferBandwidth code to launch maxThreads (the maximum number of threads for the input data) instead of nThreads (a calculated number). I also increased the input data from 32 MB to 128 MB.

Now I get 70 to 80 GB/s for reading. That seems too high.

My first theoretical calculations were:

1.866 GT/s (Mem freq) * 256 bit (Radeon Memory Bus) * 2 (Dual Channel) = 111 GB/s

Against that figure, 70 to 80 GB/s sounds plausible. But ordinary memory has only a 64-bit bus per channel. With dual channel, that would be only 128 bits, meaning only 55 GB/s.

Can anyone clarify which buses are actually used, please? Do you really get 80 GB/s on an integrated GPU, or is there a bug in the AMD OpenCL BufferBandwidth test? Are there any resources for further reading?
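For reference, the peak-bandwidth arithmetic above can be checked with a small sketch. It assumes the usual formula (transfer rate × bus width in bytes × number of channels) and the standard 64-bit data width of a DDR3 channel; the function name `peak_bandwidth_gb_s` is hypothetical and only used for illustration.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s, bus_width_bits, channels):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_gt_s: memory transfer rate in GT/s (1.866 for DDR3-1866)
    bus_width_bits:     data width of ONE channel in bits
    channels:           number of memory channels
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_gt_s * bytes_per_transfer * channels


def gb_to_gib(gb):
    """Convert decimal GB (10**9 bytes) to binary GiB (2**30 bytes)."""
    return gb * 10**9 / 2**30


# The calculation as posted: 256-bit bus, doubled for dual channel.
as_posted = peak_bandwidth_gb_s(1.866, 256, 2)

# Assumption: two standard 64-bit DDR3 channels (128 bits in total).
dual_channel_ddr3 = peak_bandwidth_gb_s(1.866, 64, 2)

print(f"256 bit x 2: {as_posted:.1f} GB/s ({gb_to_gib(as_posted):.1f} GiB/s)")
print(f" 64 bit x 2: {dual_channel_ddr3:.1f} GB/s ({gb_to_gib(dual_channel_ddr3):.1f} GiB/s)")
```

Under these assumptions, the 111 figure corresponds to 256 bit × 2 expressed in GiB/s, while two 64-bit channels come out at roughly 30 GB/s. That is close to the 25 GB/s measured with the default settings, which suggests the 70 to 80 GB/s reading is not coming from DRAM alone.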

Thanks a lot!

himanshu_gautam · 05-08-2013

Sorry for the late reply. Just profile the two kernels using CodeXL. In the default case, nThreads=8192 and I get a 0% cache hit rate. When I set nThreads=maxThreads (2197152), I see a cache hit rate of 94%, which explains the high read bandwidth.

You can also run the GlobalMemoryBandwidth sample to check the uncached bandwidth of your device.
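As a rough plausibility check of the cache-hit explanation, here is a minimal sketch of a deliberately simplified model: it assumes cache hits cost nothing and only misses consume DRAM bandwidth, which gives an upper bound rather than a prediction. The function name `cached_read_ceiling_gb_s` is hypothetical, and the 25 GB/s and 94% inputs are the figures reported in this thread.

```python
def cached_read_ceiling_gb_s(dram_gb_s, hit_rate):
    """Upper bound on apparent read bandwidth when only cache misses
    reach DRAM.

    Simplifying assumption: cache hits are free, so the result is a
    ceiling, not an estimate of what a real device will deliver.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    miss_rate = 1.0 - hit_rate
    return dram_gb_s / miss_rate


uncached = 25.0   # GB/s, read bandwidth measured with the default nThreads
hit_rate = 0.94   # cache hit rate reported by the profiler with maxThreads

ceiling = cached_read_ceiling_gb_s(uncached, hit_rate)
print(f"ceiling with {hit_rate:.0%} hits: {ceiling:.0f} GB/s")

# The measured 70 to 80 GB/s should fall between the uncached figure
# and this ceiling if caching is what inflates the result.
for measured in (70.0, 80.0):
    print(measured, uncached < measured < ceiling)
```

The measured 70 to 80 GB/s sits well above the uncached 25 GB/s and well below this idealized ceiling, which is consistent with cache hits, not a wider memory bus, accounting for the higher number.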


Max Read Bandwidth of Trinity APU