OpenCL

cadorino
Journeyman III

Re: Cross-device bandwidth for discrete GPU (HD 5870)

I've spent some time testing the same memory bandwidth program with the same GPU (HD 5870) on a different board and processor (Intel i7). Unfortunately, I get the same bandwidth results.

cadorino
Journeyman III

Re: Cross-device bandwidth for discrete GPU (HD 5870)

Any news? Maybe the high bandwidth of the discrete card is due to some caching inside the device or in the command queue. In this case, how can I avoid it?

siu
Staff

Re: Cross-device bandwidth for discrete GPU (HD 5870)

The access pattern (neighboring work-items reading from overlapping memory regions) suggests that most of the reads are hitting the data cache. That would explain the high bandwidth: the test is not actually measuring PCIe bandwidth.

Have you looked at the BufferBandwidth sample in the SDK? If you run that sample with the input buffer set to ALLOC_HOST_PTR | READ_ONLY, the "GPU kernel read" bandwidth it reports is probably close to what you are trying to measure with your test program.
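To make the cache argument concrete, here is a rough host-side sketch in plain C (no OpenCL calls) that counts how many distinct cache lines a group of work-items touches. It is only an illustration: the 64-byte line size, the function names, and the "each work-item reads `reads` consecutive floats starting at `i * stride`" pattern are assumptions, not taken from the original test program or from the HD 5870's documentation.

```c
#include <stdlib.h>

#define LINE_BYTES 64u  /* assumed cache-line size, for illustration only */

/* Count the distinct cache lines touched when `items` work-items each read
 * `reads` consecutive floats, work-item i starting at element i * stride.
 * Returns 0 on invalid input or allocation failure. */
static size_t lines_touched(size_t items, size_t reads, size_t stride)
{
    if (items == 0 || reads == 0 || stride == 0)
        return 0;

    size_t last_elem = (items - 1) * stride + reads - 1;
    size_t n_lines = last_elem * sizeof(float) / LINE_BYTES + 1;
    unsigned char *seen = calloc(n_lines, 1);
    size_t count = 0;

    if (!seen)
        return 0;

    for (size_t i = 0; i < items; ++i) {
        for (size_t r = 0; r < reads; ++r) {
            size_t line = (i * stride + r) * sizeof(float) / LINE_BYTES;
            if (!seen[line]) {
                seen[line] = 1;
                ++count;
            }
        }
    }
    free(seen);
    return count;
}

/* Ratio of bytes the kernel asks for to bytes that must actually be fetched.
 * A value well above 1 means most reads can be served from cache. */
static double reuse_factor(size_t items, size_t reads, size_t stride)
{
    double requested = (double)items * (double)reads * sizeof(float);
    double fetched = (double)lines_touched(items, reads, stride) * LINE_BYTES;
    return fetched > 0.0 ? requested / fetched : 0.0;
}
```

With 1024 work-items reading 16 floats each, `reuse_factor(1024, 16, 1)` (overlapping regions) comes out at about 15.75, while `reuse_factor(1024, 16, 16)` (disjoint regions) is 1.0. Under these assumptions, a bandwidth figure computed from the requested bytes would overstate the real memory traffic by the same factor.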

cadorino
Journeyman III

Re: Cross-device bandwidth for discrete GPU (HD 5870)

Hi. The problem is that I get very similar results when accessing host or device memory "randomly". Unfortunately, "randomly" here means using a static offset, so a smart compiler could still optimize prefetching in this case. Is there any trick to measure the real bandwidth when transferring data from the host to the HD 5870?
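One common way around a static offset is to load the indices from a buffer at run time, so the access pattern is not known at compile time. This is not something proposed in the thread, just a minimal host-side sketch in C. `make_index_buffer` and `xorshift32` are hypothetical helper names, and the kernel mentioned in the comment is assumed rather than shown.

```c
#include <stdint.h>

/* Small xorshift32 generator, so the sketch does not depend on rand(). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill idx[0..n-1] with a pseudo-random permutation of 0..n-1
 * (Fisher-Yates shuffle). The buffer would then be uploaded to the device
 * and an assumed kernel would read data[idx[get_global_id(0)]]. */
static void make_index_buffer(uint32_t *idx, uint32_t n, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be non-zero */

    for (uint32_t i = 0; i < n; ++i)
        idx[i] = i;

    for (uint32_t i = n; i > 1; --i) {
        /* The small modulo bias does not matter for this purpose. */
        uint32_t j = xorshift32(&state) % i;
        uint32_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}
```

Note that scattered 4-byte reads measure random-access throughput, which is usually well below streaming bandwidth, and that reading the index buffer itself adds traffic. This removes the static pattern, but it does not measure peak transfer rate.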

siu
Staff

Re: Cross-device bandwidth for discrete GPU (HD 5870)

To measure the PCIe bandwidth, you can simply time clEnqueueReadBuffer and clEnqueueWriteBuffer. 

Try running the BufferBandwidth sample with the -pcie flag; it will show the PCIe bandwidth in each direction. If you look at the source code, you'll see that it is simply timing the read and write buffer calls.
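Timing a transfer gives you a byte count and an interval, which still has to be turned into a bandwidth figure. A minimal sketch in C follows, assuming the timestamps are in nanoseconds. This is what clGetEventProfilingInfo reports for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END when the queue is created with CL_QUEUE_PROFILING_ENABLE. Whether BufferBandwidth uses event profiling or a host timer is not stated here, and the function names below are hypothetical.

```c
#include <stdint.h>

/* Convert a transfer size and a pair of nanosecond timestamps into GB/s
 * (1 GB = 1e9 bytes). Bytes per nanosecond is numerically equal to GB/s.
 * Returns 0.0 if the interval is empty or reversed. */
static double bandwidth_gb_per_s(uint64_t bytes, uint64_t start_ns,
                                 uint64_t end_ns)
{
    if (end_ns <= start_ns)
        return 0.0;
    return (double)bytes / (double)(end_ns - start_ns);
}

/* Aggregate bandwidth over several repetitions of the same transfer:
 * total bytes moved divided by total time. Repetitions with an empty or
 * reversed interval are skipped. */
static double mean_bandwidth_gb_per_s(uint64_t bytes,
                                      const uint64_t *start_ns,
                                      const uint64_t *end_ns, int reps)
{
    uint64_t total_ns = 0;
    int valid = 0;

    for (int i = 0; i < reps; ++i) {
        if (end_ns[i] > start_ns[i]) {
            total_ns += end_ns[i] - start_ns[i];
            ++valid;
        }
    }
    if (total_ns == 0)
        return 0.0;
    return (double)bytes * (double)valid / (double)total_ns;
}
```

Averaging over several repetitions and discarding the first one is a common precaution, since the first transfer often includes one-time setup costs.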

jeff_golds
Staff

Re: Cross-device bandwidth for discrete GPU (HD 5870)

If you use clEnqueueReadBuffer and clEnqueueWriteBuffer, you will pay the price for pinning on each transfer unless you use the prepinned path as documented in the APP SDK documentation.  This is also demonstrated in the BufferBandwidth sample code.

cadorino
Journeyman III

Re: Cross-device bandwidth for discrete GPU (HD 5870)

Hey, thank you for the answers!

I tested the BufferBandwidth sample.

With the arguments below I get 28 GB/s for the HD 5870 and 17 GB/s for the integrated GPU. The discrete card's bandwidth falls below the integrated GPU's only when the number of GPU wavefronts (-nw) is very small.

C:\Users\gabriele\Downloads\BufferBandwidth\BufferBandwidth\samples\opencl\bin\x86> .\BufferBandwidth.exe -d 0 -if 5 -of 5 -nwk 1 -nr 5 -nl 5 -nw 8192

I was probably wrong to think it would be straightforward to verify the higher bandwidth of an integrated GPU...
