Question: In case the problem is in our code, is there a sample program with source that lets us reliably test the processing throughput of several GPUs running at once?
We are having issues with a 4 x 7970 setup (the same happens with 3 cards).
We upload some data to the 4 cards, execute a loop and export the timings (no PCIe or cross-domain transfers are included in the performance timings). We have one thread per card, and everything is fully independent (what happens on a card stays on a card).
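If no ready-made sample turns up, a minimal harness with the structure described above (one host thread per card, nothing shared between threads, timer started only after the data is already on the card) could look like the following sketch in C with POSIX threads. It is an assumption-laden illustration, not an AMD sample: `worker_t`, `run_concurrent` and `run_device_kernel` are hypothetical names, and `run_device_kernel` is a CPU stand-in so the sketch compiles without OpenCL. In a real harness each worker would typically own the command queue for its card, enqueue the kernel and call `clFinish` on that queue inside the timed region.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define MAX_DEVICES 8

/* One record per card. Each worker writes only its own record,
   so no locking is needed (nothing is shared between workers). */
typedef struct {
    int device_index;   /* which card this worker owns */
    long iterations;    /* size of the timed loop */
    double elapsed_s;   /* output: wall time of the timed region */
    double checksum;    /* output: keeps the stand-in work observable */
} worker_t;

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* HYPOTHETICAL stand-in for the per-card work. In an OpenCL harness this
   would enqueue the kernel on this card's own queue and then clFinish() it.
   Here it is a plain CPU loop so the sketch builds without OpenCL. */
static double run_device_kernel(int device_index, long iterations) {
    double acc = (double)device_index;
    for (long i = 0; i < iterations; ++i)
        acc = acc * 0.999999 + 1.0;
    return acc;
}

static void *worker_main(void *arg) {
    worker_t *w = arg;
    /* Any host-to-device upload would go here, BEFORE the timer starts,
       so PCIe transfers stay out of the measurement. */
    double t0 = now_seconds();
    w->checksum = run_device_kernel(w->device_index, w->iterations);
    w->elapsed_s = now_seconds() - t0;
    return NULL;
}

/* Runs n independent workers concurrently, one thread each.
   Returns 0 if all n threads ran, -1 otherwise. */
int run_concurrent(worker_t *workers, int n) {
    pthread_t threads[MAX_DEVICES];
    int started = 0;
    if (n < 1 || n > MAX_DEVICES)
        return -1;
    for (; started < n; ++started)
        if (pthread_create(&threads[started], NULL, worker_main,
                           &workers[started]) != 0)
            break;
    /* Join every thread that did start, even if a later create failed. */
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    return started == n ? 0 : -1;
}
```

Calling `run_concurrent` once per card with `n = 1` and once with all cards gives the solo and concurrent timing for each card from the same code path, which makes the two numbers directly comparable.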
Result: one of the 4 cards runs at full speed, the others at about half speed (everything is OpenCL).
Full speed is the speed of a single card with the others unplugged.
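The "full speed" definition above amounts to a simple ratio per card: relative speed = (time for the loop on that card alone) / (time for the same loop while all cards run). A value of 1.0 is full speed and roughly 0.5 is the half speed seen on the other cards. A small helper for flagging slow cards from the exported timings might look like this; the function names and the threshold parameter are illustrative assumptions.

```c
#include <stddef.h>

/* Relative speed of one card: solo time divided by concurrent time
   for the same loop. 1.0 = full speed, about 0.5 = half speed.
   Returns -1.0 for non-positive (invalid) timings. */
double relative_speed(double solo_seconds, double concurrent_seconds) {
    if (solo_seconds <= 0.0 || concurrent_seconds <= 0.0)
        return -1.0;
    return solo_seconds / concurrent_seconds;
}

/* Counts cards whose relative speed is below `threshold`
   (e.g. 0.9 to flag anything noticeably slower than full speed).
   Cards with invalid timings are skipped, not counted. */
size_t count_slow_cards(const double *solo_seconds,
                        const double *concurrent_seconds,
                        size_t n, double threshold) {
    size_t slow = 0;
    for (size_t i = 0; i < n; ++i) {
        double r = relative_speed(solo_seconds[i], concurrent_seconds[i]);
        if (r >= 0.0 && r < threshold)
            ++slow;
    }
    return slow;
}
```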
The faster card is the one the monitor is connected to (although we don't push any pixels out; it's all command line).
We confirmed this by plugging the monitor cable into a different card.
If we unplug all the cards except one (any one), that card runs at full speed.
This is with the latest beta driver and profiles (which seem to have given us a small speedup).
This is a configuration with a single 6-core Sandy Bridge Extreme (SB-E) CPU. I understand I can't get full PCIe transfer speed under such conditions, but I am wondering whether something somewhere (e.g. the driver) makes some assumptions, since what we see is a 2x drop in processing speed, not in transfer speed. (We have tested many other things; this is the short form.)