
revisionfx
Journeyman III

Quad GPU question (4 X 7970)

Question: In case it's our code, is there a sample program with source that would let us reliably test the processing bandwidth of multiple GPUs at once?

Context:

We are having issues with 4 X 7970 setup (same with 3)

We dump some data onto the 4 cards, execute a loop and export timings (no PCIe/domain transfers are included in the perf timings). We have one thread per card and everything is totally independent (what happens on a card stays on a card).

Result: One of the 4 cards runs at full speed, the others at about half speed (all OpenCL stuff)

Full Speed is the speed of 1 card with the other unplugged.

The card that is faster is the one the monitor is connected to (we don't push any pixels out though, it's all command line)

We test that by replugging the monitor cable.

If we unplug all the cards except one (any one), then that card runs at full speed.

This is using the latest beta driver and profiles (which somehow seemed to have given us a tiny speedup).

This is a config with a single 6-core Sandy Bridge Extreme CPU. I understand I couldn't get full PCIe transfer speed under such conditions, but I am wondering if something somewhere (i.e. the driver) makes some assumptions, as it's a 2X slowdown in terms of processing speed. (Many other things were tested; this is the short form.)

Pierre

4 Replies
drallan
Challenger

Hi. Not sure but it sounds like the headless clock problem in this thread: http://devgurus.amd.com/thread/159062

All display drivers since about 8.96 have this problem, where headless 7970 GPUs are stuck at a clock speed of 500 MHz.

No setting or tweaking tool can fix this; even when one appears to, the clock immediately reverts to 500 MHz.

Thus, the drivers are not very useful for multi-GPU work.

I'm still using version 8.92.

All versions of 8.95 do not have the headless clock problem, but I had other issues with them.

Someone here suggested a temporary fix using dummy VGA dongles.

The problem has been widely reported, but so far I've seen no feedback.

It might be considered a driver problem rather than an OpenCL problem.

Allan

realhet
Miniboss

Btw, did you test for arithmetic correctness while overclocking those 7970s?

I get some calculation errors when I use drivers other than the first ones (win 11.12, linux 12.1). They occur after about 1 minute of stress testing, when the GPU temperature goes above 80 °C. But I'm not sure whether it's a memory transfer error or another bug of mine.

Soon I'll make a test thing for this, but this is just weird:

At the moment I have these results while overclocking two 7970s to 1125 MHz:

first driver: the CAL test runs without errors, but there is a 2-3% performance drop if you use 2 GPUs.

first driver: OpenCL runs perfectly on 1 GPU, but there is a terrible 50% performance drop when running OCL on 2 GPUs.

latest driver: OpenCL runs awesome on 2 GPUs, with no penalty for multi-GPU. CAL also runs with the usual 2-3% performance degradation on 2 GPUs. BUT after 1 minute (temperature above 80 °C) there is an increasing number of calculation errors as the temperature goes up (to 86 °C) o.O.

Have you experienced such errors?
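For what it's worth, a check like this can be kept independent of the GPU API: compute the same thing on the host and compare it with the buffer read back from the card. The following is only a minimal sketch. The function name `count_mismatches` and the tolerance parameter are made up for illustration, and how the result buffer gets read back from the device is left out entirely.

```python
def count_mismatches(reference, result, rel_tol=0.0):
    """Return the indices where `result` differs from `reference`.

    `reference` is assumed to be computed on the host and `result` read
    back from the card. rel_tol=0.0 demands exact equality (integer
    kernels); a small positive value allows for float rounding.
    """
    if len(reference) != len(result):
        raise ValueError("buffers have different lengths")
    bad = []
    for i, (ref, got) in enumerate(zip(reference, result)):
        if abs(ref - got) > rel_tol * abs(ref):
            bad.append(i)
    return bad


if __name__ == "__main__":
    host = [i * i for i in range(8)]
    card = list(host)
    card[5] += 1  # simulate one wrong value coming back from the card
    print(count_mismatches(host, card))  # → [5]
```

Running it once per test window, and logging the count next to the temperature, would show whether the errors really track the heat.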


We have slightly different behavior than you so far (assuming we derive proper intuitions from our own testing):

1) The OLDER driver is 20-30% faster on one machine than on the other (same exact card; faster on the small i5 mobo than on the big Sandy Bridge Extreme mobo we use for 4-GPU testing), AND the LATEST seems to fix that for a single GPU here (finally matching perf for the same card). I don't fully understand how all the parts interact; I'm even wondering if that is just Profiles.xml related?

2) Both drivers are 50% slower for each additional GPU (no change), but all cards seem to get the 20-30% speedup here with the latest driver (e.g. instead of 4, 2, 2, 2 FPS it's now like 5, 2.5, 2.5, 2.5, where 5 is the one with the monitor cable connected).

3) We haven't checked whether our results match yet, as we were testing without any output (i.e. no memory returned to the host, no PCIe transfers...) to isolate issues, but we wondered about it. Will do; that would suck. Does this compute error thing you see happen even when the fans are running at full speed/manual? Everything does slow down here after it gets to a certain temperature, but symmetrically as far as I can tell (after a test unit that lasts maybe 2 minutes here, we basically have to wait 10-15 minutes to get max perf on another run, or reboot; the first run is usually fast, before something starts to be temperature conscious). Running the fans has some impact on peak speed.

4) Question: do you have something connected to the second GPU that you say goes fast now? (We will also try to "terminate" the cards with dummy VGA dongles as suggested in the previous reply; we also got some extra CrossFire cables in case the driver pays attention to that for some reason.)

5) LuxMark OpenCL appears to work fine on multi-GPU (more like 3X one card with 4 GPUs) anyway, which adds to our confusion even more. But, if I look here: http://www.luxrender.net/luxmark/ .
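To put the numbers in points 2 and 5 on the same footing, here is a small arithmetic sketch. It assumes the fastest card represents single-card speed (which matches what we see for the card with the monitor attached); the function name `scaling_report` is made up for illustration.

```python
def scaling_report(fps_per_card):
    """Summarize multi-GPU scaling from per-card frame rates.

    Assumption: the fastest card runs at true single-card speed, so the
    ideal aggregate is fastest * number_of_cards.
    """
    fastest = max(fps_per_card)
    total = sum(fps_per_card)
    return {
        "relative": [fps / fastest for fps in fps_per_card],
        "aggregate_fps": total,
        "speedup_vs_one_card": total / fastest,
        "efficiency": total / (fastest * len(fps_per_card)),
    }


if __name__ == "__main__":
    report = scaling_report([5, 2.5, 2.5, 2.5])
    print(report["speedup_vs_one_card"])  # → 2.5
    print(report["efficiency"])           # → 0.625
```

So our own loop gets about 2.5X one card out of 4 GPUs with either driver (4, 2, 2, 2 gives the same ratio), against roughly 3X, i.e. 75%, for LuxMark.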

Question/Suggestion for AMD: I imagine multi-GPU compute has been tested internally on some reference machine? What/How? We're getting frustrated here. Could there be an SDK sample demo that tests just that? Having that could help "crowdsource" resolving such issues.

That is: pin some memory, copy it to all cards, then loop for a while; just some memory copy and some arithmetic will do (about 100 ms of compute/copy on the card per frame in the loop), and spit out timing per card every 1000 frames (iterations). Basically the most casual file-based-workflow type of application (i.e. no CrossFire sharing, no tiling of a scene as in high-res interactive/gaming/video-wall applications, just autonomous compute per card).
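The skeleton of such a test could look roughly like the sketch below: one thread per card, a timed loop, and a timing record every N frames. This is not an SDK sample. All names (`time_device`, `run_all`, `stand_in_frame`) are hypothetical, and `stand_in_frame` is a CPU placeholder that only exercises the harness; for a real test it would be replaced by a per-device callable that enqueues the kernel and waits for it to finish (e.g. via `clEnqueueNDRangeKernel` and `clFinish` in OpenCL). This version also passes the same callable to every thread, where a real test would need one per card.

```python
import threading
import time


def stand_in_frame(n=20000):
    # Hypothetical placeholder for one "frame" of on-card work.
    # It runs on the CPU, so its timings say nothing about GPUs.
    acc = 0
    for i in range(n):
        acc += i * i
    return acc


def time_device(device_id, run_frame, frames, report_every, results):
    """Run `frames` iterations and record seconds per reporting window."""
    windows = []
    start = time.perf_counter()
    for i in range(1, frames + 1):
        run_frame()
        if i % report_every == 0:
            now = time.perf_counter()
            windows.append(now - start)
            start = now
    results[device_id] = windows


def run_all(num_devices, run_frame, frames=3000, report_every=1000):
    """Start one independent timing thread per device and wait for all."""
    results = {}
    threads = [
        threading.Thread(
            target=time_device,
            args=(dev, run_frame, frames, report_every, results),
        )
        for dev in range(num_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def frames_per_second(results, report_every):
    """Average frame rate per device over all recorded windows."""
    return {dev: report_every * len(w) / sum(w) for dev, w in results.items()}


if __name__ == "__main__":
    res = run_all(4, stand_in_frame, frames=300, report_every=100)
    for dev, fps in sorted(frames_per_second(res, 100).items()):
        print(f"device {dev}: {fps:.1f} frames/s")
```

Nothing crosses between threads except the final timing lists, which is the "what happens on a card stays on a card" setup described above.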

Pierre


I always turn CrossFire on in Windows. If I don't, I guess the other card will be turned off if no display is connected to it.

On Linux I do nothing except set COMPUTE=:0, and both 7970s are there.
