27 Replies Latest reply on Oct 29, 2010 11:13 AM by afo

    One more on multiple GPUs

    datlatec

      I started working with OpenCL a couple of months ago with an nVidia card on Ubuntu 10.04. As I wanted top performance, I got myself a 5830 and a 5970, and things didn't go exactly as I thought they would...

      Running my code on the 5830 gives me 4 times the performance I had on the nVidia card, but where CPU usage on nVidia was 2%, I get a constant 100% on ATI. All of that CPU time is apparently spent in the kernel driver. The need to keep X running is also disappointing.

      My real problems came with multiple cards, though. Running my code with a single context and a single thread across all GPUs gave me only a marginal speed increase, and the funny thing is that, looking at aticonfig --odgc with one card, I get a constant GPU usage of 99%, but with 3 GPUs (the 5830 plus the two on the 5970) I get 40+30+30. So are the kernels being serialized? I also tried separate threads with separate contexts/kernels/everything, and all I got was 3 CPU cores at 100% and the same 40/30/30 split of GPU usage.

      Is there a way of quickly resolving this, other than moving back to nvidia?

      I'm using the latest Stream SDK (2.2) and the 10.9 driver (the hotfix from Sep 27).

          • One more on multiple GPUs
            himanshu.gautam

            datlatec,

            nou is correct: performance degradation issues have been reported in the multi-GPU case, and AMD is working on them.

            But there can still be a lot of other factors (event handling, access patterns, etc.) affecting your performance. So please post a suitable test case and we can discuss how performance might be improved in the present scenario.

              • One more on multiple GPUs
                datlatec

                I see roughly the same performance ratio per extra GPU that SimpleMultiDevice gets (after increasing the kernel iterations heavily). The only thing I haven't tested is completely separate processes, one per GPU, but I'll try that today so I can gather all the information.

                I'm not using events in any way.

                  • One more on multiple GPUs
                    afo

                    Hi,

                    I was unaware of aticonfig --odgc, so I would like to share my experiences:

                    I have a Phenom X4 and 2x HD5970. If I run 2 instances of my application I see this:

                    a) using both GPUs of one board

                    GPU[0]: 66%; GPU[1]: 62%; GPU[2]: 0%; GPU[3]: 0%

                    b) using one GPU on each board

                    GPU[0]: 64%; GPU[1]: 0%; GPU[2]: 63%; GPU[3]: 0%

                    both cases:

                    CPU: two cores at 95-100%, one at around 35%, and the other near 5%

                    If i run 3 instances of my application, I have this:

                    GPU[0]: 57%; GPU[1]: 57%; GPU[2]: 50%; GPU[3]: 0%

                    CPU: three cores at 95-100% and one at around 60%

                    If I run 4 instances of my application, I have this:

                    GPU[0]: 51%; GPU[1]: 48%; GPU[2]: 46%; GPU[3]: 51%

                    CPU: all of them at 100%

                    My best guess is this: the OpenCL programming guide states that there is one global command queue for GPUs and another for CPUs (page 2-4). With more than one GPU (no matter the number of contexts), this serialized queue introduces a lot of overhead, creating a bottleneck that drags down the performance of the whole system. But this is just my personal guess.

                    hope this helps,

                    Alfonso

                     

                      • One more on multiple GPUs
                        datlatec

                        afo: what is the GPU load when you run only one process on one GPU?

                        I just tried 3 completely separate processes, each one on its own GPU, and the results are almost the same, except with a LOT more CPU overhead. I wonder if I could get around the serialization of GPU kernels by running multiple X servers? Does anyone know how to do this, if it is possible at all?

                        From what I read, this serialization is Linux-only. Maybe I'll try porting to Windows, though it is kind of a daunting task for my use case.

                          • One more on multiple GPUs
                            afo

                            Hi,

                            With one process:

                            GPU[0]=87%

                            GPU[1]=GPU[2]=GPU[3]=0

                            one CPU near 80%, the rest near 5%

                            I also ran 2/3/4 different processes, each one in its own terminal and with its own dataset.

                            I don't think it's a good idea to go to Windows: on WinXP 32-bit it is not possible to use the second GPU of the HD5970, and on W7 64-bit it seems impossible to disable internal CrossFire, so you could get wrong results from the second GPU of the HD5970.

                            By the way: did you try separate GPUs in separate contexts in separate threads in the same application, or separate applications each using one thread on its own GPU (i.e. receiving the GPU to use as a parameter)?

                            best regards,

                            Alfonso

                              • One more on multiple GPUs
                                datlatec

                                re: Windows, the thing is I have more than one 5970, and even without using the second GPU on each card, half the cores is still more than what I get now.

                                I really wish the developers would at least hint whether a fix is close to release or not. It would help me greatly in deciding whether to stick with ATI or move to nVidia, which I'll probably end up forced to do to recoup my investment if this takes too long.

                                  • One more on multiple GPUs
                                    keldor314

                                    Make sure you create a separate context for each GPU.  Otherwise OpenCL will attempt to keep buffers coherent across the entire context, resulting in massive amounts of data moving around and serialization.  This is a "feature" of OpenCL, not a bug, for some reason.
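
                                    A minimal sketch of that layout (one single-device context and one queue per GPU) might look like the following. This assumes the Stream SDK's OpenCL 1.x headers; error handling is trimmed and the device cap of 8 is arbitrary, so treat it as an illustration rather than a drop-in fix:

```c
/* Sketch: one cl_context + cl_command_queue per GPU, so the runtime never
 * tries to keep buffers coherent across devices. Error checks trimmed. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_uint ndev = 0;
    cl_device_id devices[8];
    cl_context ctx[8];
    cl_command_queue queue[8];

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices, &ndev);

    for (cl_uint i = 0; i < ndev; ++i) {
        /* Each context holds exactly one device: buffers created in ctx[i]
         * live on device i only, so no cross-device coherency traffic. */
        ctx[i]   = clCreateContext(NULL, 1, &devices[i], NULL, NULL, NULL);
        queue[i] = clCreateCommandQueue(ctx[i], devices[i], 0, NULL);
        /* ...create per-context buffers/programs/kernels and enqueue here... */
    }

    printf("created %u single-device contexts\n", ndev);

    for (cl_uint i = 0; i < ndev; ++i) {
        clReleaseCommandQueue(queue[i]);
        clReleaseContext(ctx[i]);
    }
    return 0;
}
```

                                    Note that each device then needs its own copies of the buffers and a separately built program object, since none of those objects can be shared across contexts.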

                                      • One more on multiple GPUs
                                        datlatec

                                        I have tried all the combinations, and none works as expected: single context; multiple threads with multiple contexts; multiple processes. All with the same result.

                                        And I still haven't received any notion of when to expect this fixed from AMD, so I'm feeling very frustrated.

                                          • One more on multiple GPUs
                                            himanshu.gautam

                                            hi datlatec,

                                            Please send a test case to streamdeveloper@amd.com so that we can reproduce the problem.

                                              • One more on multiple GPUs
                                                datlatec

                                                 

                                                Originally posted by: himanshu.gautam hi datlatec,

                                                 

                                                Please send a test case at streamdeveloper@amd.com so that we can reproduce the problem.

                                                 

                                                Test case sent.

                                                re: Windows, I have ported everything to Windows, on XP 32-bit for now. I have a few issues (for some reason the kernel source needs to be below some size threshold, and the program occasionally fails to build), but anyway:

                                                - I can't use the second GPU core because there's no way to disable CrossFire, right? I'm getting garbage from it, as I'd expect.

                                                - Putting a second GPU in the computer, I got the CrossFire disable option (these cards were not in CrossFire), and after disabling it I get only one device instead of the 4 expected. I assume that's because no screens are attached to the others, but I wonder: if I connect a second screen to the first GPU, will that show me the 2nd core there, or just a second screen on the first core?

                                                - Performance-wise it was a little disappointing to see the single core run at the same speed as it did on Linux. For comparison, I'm running a 5830 at 75% of a single 5970 core, which should be a 5870, right? Is the 5870 just 25% faster than the 5830?

                                                - Not all is bad news, though: I got the two cores to max out on Windows, doubling performance. Shame I can't use the results the second core gives me. I need to build a dummy VGA connector so I can try 2 separate GPUs.

                                                What results should I expect if I move to windows 7?

                            • One more on multiple GPUs
                              MicahVillmow
                              datlatec,
                              On Windows, if you have multiple display cables, you can connect multiple graphics cards to the same monitor to have them enabled without having to use a dummy VGA connector. One of my monitors has three different graphics cards attached via VGA, HDMI and DVI cables.
                                • One more on multiple GPUs
                                  datlatec

                                  MicahVillmow: that is a great hint, thank you! My monitor has a single VGA input, though; still, it is something I'll keep in mind when buying the next one.

                                  • One more on multiple GPUs
                                    empty_knapsack

                                     

                                    Originally posted by: MicahVillmow datlatec, On windows if you have multiple display cables, you can connect multiple graphic cards to the same monitor to have them enabled without having to use a dummy vga connector. One of my monitors has three different graphic cards attached using VGA, HDMI and DVI cables.


                                    Erhm. If even an AMD representative has to use these tricks to get all GPUs recognized, I assume there is NO chance of finally removing these monitor/plug-attachment limitations at the driver level? Why don't NVIDIA GPUs have these "attaching issues", and why can't AMD fix this?

                                    • One more on multiple GPUs
                                      ThomasUCF

                                       

                                         Hi MicahVillmow:

                                         I did as you said; why, when I run the samples in SDK 2.2, does nothing happen? There is just a black window.

                                         Thanks,

                                         ThomasUCF

                                       

                                    • One more on multiple GPUs
                                      MicahVillmow
                                      empty_knapsack,
                                      This issue has been brought up with the driver team.
                                        • One more on multiple GPUs
                                          empty_knapsack

                                          Good to hear it.

                                          So we can expect this to be implemented in the next 9-12 months, I guess. Sorry, I can't resist.

                                            • One more on multiple GPUs
                                              datlatec

                                              Joking tone aside, I've read people complaining about the 5970's lack of performance on Linux since back in March, with AMD stating they would fix it soon... how soon is soon, anyway?

                                                • One more on multiple GPUs
                                                  empty_knapsack

                                                  Well, support for the 5970 was completely broken with the Catalyst 10.4 release (so that happened in April). It took 3 months to fix it at the CAL layer: starting from 10.7 it is possible to use both cores of the 5970 again. At least that's true for Windows (again, not with OpenCL but with CAL/IL).

                                                  So for OpenCL it will either be fixed with the SDK 2.3 release, or not fixed for another 3-6 months (I forget the release cycle for the SDK; AFAIR it's 3 months).

                                                    • One more on multiple GPUs
                                                      dravisher

                                                      Really hoping the dual-GPU cards will be supported in the next SDK. It'll be a major disappointment if the dual-GPU HD6000 series card (AKA HD6990, AKA Antilles) isn't suited for OpenCL until sometime in 2011.

                                                        • One more on multiple GPUs
                                                          datlatec

                                                          So, using the current Catalyst 10.10 driver and Stream SDK 2.2, is there *any* OS that supports both cores of 5970 using OpenCL?

                                                          So far I have tried:

                                                          Linux (Ubuntu 10.04, 32 & 64-bit): sees and uses both cores, but kernels are serialized, not only across the two cores but across any other GPU in the system.

                                                          Windows (XP 32-bit): sees both cores if CrossFire is enabled, but the second core's results are trashed. Sees only one core if CrossFire is disabled (which takes a little work and 2 cards to accomplish); works well across different GPUs.

                                                • One more on multiple GPUs
                                                  MicahVillmow
                                                  ThomasUCF,
                                                  Make sure that you are on the correct monitor setting. Sometimes after I install a new card it selects the wrong display head as the main display, so I need to switch it.
                                                  • One more on multiple GPUs
                                                    MicahVillmow
                                                    It is a setting on your monitor: you need to select the correct input source, e.g. I select HDMI when I want one card, DVI for another and VGA for a third. I'm not sure how to set it in Windows.
                                                      • One more on multiple GPUs
                                                        ThomasUCF

                                                         Hi MicahVillmow:

                                                         I didn't mean the whole screen is black; I mean the screen where the program shows its results is black. If I don't plug in the dummy VGA, it runs; if I plug it in, there is nothing but black. What can I do?

                                                         Thanks a lot,

                                                         ThomasUCF

                                                      • One more on multiple GPUs
                                                        MicahVillmow
                                                        Have you tried setting the environment variable GPU_DEVICE_ORDINAL=#, where # is the device number you want to execute on? See if it works for some devices.
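
                                                        For anyone trying this, a minimal launcher sketch follows. The binary name mykernel_app and the dataset arguments are hypothetical stand-ins (echo is used here so the sketch actually runs); GPU_DEVICE_ORDINAL is the Stream SDK variable described above:

```shell
#!/bin/sh
# Sketch: one process per GPU, each pinned to a single device via
# GPU_DEVICE_ORDINAL (read by the Stream SDK OpenCL runtime at startup).
# "mykernel_app" is a hypothetical stand-in for your real OpenCL binary.
for dev in 0 1 2; do
    GPU_DEVICE_ORDINAL=$dev sh -c \
        'echo "device $GPU_DEVICE_ORDINAL: mykernel_app would run here"' &
done
wait   # block until every instance has finished
```

                                                        Each instance then enumerates only the device it was pinned to, which sidesteps any in-process multi-GPU scheduling.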
                                                          • One more on multiple GPUs
                                                            afo

                                                            Hi,

                                                            I am not sure whether it's better to continue this thread or open a new one, so please forgive me.

                                                            My question is: is the drop in performance an OpenCL issue or a driver issue? Is there something we can do to mitigate it? I.e., is it better to do a memory transfer + kernel invocation for each GPU in order, or to do the memory transfers for each GPU, wait for them to finish, and then start calling the kernels? Does it make a difference? Thanks a lot for any insight about this.

                                                            best regards,

                                                            Alfonso
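
                                                            The second ordering Alfonso asks about could be sketched like this. This is a hedged illustration, not a verified fix: the helper launch_all and all of its parameters are hypothetical, it assumes one in-order queue, buffer and kernel per GPU already exist, and error checks are trimmed:

```c
/* Sketch: issue all host->device transfers non-blocking first, flush every
 * queue so they start immediately, then enqueue the kernels. Each kernel
 * waits only on its own transfer event, not on the other GPUs'. */
#include <CL/cl.h>

void launch_all(cl_command_queue *queue, cl_mem *buf, cl_kernel *kern,
                const float *host_src, size_t nbytes, size_t gsize,
                unsigned ngpu)
{
    cl_event xfer[16];

    /* 1. Start all transfers before any kernel, so the GPUs copy in parallel. */
    for (unsigned i = 0; i < ngpu; ++i) {
        clEnqueueWriteBuffer(queue[i], buf[i], CL_FALSE /* non-blocking */,
                             0, nbytes, host_src, 0, NULL, &xfer[i]);
        clFlush(queue[i]);          /* push the transfer to the device now */
    }

    /* 2. Enqueue the kernels, each gated only on its own transfer. */
    for (unsigned i = 0; i < ngpu; ++i) {
        clEnqueueNDRangeKernel(queue[i], kern[i], 1, NULL, &gsize, NULL,
                               1, &xfer[i], NULL);
        clFlush(queue[i]);
    }

    /* 3. Wait for every GPU to complete before touching the results. */
    for (unsigned i = 0; i < ngpu; ++i) {
        clFinish(queue[i]);
        clReleaseEvent(xfer[i]);
    }
}
```

                                                            Comparing this against the strictly sequential transfer+kernel loop, as Alfonso suggests, would at least show whether the bottleneck moves with the ordering or stays fixed in the driver.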