gat3way
Journeyman III

Multi-GPU broken with SDK 2.5

Previously we had the GPU_USE_SYNC_OBJECTS environment variable, but it apparently no longer works. The spinlocks are back in the runtime, along with the 100% CPU usage problem, and performance drops. Thanks, but I am sticking with 2.4 until that's solved.

bitselect() is still not mapped to BFI_INT. Why?
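
For reference, the OpenCL specification defines bitselect(a, b, c) bit by bit: each result bit is taken from b where the corresponding bit of c is 1, and from a where it is 0. BFI_INT on Evergreen-class hardware is commonly described as computing (s0 & s1) | (~s0 & s2), so bitselect(a, b, c) would correspond to BFI_INT(c, b, a), one instruction instead of several. A minimal C sketch of the semantics, assuming 32-bit unsigned operands (the function names are hypothetical):

```c
#include <stdint.h>

/* Reference semantics of OpenCL bitselect(a, b, c) on 32-bit values:
   take b's bit where c's bit is 1, and a's bit where c's bit is 0. */
uint32_t bitselect_ref(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Equivalent three-operation form (XOR, AND, XOR), the kind of sequence a
   compiler may emit when it does not use a single bitfield-insert instruction. */
uint32_t bitselect_xor(uint32_t a, uint32_t b, uint32_t c)
{
    return a ^ ((a ^ b) & c);
}
```

Whether a given compiler maps bitselect() (or either expansion) to BFI_INT depends on the compiler version; the complaint above is that SDK 2.5 does not.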

The BFE_UINT optimization (mentioned in the docs) is, for some reason, slower when it operates on values from __local memory: additional MOV instructions are generated, and MOV+BFE is slower than LSHR+AND, so some of my kernels are now slower.
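
The two sequences being compared compute the same value. BFE_UINT is usually described as extracting `width` bits starting at bit `offset`, which is exactly what a logical shift right followed by an AND with a mask (LSHR+AND) produces. A minimal C sketch of that equivalence, assuming offset < 32 and offset + width <= 32 (the function name is hypothetical):

```c
#include <stdint.h>

/* Unsigned bitfield extract written as LSHR + AND:
   returns `width` bits of x starting at bit `offset`.
   Assumes offset < 32 and width <= 32 - offset. */
uint32_t bfe_u32(uint32_t x, unsigned offset, unsigned width)
{
    /* Build the mask without shifting by 32, which is undefined in C. */
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> offset) & mask;
}
```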

Offline compilation is now broken too.

I am rather disappointed 😞

17 Replies

gat3way,
Can you give an example on how offline compilation is broken?

Someone on the forum reported that even the offline compilation example from the AMD knowledge base is broken.

Correct, clBuildProgram() either crashes or returns an error with an empty log.

strace shows that it is trying to open /usr/lib/libatiocl32.so, even though I am using the 64-bit runtime. Also, atiocl32.so? Hmm.

OK, I believe this was related to previous ICD profiles... I still can't get offline device compilation working, though 😞
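
If stale ICD registrations are the suspect, it can help to see exactly which library each one names. On Linux, the Khronos ICD loader reads the .icd files in /etc/OpenCL/vendors/, each naming a vendor library to load. A minimal sketch that reads one such file, assuming the library name is on its first line (the function name is hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Read the library name from one ICD registration file, e.g. a file in
   /etc/OpenCL/vendors/. Assumes the first line names the library.
   Copies the name, with trailing whitespace stripped, into out.
   Returns 0 on success, -1 if the file cannot be read or the line is empty. */
int icd_read_library(const char *path, char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(out, (int)outlen, f);
    fclose(f);
    if (line == NULL)
        return -1;
    size_t len = strlen(out);
    /* Strip the newline and any trailing spaces or tabs. */
    while (len > 0 && strchr(" \t\r\n", out[len - 1]) != NULL)
        out[--len] = '\0';
    return len > 0 ? 0 : -1;
}
```

Calling it on each file in /etc/OpenCL/vendors/ and printing the result would show, for instance, an entry that still points at a 32-bit library on a 64-bit setup.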

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=153378&enterthread=y

I tried building for particular devices only, and it either crashes or returns errors as well. This is rather annoying. Worse, I can't even stay on 2.4 for kernel compilation and then use the 2.5 runtime, because the 2.5 runtime has this GPU_USE_SYNC_OBJECTS-not-working problem, which basically kills any performance benefit gained from the reduced kernel launch latency and faster host-device transfers, and makes overall performance worse. SDK 2.5 has become a big problem, at least for me. Obviously I don't own every kind of AMD hardware, so I can't build binaries for all devices without offline compilation. And I also have no way to get multi-GPU running seamlessly as it did in 2.4 and 2.3.

I don't know whether this is a Linux-only problem; maybe these things work on Windows, maybe not.

I know you are all focused on those fancy new APUs, but please don't break things that used to work 😞

Well, sorry for ranting, but this basically killed my enthusiasm for the new SDK. I expected things to improve; instead, things that used to work are now broken 😞

Originally posted by: gat3way I tried building for particular devices only - it either crashes or returns errors as well.

gat3way,

You are facing two problems:

    1. GPU_USE_SYNC_OBJECTS not working

    2. Offline compilation issues

        Could you please run the following and let me know what happens?

            ./Reduction --dump binaryName

Could you please also give us the following information?

     OS, driver version, CPU and GPU?

Tried it, and got a segfault too.

OS: Debian Testing

Driver version: Catalyst 11.7

CPU: AMD Phenom x4

GPU: AMD Radeon HD 6870

For anyone interested: I finally got offline compilation working!!!

Looks like the compiler crashes for those three targets:

* Lions

* Bears

* Tigers

I don't even know what those are (future 7xxx GPUs?).

Anyway, the trick is to create a context with all offline devices, then call clBuildProgram() for each of them, excluding those three.
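
The workaround above amounts to filtering the offline-device list before building. The selection step can be sketched in plain C, assuming device names are compared exactly against the three targets reported to crash (the helper names are hypothetical). In host code, the names would come from clGetDeviceInfo(..., CL_DEVICE_NAME, ...) for each device returned by clGetContextInfo(..., CL_CONTEXT_DEVICES, ...) on a context created with the CL_CONTEXT_OFFLINE_DEVICES_AMD property (cl_amd_offline_devices extension), and clBuildProgram() would then be called once per selected device with num_devices set to 1.

```c
#include <stddef.h>
#include <string.h>

/* Offline targets reported in this thread to crash the SDK 2.5 compiler.
   Taken from the post above; other driver versions may differ. */
static const char *const k_skip_targets[] = { "Lions", "Bears", "Tigers" };

/* Returns 1 if a program should be built for the device with this name. */
int should_build_for(const char *device_name)
{
    size_t n = sizeof k_skip_targets / sizeof k_skip_targets[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(device_name, k_skip_targets[i]) == 0)
            return 0;
    return 1;
}

/* Write the indices of devices to build for into out_idx (which must hold
   at least n entries) and return how many were selected. */
size_t select_build_targets(const char *const names[], size_t n, size_t out_idx[])
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (should_build_for(names[i]))
            out_idx[count++] = i;
    return count;
}
```

Building one device at a time, rather than passing the whole list to a single clBuildProgram() call, keeps a crash-prone target from taking the other builds down with it.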

 

Now, the GPU_USE_SYNC_OBJECTS problem is the other thing that we need to discover a workaround for 🙂

Originally posted by: gat3way

Now, the GPU_USE_SYNC_OBJECTS problem is the other thing that we need to discover a workaround for 🙂

The GPU_USE_SYNC_OBJECTS issue will be fixed in upcoming drivers. Please check the driver release notes to see whether it has been fixed.

I have a multi-GPU issue too: it crashes the computer altogether. I tested LuxMark as a multi-GPU benchmark tool, and it works fine without setting COMPUTE=:0, but when it is set, the first time it instantly froze the machine, and the second time I saw a few corrupted images rendered by the kernels before the OS crashed.

(The glossy ball image is either fuzzy at the start or completely black. However, I saw vivid, randomly colored pixels on the rendered image, and only there, so it was most likely not frame buffer corruption but the kernel output itself.)

OS: Ubuntu 10.04.3 LTS 64-bit, Catalyst 11.8, SDK 2.5

P.S.: LuxMark in CPU mode performed flawlessly.

P.S. 2: I have not tried using GPU_USE_SYNC_OBJECTS.

Could someone confirm that this issue is not only on my side? I would very much like to know whether I should wait for a driver or revert to SDK 2.4 until the next SDK comes out.

The best combination for me is Cat 11.4 + SDK 2.4. It works on Windows and Linux. It does not generate the false positives seen with Cat 11.5. It works well with Cayman devices, which Cat 11.6 does not. And unlike Cat 11.7 and Cat 11.8, it does not cause 100% CPU load on Linux (though it does on Windows). Multi-GPU works if you set GPU_USE_SYNC_OBJECTS to 1.
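
The variable can also be set from inside the application instead of the shell. A minimal POSIX sketch, assuming the AMD runtime reads GPU_USE_SYNC_OBJECTS when it initializes, so this must run before the first OpenCL call such as clGetPlatformIDs() (the function name is hypothetical; exporting GPU_USE_SYNC_OBJECTS=1 before launching the program should have the same effect, and on Windows _putenv_s would be the counterpart of setenv):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Set GPU_USE_SYNC_OBJECTS=1 for this process unless the user already set it
   (overwrite flag 0 keeps an existing value). Call before any OpenCL API call;
   assumption: the runtime only reads the variable at initialization.
   Returns 0 on success, -1 on failure, as setenv() does. */
int enable_gpu_sync_objects(void)
{
    return setenv("GPU_USE_SYNC_OBJECTS", "1", 0);
}
```

Not overwriting an existing value lets a user still force GPU_USE_SYNC_OBJECTS=0 from the shell when comparing behavior.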

Thanks for the info, I'll try reverting then. The reason I want to get it working so badly is the cached reads enabled by default, which brought about a 50-75% performance increase in several OpenCL applications. It's a huge boost in performance; a shame it came with this multi-GPU mess.

All I wanted was for someone official to state: yes, we are aware, it is screwed up, expect a fix in the next driver OR SDK. I'm curious about the 'OR' part: which one should we wait for?

I've just tested the new Catalyst 11.9. This annoying 100% CPU bug still exists on both Linux and Windows. The rumours that it's fixed on Windows are false.

To AMD: How is this possible? Is there anything we (the users) can do to fix this problem? Maybe some donations?
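
Until a fixed driver is available, one thing an application controls is its own waiting. A minimal sketch, under the assumption that the busy waiting happens in the application's blocking calls (such as clFinish()) rather than in a runtime-internal thread: poll a completion check and sleep between polls instead of blocking. The helper names are hypothetical; with OpenCL, the check could wrap clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, ...) and compare the status with CL_COMPLETE, after a clFlush() so the commands are actually submitted. If the spinning happens inside the runtime itself, this will not help.

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>

/* Completion check: returns nonzero once the awaited work is done. */
typedef int (*done_fn)(void *ctx);

/* Poll is_done(ctx) up to max_polls times, sleeping sleep_us microseconds
   between polls instead of spinning. Returns the number of polls needed,
   or -1 if the work was not done within max_polls polls. */
int wait_with_sleep(done_fn is_done, void *ctx, long sleep_us, int max_polls)
{
    struct timespec ts;
    ts.tv_sec = sleep_us / 1000000L;
    ts.tv_nsec = (sleep_us % 1000000L) * 1000L;
    for (int i = 1; i <= max_polls; ++i) {
        if (is_done(ctx))
            return i;
        if (i < max_polls)          /* no pointless sleep after the last poll */
            nanosleep(&ts, NULL);
    }
    return -1;
}

/* Illustrative check for testing without a GPU: counts *ctx down to zero. */
int countdown_done(void *ctx)
{
    int *remaining = (int *)ctx;
    return --(*remaining) <= 0;
}
```

The sleep interval trades latency for CPU time: a longer interval frees more CPU but can add up to one interval of delay after the kernel actually finishes.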

 

A bump.

Any response from AMD?

Can you at least confirm the issue exists and you are working on a solution?

Originally posted by: quadboon I've just tested new Catalyst 11.9. This annoying 100% CPU bug still exists on both, Linux and Windows. Rumours saying its fixed on Windows are fake.

We found a few more issues on Windows. We are working on this.

I can confirm this bug on Windows too, with Catalyst 11.10 and a Radeon HD 6720G2 (a hybrid CrossFire solution). The software is one of the projects supported by AMD, namely Bullet Physics (version 2.79-rev2440). The only fix I could find was to copy the data to the CPU and back to the graphics card in OpenGL.