Linuxhippy
Adept I

SGPR usage tripled on GCN-1.2 (v8) GPUs

Hi,

I've analyzed an OpenCL kernel using CodeXL and I am quite happy with the register usage: on GCN 1.0/1.1 devices the maximum of 10 wavefronts per SIMD can be queued, so hopefully memory latencies can be hidden efficiently.

However, on GCN-1.2 devices (Tonga), SGPR usage exploded: while the same kernel consumes 32 SGPRs on Cape Verde, it requires 94 SGPRs on Tonga, which limits the kernel to 5 parallel waves per SIMD (screenshots attached).

Any idea why the same code running on Tonga requires almost 3 times the SGPRs?

Have there been architectural changes to Tonga or are there pitfalls when it comes to SGPR usage?

Thank you in advance, Clemens

0 Likes
8 Replies

Can you post the ISA, or the source code if possible?

0 Likes

Hi Tzachi,


Please find the (obfuscated) code attached.

For now it is completely unoptimized for the target architecture; however, I would like to rule out potential driver bugs before spending days on tuning.

The driver installed on the system was Catalyst Omega 14.12 on Windows 7 64-bit, although we plan to switch to 64-bit Linux later to use OpenCL 2's fine-grained shared memory features.

Thank you in advance, Clemens

0 Likes

Hi,

Please upgrade to CodeXL 1.7. CodeXL 1.6 has a bug and does not display Tonga's occupancy correctly.

The true occupancy with 94 SGPRs on Tonga is 8 waves, not 5.

Let me assure you that scheduling 8 waves on Tonga vs. 10 waves on other GCN devices is an informed decision by the runtime/compiler, due to considerations I cannot share here in the forum.

Sincerely

Tzachi Cohen
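
The occupancy figures quoted in this thread (10 waves at 32 SGPRs on Cape Verde, 5 waves reported by CodeXL 1.6 and 8 waves actually scheduled at 94 SGPRs on Tonga) can be reproduced with a small calculation. The sketch below is only an illustration and not AMD's actual scheduling logic. It assumes a per-SIMD SGPR budget of 512 registers with an allocation granularity of 8 on GCN 1.0/1.1, and 800 registers with a granularity of 16 on GCN 1.2; none of these numbers are stated in the thread, and the function name is hypothetical.

```python
def waves_limited_by_sgprs(sgprs_per_wave, sgpr_file_size, granularity, max_waves=10):
    """Return how many wavefronts fit on one SIMD when only SGPRs are the limit."""
    # Round the per-wave request up to the allocation granularity
    # (at least one allocation block, so the division below is safe).
    blocks = max(1, -(-sgprs_per_wave // granularity))
    allocated = blocks * granularity
    # The SIMD can never hold more than max_waves wavefronts.
    return min(max_waves, sgpr_file_size // allocated)


# ASSUMED per-SIMD SGPR budgets; these values do not come from the thread.
GCN_1_0 = dict(sgpr_file_size=512, granularity=8)
GCN_1_2 = dict(sgpr_file_size=800, granularity=16)

cape_verde = waves_limited_by_sgprs(32, **GCN_1_0)   # kernel as compiled for Cape Verde
tonga_old = waves_limited_by_sgprs(94, **GCN_1_0)    # Tonga count judged by the older budget
tonga_new = waves_limited_by_sgprs(94, **GCN_1_2)    # Tonga count judged by the assumed GCN 1.2 budget

print(cape_verde, tonga_old, tonga_new)
```

Under these assumptions, applying the older budget to Tonga's 94 SGPRs gives the 5 waves that CodeXL 1.6 displayed, while the larger budget gives the 8 waves mentioned above. VGPR and LDS usage can lower the real occupancy further and are not modelled here.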

Can you share the background on why SGPR usage is so much higher for Tonga devices?

Actually, my kernel is only a simple starting point that will be extended. If I knew what is going on, I could avoid the patterns that cause the high SGPR allocation.

Thank you in advance, Clemens

PS: I am looking forward to the open-source HSA stack for Linux that AMD is currently working on. Thanks for bringing GPU compute to the open-source world, which makes it interesting for industrial applications like the one I am working on. (For now I am just evaluating Kaveri with Catalyst to get a feeling for the capability of the hardware; we most likely would not deploy Catalyst in an industrial-grade system.)

0 Likes

Hi Tzachi,

Any update on why the SGPR usage has increased that much on GCN-1.2?

Thanks, Clemens

0 Likes

I don't think it's likely you'll get any update on that. AMD has been fairly secretive about GCN 1.2 so far. Hopefully this is going to change after the 300 series hits the shelves.

But I have some opinions on that.

What I've noticed is that the SALU is often under-utilized. In my experience you have to get very lucky to see it above 20%, and most of the time it's much lower. For the kernels I had to deal with, it's often not even 5%. I don't recall ever being SGPR-limited either.

I speculate they have upgraded the SALU so it can now kick in more often. There's a user on these forums with quite accurate knowledge of the ISA; perhaps he will share some more substantial facts.

0 Likes
maxdz8
Elite

As a start: I've also noticed this.

But I have to say that in the real world I've not found it to be such a big problem. With 21 VGPRs this sounds like a simple kernel; I've never had such a case.

Following.

0 Likes
youwei
Adept I

Even for an EMPTY kernel, the SGPR usage is 94, which leads to an occupancy of 8 out of 10 waves.

I think AMD should give some explanation... tzachi.cohen

0 Likes