OpenCL

madsbuvi
Adept I

Kernel uses less registers since 13.1 - harming performance of my application

Hello!

With driver 13.1 and later, the AMD OpenCL compiler has been rather sparing in allocating registers for my code. As a result there is massive register spilling and roughly a 3x drop in performance.

When compiled with 12.11 or older, the kernel used 244 registers with no spilling. That was still enough to achieve full utilization of the GPU: without register spilling my code is almost entirely arithmetic-bound, and there are enough wavefronts to keep the GPU occupied.

But when compiled with 13.1 or later it uses only 131, spilling a lot of registers.
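
As a rough check on the occupancy point above, here is a minimal sketch of how register use maps to resident wavefronts. The limits are assumptions about GCN hardware rather than anything stated in this thread (256 VGPRs per SIMD lane, allocated in blocks of 4, at most 10 wavefronts per SIMD). It also assumes the reported counts are vector registers, and `waves_per_simd` is a hypothetical helper name.

```c
/* Assumed GCN limits (not taken from this thread): 256 VGPRs per SIMD lane,
   allocated in blocks of 4, and at most 10 wavefronts resident per SIMD. */
enum {
    VGPRS_PER_SIMD     = 256,
    VGPR_GRANULARITY   = 4,
    MAX_WAVES_PER_SIMD = 10
};

/* Hypothetical helper: number of wavefronts that fit on one SIMD for a
   kernel using `vgprs` vector registers per work-item.
   Returns 0 if the count is out of range. */
int waves_per_simd(int vgprs)
{
    if (vgprs <= 0 || vgprs > VGPRS_PER_SIMD)
        return 0;

    /* Round the request up to the assumed allocation granularity. */
    int allocated = (vgprs + VGPR_GRANULARITY - 1) / VGPR_GRANULARITY
                    * VGPR_GRANULARITY;

    int waves = VGPRS_PER_SIMD / allocated;
    return waves > MAX_WAVES_PER_SIMD ? MAX_WAVES_PER_SIMD : waves;
}
```

Under these assumptions both 244 and 131 registers leave room for one wavefront per SIMD, so the lower count would not by itself buy extra occupancy; a second wavefront would need 128 registers or fewer.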

Are there any compiler flags I can pass to allow or force the compiler to be more liberal in allocating registers?

My apologies in advance if I've missed any documents specifying this, or if this question has already been answered (I searched but couldn't find any sufficiently similar questions).

The specific code in question can be found here:

https://github.com/madsbuvi/MTY_CL/blob/master/readme.md

Running it through CodeXL should give a complete .cl file.

The loop at lines 246-250 in gpu.cl and the whole of des.cl/sboxes.cl form the relevant runtime-critical section.

Performance dropped from 125-130 million hashes/second to ~40 million with the new drivers. Edit: this is on a 7850; I forgot to mention that.

(The code also broke completely and generated wrong hashes with driver version 13.1, but this has been fixed in the newest beta driver. Mentioning this in case it might be related.)

Edit:

http://devgurus.amd.com/message/1286728

This seems somewhat relevant in terms of lost performance, but the slowdown there does not seem to be caused by register spilling.

25 Replies
nou
Exemplar

Try adding

__attribute__((work_group_size_hint(64, 1, 1))) or __attribute__((reqd_work_group_size(64, 1, 1)))

himanshu_gautam
Grandmaster

Thanks for reporting it. I will try to reproduce it at our end. Is the test case 32-bit or 64-bit? It contains DLLs, so I assume you are using Windows. Win7 or Win8?

madsbuvi
Adept I

Thank you, but it made no difference.

madsbuvi
Adept I

It is compiled as 32-bit and links to the 32-bit libraries. I am running a 64-bit version of Windows 7.

himanshu_gautam
Grandmaster

By the way, I expect you have already done this, but just to be sure: after adding the work-group size hint, you should also launch the kernel with 64 work-items per work-group. I hope you did that as well.

madsbuvi
Adept I

Yes, I have of course tried this.

himanshu_gautam
Grandmaster

Did you mark this as the correct answer by accident? I can un-mark it if you need.

madsbuvi
Adept I

Haha, yes, sorry about that.

Raistmer
Adept II

Well, I see the same for my app too.

With 13.1 my app started to cause driver restarts. Comparing the ISA for the kernel that runs too long, I found that under 13.1 it uses only 5 registers, while on 12.8 (where there are no driver restarts) it uses 12 GPRs:

SQ_PGM_RESOURCES:NUM_GPRS     = 5

vs

SQ_PGM_RESOURCES:NUM_GPRS     = 12

So register spilling is inevitable under 13.1, and it slows the kernel down so much that it causes driver restarts.
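
To compare register counts between driver versions without reading the ISA dumps by eye, a small helper can pull the number out of lines like the two quoted above. This is a minimal sketch that assumes the `SQ_PGM_RESOURCES:NUM_GPRS = N` layout shown in this post; `parse_num_gprs` is a hypothetical name, not part of any AMD tool.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans one line of an ISA dump for "NUM_GPRS"
   followed by "= <integer>". Returns 1 and stores the value in *out on
   success; returns 0 (leaving *out untouched) if the line does not match. */
int parse_num_gprs(const char *line, int *out)
{
    const char *key = strstr(line, "NUM_GPRS");
    if (key == NULL)
        return 0;

    const char *eq = strchr(key, '=');
    if (eq == NULL)
        return 0;

    int value;
    /* %d skips the whitespace between '=' and the number. */
    if (sscanf(eq + 1, "%d", &value) != 1)
        return 0;

    *out = value;
    return 1;
}
```

Feeding it each line of the dumps from both driver versions and comparing the two results would flag a drop such as the 12 to 5 change reported here.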
