Hi all,
I got the error "goto statement not allowed" when compiling a kernel that contains a goto statement. The NVIDIA SDK does allow goto, and my kernel runs there without any problem. Is this a restriction of the AMD SDK compiler? Is there a #pragma to allow it?
Thanks,
Roto
Whether irreducible control flow is supported is implementation-defined. The AMD implementation does not currently support it because in general it is bad practice, very rarely necessary, and would be inefficient on current hardware. I imagine you're trying something like using goto as a double-break from a loop? You can probably transform your code a little to work around that.
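To illustrate the kind of transformation meant by the double-break case, here is a minimal sketch, assuming the goto only exists to leave two nested loops at once. The function name find_first and its parameters are hypothetical and not taken from anyone's kernel. It is written as plain C so it can be compiled on the host; inside an OpenCL kernel the same loop structure should apply with the includes dropped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration only): searches a
   rows x cols row-major grid for the first element equal to target.
   Returns true and writes the position on success; leaves the outputs
   untouched otherwise. */
bool find_first(const int *grid, size_t rows, size_t cols, int target,
                size_t *out_row, size_t *out_col)
{
    bool found = false;

    /* The flag is tested in both loop conditions, so setting it ends the
       inner and the outer loop without needing a goto. */
    for (size_t r = 0; r < rows && !found; ++r) {
        for (size_t c = 0; c < cols && !found; ++c) {
            if (grid[r * cols + c] == target) {
                *out_row = r;
                *out_col = c;
                found = true;
            }
        }
    }
    return found;
}
```

The cost is one extra flag and one extra comparison per iteration. Whether that matters for register pressure in a given kernel is something to measure rather than assume.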
Thanks Lee. I will have to modify the code then.
Roto
"The AMD implementation does not currently support it because in general it is bad practice, very rarely necessary, and would be inefficient on current hardware."
Look at the threads discussing register spilling inefficiencies (in SDK 2.2 at least).
Here's just one quote: "went from Catalyst 10.5 to 10.7 and SDK 2.1 to SDK 2.2 and now all my kernels have horrible performance and the register allocation is approximately DOUBLE".
I just can't trust current compilers when writing portable code meant to be _fast_. How does the compiler handle register spilling/reusing across function calls?
goto may be "shitty practice," but I can afford to optimize some smaller and critical kernels with it and micromanage my register usage *properly*.
Yes, compilers aren't perfect. I don't see how goto will help you significantly with that, though. Are you trying to avoid functions actually being inlined and construct your own little functions that you jump to? Without being able to store the return address I can't see how that would help you.
Goto is terrible practice in all but the simplest cases when you're doing this lane-wise SIMD programming, because the code can diverge arbitrarily and the compiler can't judge where to reconverge vectors. Either way, the hardware is not designed for that kind of control flow because of the way mask stacks work to deal with divergent code.
If you want to be helpful rather than misquoting me, maybe you could offer some code that you want to use gotos in to work around bugs or perceived portability issues in our or other vendors' OpenCL compilers?
It really wasn't my intention to misquote you (I assume you're referring to "shitty practice"). My apologies.
Apparently NVIDIA can handle goto, but it's pointless to continue this discussion much further: the new GCN architecture will hopefully address these issues and make it easier to write better compilers.
Reza.