Archives Discussions

Bdot · ‎06-30-2011

#pragma unroll supported?

Hi,

I have a kernel with a fixed-size loop (64 iterations) that KernelAnalyzer does not unroll automatically. When I specify #pragma unroll, KernelAnalyzer crashes.

As a test, I used a modified version of the constantBandwidth_single_static kernel. When I specify no unroll count, or a high one, the crash is reproducible here as well, even though this kernel is unrolled automatically without the #pragma.

Oddly enough, the loop is even unrolled when specifying #pragma unroll 1 ...

My kernel additionally has the odd side effect that a higher unroll count results in a longer runtime (as predicted by KernelAnalyzer)! When I unroll the loop manually (writing out the block 64 times with no loop), it takes about half the runtime of the loop, i.e. -50% runtime.

#pragma unroll 2 results in +50% runtime
#pragma unroll 4 +150%
#pragma unroll 8 +350%
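
For readers unfamiliar with manual unrolling, here is a minimal sketch in plain C of what "writing out the block 64 times with no loop" can look like. It is not the kernel in question (that one was only uploaded to the Help Desk); the loop body, the names sum_rolled and sum_unrolled, and the BODY macros are assumptions made up for illustration.

```c
#include <stddef.h>

#define N 64  /* fixed trip count, matching the 64x loop described above */

/* Rolled version: one add per iteration, plus loop counter and branch. */
float sum_rolled(const float *cb)
{
    float val = 0.0f;
    for (size_t i = 0; i < N; i++) {
        val = val + cb[i];
    }
    return val;
}

/* Hypothetical helper macros that paste the loop body 64 times at
   compile time, so the function below contains no loop at all. */
#define BODY(i)    val = val + cb[(i)];
#define BODY4(i)   BODY(i)   BODY((i) + 1)    BODY((i) + 2)    BODY((i) + 3)
#define BODY16(i)  BODY4(i)  BODY4((i) + 4)   BODY4((i) + 8)   BODY4((i) + 12)
#define BODY64(i)  BODY16(i) BODY16((i) + 16) BODY16((i) + 32) BODY16((i) + 48)

/* Manually unrolled version: same additions in the same order. */
float sum_unrolled(const float *cb)
{
    float val = 0.0f;
    BODY64(0)
    return val;
}
```

Both functions perform the additions in the same order, so they should return the same value for the same input; only the loop overhead differs.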

This leads to the question: is #pragma unroll supported? If so, in which cases is it useful?

Or is this just a problem in KernelAnalyzer? I currently cannot test on real hardware ...

__kernel void constantBandwidth_single_static(__global DATATYPE *cb, __global DATATYPE *output)
{
    DATATYPE val = (DATATYPE)(0.0f);
    uint gid = get_global_id(0);
    uint index = 0;
    uint i;

    #pragma unroll 64
    for (i = 0; i < 256; i++) {
        val = val + cb[index + i];
    }
    output[gid] = val;
}
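
As a rough illustration of what an unroll count asks for, the sketch below shows in plain C what a partial unroll by a factor of 4 of the 256-iteration loop above conceptually looks like. The function name sum_unroll4 and the factor 4 are assumptions chosen for the example; how the AMD compiler actually transforms the loop is not stated in this thread.

```c
#include <stddef.h>

#define COUNT 256  /* trip count of the loop in the kernel above */

/* Conceptual result of unrolling by 4: the body appears four times per
   iteration and the counter advances by 4, so the loop runs 64 times
   instead of 256. This only works as written because 4 divides COUNT. */
float sum_unroll4(const float *cb)
{
    float val = 0.0f;
    for (size_t i = 0; i < COUNT; i += 4) {
        val = val + cb[i];
        val = val + cb[i + 1];
        val = val + cb[i + 2];
        val = val + cb[i + 3];
    }
    return val;
}
```

The additions happen in the same order as in the original loop, so the result should not change; only the number of counter updates and branches does.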

bpurnomo · ‎07-01-2011

Hi,

We are not able to reproduce the crash problem with the current internal build of APP KernelAnalyzer. Please send us the kernel code through the Help Desk system so we can confirm that the next version of the tool no longer has this issue. Thanks.

Bdot · ‎07-05-2011

I have now uploaded my code to the Help Desk. From your reaction I conclude that #pragma unroll should work as one expects.

Thanks a lot for your help.

rouellet · ‎07-18-2011

Hi,

I got your example code from the Help Desk system. I see your problem with KernelAnalyzer v1.8. For me the tool doesn't crash, but it does fail to compile the program. The (soon to be released) 1.9 version compiles your code correctly. Once I figure out how to use the Help Desk system, I'll try to answer the other questions you added there.

Sorry for the delay in replying; I'm busy learning a dozen new tools right now.

Roland Ouellette

Advanced Micro Devices
