Bdot

AMD KernelAnalyzer crash

Discussion created by Bdot on Jun 30, 2011
Latest reply on Jul 18, 2011 by rouellet
#pragma unroll supported?

Hi,

I have a kernel with a fixed-size loop (64 iterations) that KernelAnalyzer does not unroll automatically. When I specify #pragma unroll, KernelAnalyzer crashes.

As a test, I used a modified version of the constantBandwidth_single_static kernel. The crash is reproducible there too when I omit the unroll count or give a high one, even though this kernel is already unrolled automatically without the #pragma.

Oddly enough, the loop is unrolled even when I specify #pragma unroll 1...


My kernel also has an odd side effect: a higher unroll count results in a longer runtime (as predicted by KernelAnalyzer)! When I unroll the loop manually (writing the loop body out 64 times, with no loop), the runtime is about half that of the loop, i.e. -50%.
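To make the comparison concrete, here is a host-side C sketch (not the poster's kernel, and not what the AMD compiler actually emits) of a 64-iteration accumulation written two ways: as a rolled loop, and fully unrolled by pasting the body 64 times with constant indices, which corresponds to the "body 64 times with no loop" variant. The ADD* macros and function names are hypothetical helpers. Both versions add the elements in the same order, so they return identical results; only the control flow differs.

```c
#include <assert.h>

/* Host-side sketch: a 64-iteration accumulation, rolled vs. fully unrolled. */
#define TRIP_COUNT 64

/* Rolled loop: the body appears once and executes 64 times. */
float sum_rolled(const float *cb)
{
    float val = 0.0f;
    for (int i = 0; i < TRIP_COUNT; i++) {
        val = val + cb[i];
    }
    return val;
}

/* Full manual unroll: the preprocessor pastes the body 64 times with
   constant indices 0..63, so no loop counter or branch remains. */
#define ADD1(k)  val = val + cb[(k)];
#define ADD4(k)  ADD1(k) ADD1((k) + 1) ADD1((k) + 2) ADD1((k) + 3)
#define ADD16(k) ADD4(k) ADD4((k) + 4) ADD4((k) + 8) ADD4((k) + 12)
#define ADD64(k) ADD16(k) ADD16((k) + 16) ADD16((k) + 32) ADD16((k) + 48)

/* Guard: ADD64 only matches a loop of exactly 64 iterations. */
static_assert(TRIP_COUNT == 64, "ADD64 expands to exactly 64 copies of the body");

float sum_unrolled(const float *cb)
{
    float val = 0.0f;
    ADD64(0)   /* cb[0] + cb[1] + ... + cb[63], in the same order as the loop */
    return val;
}
```

The macro chain is just a way to avoid typing 64 lines by hand; the same function-like macros would also expand inside an OpenCL C kernel, since the preprocessor runs before compilation.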

#pragma unroll 2: +50% runtime
#pragma unroll 4: +150% runtime
#pragma unroll 8: +350% runtime
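Read as multipliers of the rolled loop's predicted runtime (assuming the percentages are relative to the rolled loop, which the figures imply but do not state), the slowdown grows roughly linearly with the unroll factor n. The three reported points happen to fit (n+1)/2 exactly; that is only an observation on three values, not a model of what the compiler does:

```latex
\begin{aligned}
\frac{t_{2}}{t_{\text{loop}}} &= 1 + 0.50 = 1.5 = \tfrac{2+1}{2} \\
\frac{t_{4}}{t_{\text{loop}}} &= 1 + 1.50 = 2.5 = \tfrac{4+1}{2} \\
\frac{t_{8}}{t_{\text{loop}}} &= 1 + 3.50 = 4.5 = \tfrac{8+1}{2} \\
\frac{t_{\text{manual}}}{t_{\text{loop}}} &= 1 - 0.50 = 0.5
\end{aligned}
```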

This raises the question: is #pragma unroll supported at all? And is it useful in some cases (and if so, which)?

Or is this just a KernelAnalyzer problem? I currently cannot test on real hardware...


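#ifndef DATATYPE
#define DATATYPE float  /* assumption: DATATYPE is normally passed as a build option */
#endif

__kernel void constantBandwidth_single_static(__global DATATYPE *cb, __global DATATYPE *output)
{
    DATATYPE val = (DATATYPE)(0.0f);
    uint gid = get_global_id(0);
    uint index = 0;
    uint i;

    /* Request that the 256-iteration loop be unrolled by a factor of 64.
       The pragma must sit on its own line, directly before the loop. */
    #pragma unroll 64
    for (i = 0; i < 256; i++)
    {
        val = val + cb[index + i];
    }

    output[gid] = val;
}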
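Without real hardware, the kernel's arithmetic can at least be checked by emulating it on the host. The C sketch below assumes DATATYPE is float and passes the work-item id in as gid in place of get_global_id(0); kernel_rolled and kernel_unrolled64 are hypothetical names. The second function shows what #pragma unroll 64 requests for a trip count of 256: 256 / 64 = 4 trips of the loop, each executing 64 copies of the body.

```c
#include <assert.h>

/* Host-side emulation of constantBandwidth_single_static (DATATYPE assumed float). */
#define LOOP_COUNT 256
#define UNROLL     64

static_assert(LOOP_COUNT % UNROLL == 0, "trip count must be a multiple of the unroll factor");

/* One copy of the loop body at constant offset k from the current base index i. */
#define ADD1(k)  val = val + cb[index + i + (k)];
#define ADD4(k)  ADD1(k) ADD1((k) + 1) ADD1((k) + 2) ADD1((k) + 3)
#define ADD16(k) ADD4(k) ADD4((k) + 4) ADD4((k) + 8) ADD4((k) + 12)
#define ADD64(k) ADD16(k) ADD16((k) + 16) ADD16((k) + 32) ADD16((k) + 48)

/* Rolled reference: the loop exactly as written in the kernel. */
void kernel_rolled(const float *cb, float *output, unsigned gid)
{
    float val = 0.0f;
    unsigned index = 0;
    for (unsigned i = 0; i < LOOP_COUNT; i++) {
        val = val + cb[index + i];
    }
    output[gid] = val;
}

/* Unrolled by 64: 4 trips, each adding cb[i] .. cb[i + 63] in order. */
void kernel_unrolled64(const float *cb, float *output, unsigned gid)
{
    float val = 0.0f;
    unsigned index = 0;
    for (unsigned i = 0; i < LOOP_COUNT; i += UNROLL) {
        ADD64(0)
    }
    output[gid] = val;
}
```

Because index is always 0 and the loop never reads gid, every work-item sums the same 256 elements, so each output[gid] receives the same value in both versions.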
