
Bug in Catalyst drivers version 12.10 and later when running on Radeon HD 7000 series cards

Question asked by aaa on Mar 13, 2013
Latest reply on Aug 4, 2014 by pinform

An intricate bug has been found in the latest Catalyst drivers. It shows up when an OpenCL kernel accesses __local memory with a constant index.

Conditions: The bug only appears, in every work-group, in the work-items where get_local_id(0) == 0 and get_local_id(1) == 0 or 14, with a work-group size of (16, 16, 1).

Example:

__local float s_srcPatch[10][10];
__local float s_dstPatch[20][16];
float sum;

sum =       (evenFlag * 0.0625f) * s_srcPatch[0][1 + ((tidx - 2) / 2)];
sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[0][1 + ((tidx - 1) / 2)];
sum = sum + (evenFlag * 0.375f ) * s_srcPatch[0][1 + ((tidx    ) / 2)];
sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[0][1 + ((tidx + 1) / 2)];
sum = sum + (evenFlag * 0.0625f) * s_srcPatch[0][1 + ((tidx + 2) / 2)]; // Operation 5

s_dstPatch = sum;

 

 

In the case above, operation 5 is ignored, so the final sum is incorrect.

 

 

__local float s_srcPatch[10][10];
__local float s_dstPatch[20][16];
float sum;

// get_local_size(1) == 16, so every row index below evaluates to 0, the same as in the code segment above
sum =       (evenFlag * 0.0625f) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx - 2) / 2)];
sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx - 1) / 2)];
sum = sum + (evenFlag * 0.375f ) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx    ) / 2)];
sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx + 1) / 2)];
sum = sum + (evenFlag * 0.0625f) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx + 2) / 2)];

s_dstPatch = sum;

 

 

If I replace the constant row index 0 with the expression "get_local_size(1) - 16", everything works and I get the correct sum.
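
For anyone trying to reproduce this, a CPU reference makes it easier to tell whether a given work-item really lost operation 5. The following is only a sketch in plain C, not part of the original kernel. The function names reference_sum and sum_without_operation5 are hypothetical, and it assumes (the post does not say) that tidx is a signed int in the range 0..15, that evenFlag is 1 for even tidx and 0 otherwise, and that oddFlag is its complement.

```c
/* CPU reference for the five-tap sum from the post.
 * ASSUMPTIONS (not stated in the post): tidx is a signed int in 0..15,
 * evenFlag is 1 for even tidx and 0 otherwise, oddFlag is its complement.
 * row stands in for s_srcPatch[0]. */
static float reference_sum(const float row[10], int tidx)
{
    float evenFlag = (tidx % 2 == 0) ? 1.0f : 0.0f;
    float oddFlag  = 1.0f - evenFlag;
    float sum;

    sum =       (evenFlag * 0.0625f) * row[1 + ((tidx - 2) / 2)];
    sum = sum + ( oddFlag * 0.25f  ) * row[1 + ((tidx - 1) / 2)];
    sum = sum + (evenFlag * 0.375f ) * row[1 + ((tidx    ) / 2)];
    sum = sum + ( oddFlag * 0.25f  ) * row[1 + ((tidx + 1) / 2)];
    sum = sum + (evenFlag * 0.0625f) * row[1 + ((tidx + 2) / 2)]; /* Operation 5 */
    return sum;
}

/* The same sum with operation 5 left out. This models the symptom
 * described in the post; it is not a model of what the driver does. */
static float sum_without_operation5(const float row[10], int tidx)
{
    float evenFlag = (tidx % 2 == 0) ? 1.0f : 0.0f;
    float oddFlag  = 1.0f - evenFlag;
    float sum;

    sum =       (evenFlag * 0.0625f) * row[1 + ((tidx - 2) / 2)];
    sum = sum + ( oddFlag * 0.25f  ) * row[1 + ((tidx - 1) / 2)];
    sum = sum + (evenFlag * 0.375f ) * row[1 + ((tidx    ) / 2)];
    sum = sum + ( oddFlag * 0.25f  ) * row[1 + ((tidx + 1) / 2)];
    return sum;
}
```

Under these assumptions the two functions differ only for even tidx, because operation 5 is multiplied by evenFlag. If the value read back from the device for an affected work-item matches sum_without_operation5 rather than reference_sum, that supports the diagnosis that operation 5 was dropped.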
