I think I have encountered a bug in the OpenCL driver of the Evergreen family. Consider the following, trivial kernel:

float phi (float x) { return -log(tanh(x/2)); }

__kernel void bugTest( __global const float* rand, __global float* testA, __global float* testB, __local float *sumphibeta, __local short *minus, __local float *abs) {

int i = get_global_id(0);

minus[i]=(rand[i]<0)?-1:1;

abs[i]=(minus[i]==-1)?phi(-rand[i]):phi(rand[i]);

testA[i]=abs[i];

sumphibeta[i]=0;

testB[i]=abs[i];

}

Then one would expect that testA[i]=testB[i]. However, it doesn't always work correct:

- on CPU mode, it always works correct (I tried on all computers where it went wrong)
- on non-AMD GPUs, it always works correct (I tried on 5 different nVidia GPUs)
- on GPUs outside the Evergreen family, it always worked correct (I tried on HD6990, HD6990M, HD5470M)
- on GPUs inside the Evergreen family, it always triggers a weird bug (I tried HD5450, HD5870, HD6370M). Suddenly I get testA[i]=-testB[i] for all i. All positive signs in testA get switched to negative signs in testB.

Given that this seems to happens consistently in the Evergreen family and not outside, and that it happens regardless of OS (32bit win7 or 64bit win7, at least) and that it is resolved by switching to CPU mode, makes me suspect that it is caused by a bug in the drivers, possibly the compiler.

I use JOCL as my front-end, so I have a Netbeans-project which triggers the bug. If you prefer plain C/C++, just run the above on an Evergreen GPU with

global_work_size = {73};

local_work_size = {73};

testA = new float[73];

testB = new float[73];

rand = {4.659722f, 9.79593f, 4.018818f, 14.205623f,

4.8614163f, 13.74736f, 7.9671173f, 7.1122317f, 2.437036f, 14.670018f,

7.5287275f, 5.4726443f, 8.878769f, 4.834208f, 2.4469802f, 15.271384f,

3.7141998f, 14.71361f, 13.636945f, 11.304411f, 11.787898f, 12.776969f,

10.551311f, 7.6994514f, 8.45033f, 7.945807f, 9.079653f, 13.230182f,

12.124534f, 8.8559675f, 11.232213f, 12.435405f, 10.152599f, 7.382857f,

6.679857f, 12.372585f, 6.143651f, 3.814587f, 6.479739f, 12.005309f,

11.770777f, 11.998981f, 11.120965f, 4.4589887f, 6.4073095f, 0.70655245f,

8.641274f, 13.5908375f, 3.871447f, 7.852525f, 9.202319f, 2.7622483f,

7.2694182f, 9.263704f, 3.6282501f, 5.9980154f, 9.159372f, 3.906349f,

8.7281275f, 9.331164f, 14.096231f, 12.455883f, 11.098762f, 2.8249257f,

8.641249f, 10.528001f, 8.406309f, 5.1229515f, 10.946483f, 10.861275f,

4.2477875f, 3.8476f, 5.520955f};

I hope I got to the right place with this bug report, since I find no other places on this site to report bugs...

--Peter

pwvdendr,

Thanks for the test case, I am able to reproduce this issue.

[Update:] I've root caused the bug to an issue in the shader compiler and have found a work-around. If you place a mem_fence(CLK_LOCAL_MEM_FENCE) after the line 'abs[i]=(minus[i]==-1)?phi(-rand[i]):phi(rand[i]);', then the program produces correct results by disabling the optimization that is mis-compiling the attached test case. Hope this helps.