cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

pwvdendr
Adept II

Bug in OpenCL driver of Evergreen family

I think I have encountered a bug in the OpenCL driver of the Evergreen family. Consider the following, trivial kernel:

float phi (float x) { return -log(tanh(x/2)); }

__kernel void bugTest( __global const float* rand, __global float* testA, __global float* testB, __local float *sumphibeta, __local short *minus, __local float *abs) {

    int i = get_global_id(0);

    minus=(rand<0)?-1:1;

    abs=(minus==-1)?phi(-rand):phi(rand);

    testA=abs;

    sumphibeta=0;

    testB=abs;

}

Then one would expect that testA=testB. However, it doesn't always work correct:

  • on CPU mode, it always works correct (I tried on all computers where it went wrong)
  • on non-AMD GPUs, it always works correct (I tried on 5 different nVidia GPUs)
  • on GPUs outside the Evergreen family, it always worked correct (I tried on HD6990, HD6990M, HD5470M)
  • on GPUs inside the Evergreen family, it always triggers a weird bug (I tried HD5450, HD5870, HD6370M). Suddenly I get testA=-testB for all i. All positive signs in testA get switched to negative signs in testB.

Given that this seems to happens consistently in the Evergreen family and not outside, and that it happens regardless of OS (32bit win7 or 64bit win7, at least) and that it is resolved by switching to CPU mode, makes me suspect that it is caused by a bug in the drivers, possibly the compiler.

I use JOCL as my front-end, so I have a Netbeans-project which triggers the bug. If you prefer plain C/C++, just run the above on an Evergreen GPU with

global_work_size = {73};

local_work_size = {73};

testA = new float[73];

testB = new float[73];

rand = {4.659722f, 9.79593f, 4.018818f, 14.205623f,

        4.8614163f, 13.74736f, 7.9671173f, 7.1122317f, 2.437036f, 14.670018f,

        7.5287275f, 5.4726443f, 8.878769f, 4.834208f, 2.4469802f, 15.271384f,

        3.7141998f, 14.71361f, 13.636945f, 11.304411f, 11.787898f, 12.776969f,

        10.551311f, 7.6994514f, 8.45033f, 7.945807f, 9.079653f, 13.230182f,

        12.124534f, 8.8559675f, 11.232213f, 12.435405f, 10.152599f, 7.382857f,

        6.679857f, 12.372585f, 6.143651f, 3.814587f, 6.479739f, 12.005309f,

        11.770777f, 11.998981f, 11.120965f, 4.4589887f, 6.4073095f, 0.70655245f,

        8.641274f, 13.5908375f, 3.871447f, 7.852525f, 9.202319f, 2.7622483f,

        7.2694182f, 9.263704f, 3.6282501f, 5.9980154f, 9.159372f, 3.906349f,

        8.7281275f, 9.331164f, 14.096231f, 12.455883f, 11.098762f, 2.8249257f,

        8.641249f, 10.528001f, 8.406309f, 5.1229515f, 10.946483f, 10.861275f,

        4.2477875f, 3.8476f, 5.520955f};

I hope I got to the right place with this bug report, since I find no other places on this site to report bugs...

--Peter

0 Likes
1 Solution

pwvdendr,

Thanks for the test case, I am able to reproduce this issue.

[Update:] I've root caused the bug to an issue in the shader compiler and have found a work-around. If you place a mem_fence(CLK_LOCAL_MEM_FENCE) after the line 'abs=(minus==-1)?phi(-rand):phi(rand);', then the program produces correct results by disabling the optimization that is mis-compiling the attached test case. Hope this helps.

View solution in original post

0 Likes
5 Replies
dovalec
Staff

Hi

I will check with shader compiler engineers.

If you can send us JOCL project with the source it will be much faster.

Thx Dov

dov.caspi@amd.com

0 Likes

I did, it was in the link, no? Perhaps not visible enough, here the link again: http://dl.dropbox.com/u/3060536/JOCLbug.rar

0 Likes

pwvdendr,

Thanks for the test case, I am able to reproduce this issue.

[Update:] I've root caused the bug to an issue in the shader compiler and have found a work-around. If you place a mem_fence(CLK_LOCAL_MEM_FENCE) after the line 'abs=(minus==-1)?phi(-rand):phi(rand);', then the program produces correct results by disabling the optimization that is mis-compiling the attached test case. Hope this helps.

0 Likes

Ok, great, I hope you can solve this issue. 🙂

Thanks for the workaround. Do you have any idea how long it takes approximately to fix such a bug? If it's not too long I'll just wait for it. Otherwise, can you give me some more details when exactly an additional mem_fence is needed and when not?

0 Likes

Usually it is a 3 month turn-around from bug fix to public release(not counting preview or one-off releases).

It looks like the mem fence will be needed when there is a conditional move with a result into local memory and then you read from the local memory location with the same index. A memory fence will be needed between the write and read.

The problem looks like it comes when the load is optimized away, it is selecting one of the operands of the ternary operator instead of the result.