pwvdendr
Adept II

Strange bug after GPU upgrade

I just replaced my HD 5450 with an HD 7950 and reinstalled the drivers and APP SDK, but a few of my OpenCL programs now refuse to run. Using Java + JOCL (www.jocl.org), I get an EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000000b247bd7, pid=3812, tid=188, coming from aticaldd64.dll at calddiGetVersion (the pc, pid and tid numbers vary).

I have managed to reduce my kernel to this trivial example, launched on global work size 1 and with LP a float[1]. This triggers the above bug on GPU, but works flawlessly on CPU as well as on my other GPU.

typedef struct{ uint x; uint c; } mwc64x_state_t;

void MWC64X_SeedStreams(mwc64x_state_t *s) {
    ulong a=4077358422479273989UL;
    ulong b=1UL;
    ulong x=0;
    while(a!=0){
        x=x+b;
        b=b+b;
        a=a>>1;
    }
    s->x=(x/4294883355U);
    s->c=(x%4294883355U);
}

float MWC64X_NextFloat(mwc64x_state_t *s) {
    return ((float)(s->x ^ s->c))/65536.0f/65536.0f;
}

__kernel void generateAWGNandDecodeLLRlowmem(__global float *LP) {
    mwc64x_state_t rng;
    MWC64X_SeedStreams(&rng);
    LP[0] = MWC64X_NextFloat(&rng);
}

6 Replies
binying
Challenger

Does this line help? "#define CL_USE_DEPRECATED_OPENCL_1_1_APIS"?

Or, when you reinstall the driver, did you uninstall it first?


Does this line help? "#define CL_USE_DEPRECATED_OPENCL_1_1_APIS"?

Doesn't seem to change anything.

Or, when you reinstall the driver, did you uninstall it first?

Even installed new Windows. Same behaviour with drivers coming with the GPU, latest from AMD website and after explicitly installing the AMD APP SDK.


Peter Vandendriessche wrote:

Does this line help? "#define CL_USE_DEPRECATED_OPENCL_1_1_APIS"?

Doesn't seem to change anything.

Or at least not when adding this in my .cl file.

As I said I'm using JOCL, so I cannot add C code anywhere, only Java and OpenCL.

Since this doesn't appear to be a known problem, I started bughunting and have narrowed the problem down to the trivial kernel above (see initial post). Any idea what the problem/solution could be?


I think this must be a Tahiti compiler problem.

It crashes the Kernel Analyzer when the Tahiti compiler is selected, but it compiles correctly on Cayman and also compiles and runs with a standard C compiler (gcc). So it has nothing to do with your program or Java. On Cayman, the OpenCL compiler figures out the answer at compile time and the compiled code simply returns the value 0.910129 (see the disassembly below). That's probably where Tahiti is having the problem.

One trick: if you can rewrite your code so that the (overly smart) compiler does not know the initial seed value, it might run OK. You might do this by passing the seed value to the kernel as a parameter and then assigning it to the structure. Beware of a constant parameter.

; --------  Cayman Disassembly --------------------
00 ALU: ADDR(32) CNT(4) KCACHE0(CB1:0-15)
      0  x: LSHR        R1.x,  KC0[0].x,  2
      1  x: MOV         R0.x,  (0x3F68FE33, 0.9101287723f).x
01 MEM_RAT_CACHELESS_STORE_DWORD__NI: RAT(11)[R1].x___, R0, ARRAY_SIZE(4)  VPM
02 END
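
The constant in the MOV instruction can be cross-checked on the host. What follows is a minimal plain C port of the kernel from the first post, not code from the thread: the fixed-width types stand in for OpenCL's uint and ulong, and the helper names reference_lp0 and float_bits are made up for illustration. It assumes IEEE-754 single precision with round-to-nearest conversion, which is what gcc uses on x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for the kernel's mwc64x_state_t. */
typedef struct { uint32_t x; uint32_t c; } mwc64x_state_t;

/* Same arithmetic as MWC64X_SeedStreams in the first post. */
static void seed_streams(mwc64x_state_t *s) {
    uint64_t a = 4077358422479273989ULL;
    uint64_t b = 1, x = 0;
    while (a != 0) {            /* one pass per significant bit of a */
        x += b;
        b += b;
        a >>= 1;
    }
    s->x = (uint32_t)(x / 4294883355U);
    s->c = (uint32_t)(x % 4294883355U);
}

/* Same arithmetic as MWC64X_NextFloat in the first post. */
static float next_float(const mwc64x_state_t *s) {
    return ((float)(s->x ^ s->c)) / 65536.0f / 65536.0f;
}

/* Hypothetical helper: the value the kernel should store in LP[0]. */
float reference_lp0(void) {
    mwc64x_state_t rng;
    seed_streams(&rng);
    return next_float(&rng);
}

/* Hypothetical helper: raw IEEE-754 bit pattern of a float. */
uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Calling float_bits(reference_lp0()) from a small main should give 0x3F68FE33, the same bit pattern the Cayman compiler folded into the MOV above, and printing reference_lp0() with %f gives 0.910129. A GPU result that differs from this points at the compiler rather than at the kernel source.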

I'm using the most recent beta drivers version 12.9 from the AMD site.

You mentioned this happens for the original drivers that came with the card and the latest drivers from AMD

Do you know the driver version that came with the card?

drallan

Noooo, not yet another compiler bug!?

Anyway, very well remarked... seems like it is indeed another compiler bug.

One trick, if you can rewrite your code so the (overly smart) compiler does not know the initial seed value, it might run OK. You might do this by passing the seed value to the kernel as a parameter and then assigning it to the structure.

Hmm... the seed was not even in there and is already softcoded in my full code. The constants are written via "enum" (not really sure what this is) and just replacing them with softcoded parameters seems not to be entirely the same. I have attached a full version of the RNG with minimal code to call it to the initial post (bug.txt), is your trick applicable here? Also, if I remove the 'b=b+b' then it works fine, strangely (but of course gives no properly distributed random numbers anymore).

Alternatively, would you be able to recommend me a decent RNG for Tahiti? I took this one because it allows to skip arbitrarily large parts of the stream easily, which helps to give each work item an independent RNG stream, but any RNG offering good GPU performance will do just fine.

I'm using the most recent beta drivers version 12.9 from the AMD site.

You mentioned this happens for the original drivers that came with the card and the latest drivers from AMD

Do you know the driver version that came with the card?

The cd says Driver Ver. 12-130(7/XP/Vista only) (8.961) (Cat. 12.4). I'm now using the 'stable' 12.8.


Noooo, not yet another compiler bug!?

Anyway, very well remarked... seems like it is indeed another compiler bug.

It is none other than the return of the Tahiti compiler loop-unrolling bug. Simpler versions of the bug were fixed some time ago. In this case the crash occurs when the loop count exceeds 20 and the value x is then divided by almost any number. I'm currently using display driver (and compiler) 9.001, which is just about the latest.

__kernel void tahitiloopbug2(__global long *buf) {
    long a=5,b=1,x=0;int i;
    for(i=0;i<21;i++){       //crashes on loop count >20
        x=x+b;
        b=b+b;
    }
    x=x/3;           //crash on divide by a number >3
  //x=(int)x/3;     //no crash for integer division
  //x=x+3;          //no crash on other ops
    buf[0] = x;
}
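
For comparison, the value a working compiler should write to buf[0] can be computed on the host. This is a small plain C port of the reduced kernel, offered only as a sketch: the function name loopbug_reference and the two parameters are illustrative additions so that the loop count and divisor can be varied, and int64_t stands in for OpenCL's long.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C port of the tahitiloopbug2 arithmetic (hypothetical helper).
 * iterations: loop trip count (21 in the kernel above)
 * divisor:    value x is divided by after the loop (3 in the kernel above) */
int64_t loopbug_reference(int iterations, int64_t divisor) {
    int64_t b = 1, x = 0;
    /* Keep b and x inside the int64_t range; b reaches 2^iterations. */
    assert(iterations >= 0 && iterations <= 62);
    assert(divisor != 0);
    for (int i = 0; i < iterations; i++) {
        x += b;     /* after this line x == 2^(i+1) - 1 */
        b += b;     /* b doubles every pass */
    }
    return x / divisor;
}
```

With 21 iterations the loop leaves x = 2^21 - 1 = 2097151, so loopbug_reference(21, 3) returns 699050. That is the number to expect in buf[0] on a driver where the kernel builds and runs.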

Alternatively, would you be able to recommend me a decent RNG for Tahiti? I took this one because it allows to skip arbitrarily large parts of the stream easily, which helps to give each work item an independent RNG stream, but any RNG offering good GPU performance will do just fine.

Probably not a good one. One thing I have used is 2 simple but different RNGs that are based on different algorithms (not different seeds). When the kernel starts, the first RNG seeds the second, which is then used for computation. The first RNG can be seeded from some permutation of the global id. This hopefully prevents the second RNG from seeing any pattern to the seed values. I don't know how good the numbers are.
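
The two-generator idea can be sketched as follows. Everything specific here is an assumption chosen for illustration, since the post does not name the generators: a 32-bit LCG with the Numerical Recipes constants plays the first RNG, Marsaglia's xorshift32 plays the second, and multiplying the global id by an odd constant serves as the permutation. The code is plain C for the host, all function names are hypothetical, and the statistical quality is untested, in line with the caveat above.

```c
#include <assert.h>
#include <stdint.h>

/* First generator: one step of a 32-bit LCG (Numerical Recipes constants). */
static uint32_t lcg_next(uint32_t s) {
    return s * 1664525u + 1013904223u;
}

/* Second generator: Marsaglia's xorshift32. The state must never be zero. */
uint32_t xorshift32_next(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: derive a work item's xorshift32 state from its
 * global id. Multiplying by an odd constant permutes the 32-bit ids, and
 * two LCG steps then scramble the result before it is used as a seed. */
uint32_t seed_work_item(uint32_t global_id) {
    uint32_t s = global_id * 2654435761u;
    s = lcg_next(lcg_next(s));
    return s != 0 ? s : 1u;     /* xorshift32 would get stuck at zero */
}

/* Uniform float in [0, 1) built from the top 24 bits of the next output. */
float next_unit_float(uint32_t *state) {
    return (float)(xorshift32_next(state) >> 8) * (1.0f / 16777216.0f);
}
```

In a kernel, each work item would call the equivalent of seed_work_item(get_global_id(0)) once at start and then draw numbers with next_unit_float. Unlike the MWC64X skip-ahead, this does not guarantee non-overlapping streams; it only makes the per-item seeds look unrelated.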
