
dschwen
Journeyman III

Erratic freezes on linux (update)

(SDK 2.1, Catalyst 10.5, 10.6, 10.7; still present in SDK 2.2, Catalyst 10.7b)

The attached kernel code compiles fine (Stream Analyzer under Windows does not complain either), but together with a rather simple initialization and launch program it makes my computer freeze completely.

The precise behaviour is rather erratic. If I comment out the main loop of the kernel, the whole thing runs a few times (one single kernel execution per run) until it freezes my computer. The non-OpenCL-certified drivers 10.6 and 10.7 lead to immediate freezing; with the 10.5 driver the code reaches a clFinish after I submit the kernel, and then takes about 1-2 minutes to completely lock up the system.

I have no idea where to even start looking. 😞

    #define SIZE   32
    #define SIZE2  1024
    #define LSIZE  8
    #define LSIZE2 100

    __kernel void mu3d( float4 Eax, float4 Ebx, float4 Evx, float4 Exx, float4 Ex,
                        float zz, float ddzn, float ncd2,
                        __global float *cA, __global float *cV, __global float *ccv,
                        __global float4 *u, __local float4 *cn )
    {
      int xl = get_local_id(0);
      int yl = get_local_id(1);

      int a[3];
      a[0] = xl + yl * (LSIZE+2);
      a[1] = a[0] + LSIZE2;
      a[2] = a[1] + LSIZE2;

      cn[a[0]].s3 = 0.0f;
      cn[a[1]].s3 = 0.0f;
      cn[a[2]].s3 = 0.0f;

      int xg = ( get_global_id(0) / ( LSIZE + 2 ) );
      xg *= LSIZE;
      xg += xl;

      int yg = ( get_global_id(1) / ( LSIZE + 2 ) );
      yg *= LSIZE;
      yg += yl;

      int b = ( (xg-1) & (SIZE-1) ) + ( (yg-1) & (SIZE-1) ) * SIZE;

      float4 DC;
      float rv;

      //
      // copy self
      //

      // below
      rv = 1.0f / ( 1.0f - ccv[ b + (SIZE-1)*SIZE2 ] ); // reduced volume
      cn[a[2]].s0 = cA[ b + (SIZE-1)*SIZE2 ] * rv;
      cn[a[2]].s2 = cV[ b + (SIZE-1)*SIZE2 ] * rv;
      cn[a[2]].s1 = rv - cn[a[2]].s0 - cn[a[2]].s2;

      // current
      rv = 1.0f / ( 1.0f - ccv[b] ); // reduced volume
      cn[a[0]].s0 = cA[b] * rv;
      cn[a[0]].s2 = cV[b] * rv;
      cn[a[0]].s1 = rv - cn[a[0]].s0 - cn[a[0]].s2;

      for( int zl = 0; zl < SIZE; zl++ )
      {
        // fetch next z-layer
        rv = 1.0f / ( 1.0f - ccv[ b + ( (zl+1) & (SIZE-1) )*SIZE2 ] ); // reduced volume
        cn[a[(zl+1)%3]].s0 = cA[ b + ( (zl+1) & (SIZE-1) )*SIZE2 ] * rv;
        cn[a[(zl+1)%3]].s2 = cV[ b + ( (zl+1) & (SIZE-1) )*SIZE2 ] * rv;
        cn[a[(zl+1)%3]].s1 = rv - cn[a[(zl+1)%3]].s0 - cn[a[(zl+1)%3]].s2;

        barrier(CLK_LOCAL_MEM_FENCE);

        // copy back chemical potential
        if( xl > 0 && xl < (LSIZE+1) && yl > 0 && yl < (LSIZE+1) )
        {
          DC = ( cn[a[(zl+1)%3]] + cn[a[(zl+2)%3]] +   // +2 = -1
                 cn[a[zl%3]+1] + cn[a[zl%3]-1] +
                 cn[a[zl%3]+LSIZE+2] + cn[a[zl%3]-LSIZE-2] -
                 6.0f * cn[a[zl%3]] ) / ncd2;

          u[ b + zl*SIZE2 ] =
              zz * ( Eax*cn[a[zl%3]].s0 + Ebx*cn[a[zl%3]].s1 + Evx*cn[a[zl%3]].s2 )
            + ddzn * ( Eax * DC.s0 + Ebx * DC.s1 + Evx * DC.s2 )
            - zz * Exx * 0.5f - Ex;
        }
      }
    }
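For readers following the indexing, here is a minimal host-side C sketch of the two tricks the kernel relies on: the periodic wrap `(x-1) & (SIZE-1)` and the three-slot ring buffer in which `(zl+2)%3` addresses the previous z-layer (the "+2 = -1" comment). The helper names `wrap` and `ring_slot` are hypothetical and not part of the posted code, and the mask trick assumes a two's-complement machine and a power-of-two `SIZE`.

```c
#include <assert.h>

#define SIZE 32   /* grid edge length, as in the kernel; must be a power of two */

/* Periodic wrap of a coordinate into [0, SIZE) using a bitmask.
   On two's-complement machines this also handles -1: -1 & 31 == 31. */
static int wrap(int x)
{
    assert((SIZE & (SIZE - 1)) == 0);  /* the mask trick needs a power of two */
    return x & (SIZE - 1);
}

/* Slot of the three-layer ring buffer that holds z-layer zl + offset,
   for offset in {-1, 0, +1}. Adding 3 keeps the operand non-negative,
   because C's % can yield a negative result for a negative operand. */
static int ring_slot(int zl, int offset)
{
    return (zl + offset + 3) % 3;
}
```

With these helpers, `ring_slot(0, -1)` is slot 2, which is where the kernel stores the "below" layer before the loop starts, and `wrap(-1)` is 31, the last cell of the periodic grid.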

0 Likes
7 Replies
dschwen
Journeyman III

Update: I continued developing my code with CL_DEVICE_TYPE_CPU. It performs exactly as expected and returns correct results.

Switching back to CL_DEVICE_TYPE_GPU causes my code to break at the first clFinish after queuing my kernels with clEnqueueNDRangeKernel. The code seems to hang and is reported as <defunct> in top (zombie). At this point my machine is still unaffected, but after a few more seconds the computer freezes completely. The freeze may be related to me performing an operation like opening the KDE system menu or selecting text in an editor (something that triggers a certain operation on the graphics card?). Screen contents stay intact, but nothing works: no remote login, nothing. Only Magic SysRq can reboot the computer.

Am I the only one with problems like this?

0 Likes

Thanks for reporting the issue.  The kernel is failing to compile correctly, leading to the hangs.  I don't have a suggested workaround unfortunately except to change "cn" into a global memory pointer.

Jeff

0 Likes

Just to clarify: it is not compiling correctly, yet not reporting a compile-time error, right?

Do you know what it is about the kernel that is not compiling correctly? A previous project I worked on had a similar Laplacian-type calculation done the same way in the kernel, just with far fewer variables, and all plain float rather than float4 or float2.

Is that an issue that will disappear in the next release?

0 Likes

Originally posted by: dschwen Just to clarify: not compiling correctly but yet not reporting a compile time error, right?


Exactly.  The compiler didn't report any errors, but the compiled kernel is incorrect.

Jeff

0 Likes

The problem persists in SDK 2.2 with Catalyst 10.7b (Linux; the driver shows a "testing only" watermark in the bottom right of the screen).

It would help me tremendously if you could explain the problem in a bit more detail. Can the problematic part of the kernel be seen in the .il or .isa sources if I dump them? Then I could try finding a workaround without freezing and rebooting my machine every 5 minutes. Why is the resulting kernel invalid? What could be the reason (use of too much local memory?)?
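On the local-memory question, a rough host-side C sketch can estimate how much `__local` storage the kernel indexes through `cn`. It assumes a work-group of (LSIZE+2) x (LSIZE+2) work-items, which the indexing `a[0] = xl + yl * (LSIZE+2)` suggests but the posted host code does not show. `float4_t` and `cn_bytes` are hypothetical stand-ins, and the sketch assumes a 4-byte `float`.

```c
#include <assert.h>
#include <stddef.h>

#define LSIZE  8              /* interior tile edge, from the kernel */
#define TILE   (LSIZE + 2)    /* tile edge including a one-cell halo (assumed) */
#define LAYERS 3              /* z-layers kept in the ring buffer: a[0..2] */

/* Stand-in for OpenCL's float4: four floats, 16 bytes with 4-byte floats. */
typedef struct { float s[4]; } float4_t;

/* Bytes of __local memory the kernel can touch through cn. */
static size_t cn_bytes(void)
{
    size_t elems = (size_t)LAYERS * TILE * TILE;  /* 3 * 100 = 300 elements */
    assert(TILE * TILE == 100);                   /* matches LSIZE2 in the kernel */
    return elems * sizeof(float4_t);
}
```

Under these assumptions `cn_bytes()` comes to 4800 bytes. That figure can be compared against the value `clGetDeviceInfo` reports for `CL_DEVICE_LOCAL_MEM_SIZE` on the target GPU; if the device limit is larger, local-memory size alone is unlikely to be the cause.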

0 Likes

dschwen,
The required fix for this problem won't be public for 1 or 2 more releases of Catalyst. The fix is part of the driver, not the SDK.
0 Likes

Thanks. I found a workaround. Setting my local array cn to float3 (now available) avoids the freeze.

It triggers another bug (see the std::length_error thread I just posted), but that can be worked around too (by keeping the kernel arguments float4 and using them as float3 through .xyz).

I'm happy now.

0 Likes