koarl0815
Journeyman III

Simple OpenCL kernel freezes system with Stream SDK 2.1

Hello,

The attached OpenCL kernel is supposed to compute a plane rotation of two vectors. For some reason, it leads to a system freeze under Stream SDK 2.1, while it works correctly under another OpenCL implementation. Does anyone have an idea why this could be the case? Could it be related to a bug in the Stream SDK?

Again, a WARNING: the code below may freeze your system.

Best regards,
Karli

 

////// plane rotation: (x,y) <- (\alpha x + \beta y, -\beta x + \alpha y)
__kernel void plane_rotation(
    __global float * vec1,
    __global float * vec2,
    float alpha,
    float beta,
    unsigned int size)
{
  float tmp1 = 0;
  float tmp2 = 0;

  for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0))
  {
    tmp1 = vec1[i];
    tmp2 = vec2[i];

    //the following barrier is not needed, but does not improve the situation.
    barrier(CLK_GLOBAL_MEM_FENCE);

    vec1[i] = alpha * tmp1 + beta * tmp2;
    vec2[i] = alpha * tmp2 - beta * tmp1;
  }
};

omkaranathan
Adept I

Which OS/GPU/Driver are you using? 

Also, could you post the runtime code? A compilable test case would make it easy to reproduce and track down the problem.


Well, Murphy's Law just hit me again. For the simple test code, it is now working again - the barrier caused some problems.

However, I remember that I added the barrier *because* the kernel wasn't working as expected and system freezes occurred. Anyway, I will come back to that as soon as our test system is finally up and running...


Are you using a 7xx GPU with a group size larger than 64 for this kernel?

I have seen the driver reset with that combination before. It's always better to query the group size from the runtime using the function clGetKernelWorkGroupInfo.
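
As a rough sketch of what that could look like on the host side: the two helpers below are hypothetical (they are not part of the OpenCL API) and only show how a limit queried from the runtime might be applied. The value `kernel_max` is assumed to be what clGetKernelWorkGroupInfo reports for CL_KERNEL_WORK_GROUP_SIZE; the call itself appears only in a comment so that the snippet compiles without an OpenCL installation.

```c
#include <assert.h>
#include <stddef.h>

/* kernel_max is assumed to come from the runtime, e.g.
 *   size_t kernel_max;
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(size_t), &kernel_max, NULL);
 * Hypothetical helper: never ask for a larger work-group than the
 * runtime reports for this kernel on this device. */
static size_t clamp_local_size(size_t requested, size_t kernel_max)
{
    assert(requested > 0 && kernel_max > 0);
    return requested < kernel_max ? requested : kernel_max;
}

/* Hypothetical helper: round the global size up to a multiple of the
 * chosen local size, as required when a local size is passed explicitly
 * to clEnqueueNDRangeKernel in OpenCL 1.x. */
static size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size * local_size;
}
```

With a requested group size of 256 and a reported limit of 64, this would launch groups of 64. The kernel's own `i < size` check then takes care of the padded work-items.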


Thanks for the input! I first observed the freezes using a new 64-bit Linux kernel (I think it was 2.6.33). Some more days passed and the test system finally became unavailable, so I tried it today on 64-bit Windows 7, where it worked. The GPU was a Radeon HD 5850, so the group size was not an issue. A Linux test system should be back soon...


for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0))
barrier()
This is illegal in OpenCL. If a barrier is inside control flow, then every thread must hit the barrier on each iteration of the loop. Because the loop count is different for each thread, the results are undefined, and in the case of running on our 5XXX GPUs, your kernel deadlocks the hardware.
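
To make the differing loop counts concrete, here is a small host-side sketch in plain C. It is not OpenCL code, and both helper names are made up for illustration. It counts how many times each work-item would run the strided loop above, then checks whether all work-items of one work-group agree. It assumes a 1D range whose global size is a multiple of the local size.

```c
#include <assert.h>

/* Iterations executed by work-item `gid` in
 *   for (i = gid; i < size; i += global_size)
 * Hypothetical helper for illustration; not part of any OpenCL API. */
static unsigned int trip_count(unsigned int gid, unsigned int global_size,
                               unsigned int size)
{
    assert(global_size > 0);
    if (gid >= size)
        return 0;
    return (size - gid - 1) / global_size + 1;
}

/* Returns 1 if every work-item of work-group `group` runs the loop the
 * same number of times (so a barrier in the loop body is reached equally
 * often by all of them), 0 otherwise. */
static int group_is_uniform(unsigned int group, unsigned int local_size,
                            unsigned int global_size, unsigned int size)
{
    unsigned int first = group * local_size;
    unsigned int expected = trip_count(first, global_size, size);
    for (unsigned int lid = 1; lid < local_size; ++lid) {
        if (trip_count(first + lid, global_size, size) != expected)
            return 0;
    }
    return 1;
}
```

For example, with size = 10 and a single work-group of 4 work-items, items 0 and 1 run three iterations while items 2 and 3 run only two, so the barrier is not reached the same number of times within the group. With size = 8 the counts match, which may be why the problem does not show up for every input.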

Yeah, a barrier within a for-loop is certainly a bad thing if it can't be reached by all threads...
