Hello,
The attached OpenCL kernel is supposed to compute a plane rotation of two vectors. For some reason, it leads to a system freeze under Stream SDK 2.1, while it works correctly under another OpenCL implementation. Does anyone have an idea why this could be the case? Could it be related to a bug in the Stream SDK?
Again, a WARNING: the code below may freeze your system.
Best regards,
Karli
////// plane rotation: (x,y) <- (\alpha x + \beta y, -\beta x + \alpha y)
__kernel void plane_rotation(
    __global float * vec1,
    __global float * vec2,
    float alpha,
    float beta,
    unsigned int size)
{
  float tmp1 = 0;
  float tmp2 = 0;

  for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0))
  {
    tmp1 = vec1[i];
    tmp2 = vec2[i];

    //the following barrier is not needed, but does not improve the situation.
    barrier(CLK_GLOBAL_MEM_FENCE);

    vec1[i] = alpha * tmp1 + beta * tmp2;
    vec2[i] = alpha * tmp2 - beta * tmp1;
  }
}
Which OS/GPU/Driver are you using?
Also, could you post the runtime code? A compilable test case would make it easy to reproduce and track down the problem.
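One piece such a test case could include is a host-side reference to compare the device output against. The following is only a sketch in plain C: the function name plane_rotation_ref is hypothetical and not part of the original post, and it assumes both vectors hold at least size floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for the kernel's rotation:
   (x, y) <- (alpha*x + beta*y, -beta*x + alpha*y), applied element-wise. */
void plane_rotation_ref(float *vec1, float *vec2,
                        float alpha, float beta, size_t size)
{
    assert(vec1 != NULL && vec2 != NULL);

    for (size_t i = 0; i < size; ++i) {
        /* Read both inputs first so each output uses the old values. */
        float tmp1 = vec1[i];
        float tmp2 = vec2[i];

        vec1[i] = alpha * tmp1 + beta * tmp2;
        vec2[i] = alpha * tmp2 - beta * tmp1;
    }
}
```

Running this on copies of the input vectors and comparing against the buffers read back from the device (with a small tolerance for floating-point differences) would show whether the kernel produced wrong results or simply never finished.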
Well, Murphy's Law just hit me again. For the simple test code, it is now working again - the barrier caused some problems.
However, I remember that I added the barrier *because* the kernel wasn't working as expected and system freezes occurred. Anyway, I will come back to that as soon as our test system is finally up and running...
Are you using a 7xx GPU with a group size larger than 64 for this kernel?
I have seen the driver reset with that combination before. It's always better to query the group size from the runtime using the function clGetKernelWorkGroupInfo.
Thanks for the input! I first observed the freezes under a new 64-bit Linux kernel (I think it was 2.6.33). A few more days passed and the test system became unavailable, so I tried it today on 64-bit Windows 7, where it worked. The GPU was a Radeon HD 5850, so the group size was not an issue. A Linux test system should be back soon...
Yeah, a barrier within a for-loop is certainly a bad thing if it can't be reached by all threads...
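To make that concrete, here is a small plain-C sketch (not OpenCL, and the helper names trip_count and barrier_reached_uniformly are made up for illustration) that counts how often each work-item would run the body of a loop of the form for (i = gid; i < size; i += global_size). If the counts differ, some work-items hit the barrier more often than others.

```c
#include <assert.h>
#include <stddef.h>

/* Iterations executed by work-item `gid` in
   for (i = gid; i < size; i += global_size). */
size_t trip_count(size_t gid, size_t global_size, size_t size)
{
    assert(global_size > 0 && gid < global_size);

    if (gid >= size)
        return 0;
    /* Ceiling of (size - gid) / global_size. */
    return (size - gid + global_size - 1) / global_size;
}

/* Returns 1 if all work-items run the loop body equally often,
   so a barrier inside the body is reached the same number of
   times by everyone; returns 0 otherwise. */
int barrier_reached_uniformly(size_t global_size, size_t size)
{
    size_t first = trip_count(0, global_size, size);

    for (size_t gid = 1; gid < global_size; ++gid) {
        if (trip_count(gid, global_size, size) != first)
            return 0;
    }
    return 1;
}
```

For example, with a global size of 4 and size = 10, work-items 0 and 1 run three iterations while 2 and 3 run only two, so the barrier is not reached uniformly. Strictly, a barrier only synchronizes within one work-group, so checking all work-items as done here is a conservative test: if it passes, every work-group is fine as well.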