I have OpenCL code that does something like this:
__kernel void test(__read_only frame, ..., __global ulong *gOut1, __global uint *gOut2) {
    for (int i = 0; i < 5; i++) {
        // TestCode: taken only once, by work-item (0, 0) when i == 1
        if ((get_global_id(0) == 0) && (get_global_id(1) == 0) && (i == 1)) {
            *gOut2 = 0;
            gOut2++;
        }
        gOut1[uniq_global_idx] = func(frame, i, get_global_id(0), get_global_id(1));
    }
}
The kernel runs with a global work size equal to the frame size, i.e., the global range is frame.x by frame.y. Basically, the kernel does some calculation to generate the gOut1 result, which has size 5*frame.x*frame.y.
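To make the output layout concrete, here is a minimal host-side sketch in plain C of how such an index could be computed. The kernel above does not show how uniq_global_idx is derived, so the iteration-major layout below, along with the names output_size, unique_index, width and height, is an assumption for illustration only and not the actual code.

```c
#include <stddef.h>

/* Number of loop iterations per work-item, as in the kernel (i = 0..4). */
#define NUM_ITERATIONS 5

/* Total number of gOut1 elements: 5 * frame.x * frame.y. */
static size_t output_size(size_t width, size_t height)
{
    return (size_t)NUM_ITERATIONS * width * height;
}

/*
 * Hypothetical layout (an assumption, not taken from the kernel):
 * one full frame-sized plane per iteration i, row-major inside the plane.
 * x plays the role of get_global_id(0), y of get_global_id(1).
 */
static size_t unique_index(size_t i, size_t x, size_t y,
                           size_t width, size_t height)
{
    return (i * height + y) * width + x;
}
```

With a layout like this, every (i, x, y) triple maps to a distinct slot, and the largest index is output_size(width, height) - 1, so a buffer of 5*frame.x*frame.y elements is exactly large enough.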
- The AMD gDebugger is not working: it works for the sample code and for a slightly modified version of the sample code, but not once anything is changed further.
- The TestCode section (the 'if' block) has to be there for the code to run. When I remove the TestCode and run the kernel, it hangs forever.
- Somehow the 'gOut2++' statement is "crucial": it should not be needed, since the 'if' condition is hit only once, but it has to be there for the gOut1 result to be correct.
- When the exact same function is duplicated under a different name, i.e., kernels test(..) and test1(..) are identical, calling test(..) hangs forever.
Can anybody explain what is happening?