I have OpenCL code that does something like this:
__kernel void test(__read_only frame, ..., __global ulong *gOut1, __global uint *gOut2)
{
    for (int i = 0; i < 5; i++)
    {
        //Begin TestCode
        if ((get_global_id(0) == 0) && (get_global_id(1) == 0) && (i == 1))
        {
            *gOut2 = 0; gOut2++;
        }
        //End TestCode
        gOut1[uniq_global_idx] = func(frame, i, get_global_id(0), get_global_id(1));
    }
}
It runs over a global range the size of the frame, i.e., the global work size is frame.x by frame.y. Basically, the kernel does some calculation to generate the gOut1 result, which has 5*frame.x*frame.y elements.
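As a rough sizing check: gOut1 holds 5*frame.x*frame.y elements of OpenCL C `ulong`, which the OpenCL C specification defines as 64 bits. A minimal host-side sketch, assuming the host allocates this buffer itself; `gout1_bytes` is a hypothetical helper, not part of any API. One thing worth ruling out on Windows is sizing the buffer with the host's `unsigned long`, which is only 32 bits there and would allocate half the memory the kernel writes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes the host must allocate for gOut1,
   which holds 5 results per pixel (the kernel loop runs i = 0..4).
   OpenCL C `ulong` is always 64 bits, so the element size is taken from
   uint64_t (same width as cl_ulong), not from the host's `unsigned long`,
   which is only 32 bits on Windows. */
size_t gout1_bytes(size_t width, size_t height)
{
    const size_t results_per_pixel = 5;
    return results_per_pixel * width * height * sizeof(uint64_t);
}
```

For a 640x480 frame this gives 5 * 640 * 480 * 8 = 12,288,000 bytes.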
Problems:
- AMD gDebugger does not work: it works with the sample code and with slightly modified versions of it, but not with anything changed further.
- The TestCode section has to be there for the code to run. When TestCode is removed, the kernel hangs forever when I run it.
- Somehow the 'gOut2++' statement is "crucial": even though it should not be needed, since the 'if' branch is taken only once, it has to be there for the gOut1 result to be correct.
- When the exact same function is duplicated under a different name, i.e., kernels test(..) and test1(..) are identical, calling test(..) hangs forever.
Can anybody explain what is happening?
Regarding gOut1[uniq_global_idx]: where did you declare uniq_global_idx?
__kernel void test(__read_only frame, ..., __global ulong *gOut1, __global uint *gOut2)
{
    // First of the 5 consecutive output slots belonging to this pixel
    int uniq_global_idx = (get_global_id(1) * frame.size_x + get_global_id(0)) * 5;
    for (int i = 0; i < 5; i++)
    {
        //Begin TestCode
        if ((get_global_id(0) == 0) && (get_global_id(1) == 0) && (i == 1))
        {
            *gOut2 = 0; gOut2++;
        }
        //End TestCode
        gOut1[uniq_global_idx + i] = func(frame, i, get_global_id(0), get_global_id(1));
    }
}
Here func is a simple function that just reads a region of frame, does some calculation, and returns a value. It does not modify any global memory.
What happens if you comment out "gOut1[uniq_global_idx + i] = func(frame, i, get_global_id(0), get_global_id(1));"?
Is gOut2++ still crucial?
If I replace func() with some simple value, the gOut1 values returned to the CPU are correct, whether or not the test code is there. But if I call func, the test code has to be there for the result of func to be correct; if gOut2++ is missing, the result of func is wrong.
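Since the output only goes wrong when func is called, one way to narrow the problem down is to compute the same values on the CPU and report the first element where the GPU result differs. A minimal sketch, assuming both results have already been copied into host arrays of 64-bit values with the kernel's layout; `first_mismatch` is a hypothetical helper, and the CPU reference would have to come from a host port of func, which the thread does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first element where the
   GPU output differs from the CPU reference, or -1 if all n elements
   match. With the kernel's layout, index k belongs to pixel k / 5
   (row-major, i.e. y * width + x) and loop iteration k % 5. */
ptrdiff_t first_mismatch(const uint64_t *gpu, const uint64_t *ref, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        if (gpu[k] != ref[k])
            return (ptrdiff_t)k;
    }
    return -1;
}
```

Decoding the returned index into (pixel, iteration) shows whether the wrong values cluster at particular pixels or at a particular value of i.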
Somehow, I think there is something wrong with "gOut1[uniq_global_idx + i] = func(frame, i, get_global_id(0), get_global_id(1));".
Is the index "uniq_global_idx + i" valid for the array gOut1?
I double-checked uniq_global_idx + i. I removed the TestCode and replaced it with a statement that checks the lower and upper bounds of the index and writes some value to gOut2 if it is out of bounds. The kernel runs without error and the values are all correct. The memory in gOut2 is untouched, which means there is no out-of-bounds problem.
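The device-side bounds check can be complemented by an exhaustive check of the index arithmetic on the host. A minimal sketch, assuming the index is computed as in the kernel above, i.e. (get_global_id(1) * frame.size_x + get_global_id(0)) * 5 + i, with width and height standing in for the global work size; `check_gout1_layout` is a hypothetical helper.

```c
#include <stdlib.h>

/* Hypothetical helper mirroring the kernel's index arithmetic:
   (y * width + x) * 5 + i. Returns 1 if, over the whole width x height
   global range and i = 0..4, every index lies in [0, 5*width*height)
   and no slot is written twice; returns 0 otherwise (or if the
   bookkeeping allocation fails). Because there are exactly
   5*width*height writes, "all in range and all distinct" also means
   every slot is written exactly once. */
int check_gout1_layout(size_t width, size_t height)
{
    size_t n = 5 * width * height;
    unsigned char *hits = calloc(n ? n : 1, 1);
    if (hits == NULL)
        return 0;

    int ok = 1;
    for (size_t y = 0; y < height && ok; ++y)
        for (size_t x = 0; x < width && ok; ++x)
            for (size_t i = 0; i < 5 && ok; ++i) {
                size_t idx = (y * width + x) * 5 + i;
                if (idx >= n || hits[idx])
                    ok = 0;          /* out of bounds or written twice */
                else
                    hits[idx] = 1;   /* mark slot as written */
            }

    free(hits);
    return ok;
}
```

If this returns 1 for the real frame size, the index arithmetic itself is sound, which is consistent with the untouched gOut2 and points the investigation elsewhere.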
So again, some kind of unrelated code has to be there for the gOut1 result to be correct.
And when I first ran the bounds-check code above, it took a very long time to compile and then hung when I ran it. I had to stop and restart it, and then it ran normally. I suspect something is wrong with the driver setup, but I don't know a good way to check it.
Not sure if it's related, but sometimes I see a warning dialog box:
Here is the software info from Catalyst Control Center:
Driver Packaging Version | 8.961-120405a-137813C-ATI |
Provider | Advanced Micro Devices, Inc. |
2D Driver Version | 8.01.01.1243 |
2D Driver File Path | /REGISTRY/MACHINE/SYSTEM/ControlSet001/Control/CLASS/{4D36E968-E325-11CE-BFC1-08002BE10318}/0006 |
Direct3D Version | 7.14.10.0903 |
OpenGL Version | 6.14.10.11631 |
Catalyst Control Center Version | 2012.0405.2205.37728 |
AMD Audio Driver Version | 7.12.0.7706 |
The AMD driver version is 8.961.0.0
Updating the driver to the latest version may fix the problem if it is driver related.