Hi,
I am using the following kernel:
typedef struct my_float3_
{
    float x, y, z;
} my_float3;

my_float3 make_my_float3(float3 f)
{
    my_float3 mf3;
    mf3.x = f.x;
    mf3.y = f.y;
    mf3.z = f.z;
    return mf3;
}

kernel void main(constant my_float3 * arg1, constant my_float3 * arg2, float3 normal, global my_float3 * out, global float3* mean)
{
    size_t index = get_global_id(0);
    my_float3 f1 = arg1[index];
    my_float3 f2 = arg2[index];
    float3 result = (float3)(f1.x, f1.y, f1.z) + (float3)(f2.x, f2.y, f2.z) + normal;
    out[index] = make_my_float3(result);
    barrier(CLK_GLOBAL_MEM_FENCE);
    if(get_global_id(0) == get_global_size(0)-1)
    {
        float3 mean_ = (float3)0;
        for(int i = 0; i < get_global_size(0); i++)
        {
            mean_ += (float3)(out[i].x, out[i].y, out[i].z);
        }
        mean[0] = mean_ / get_global_size(0);
    }
}
When I read back mean, I get the correct value on every device only if I use if(get_global_id(0) == get_global_size(0)-1) rather than if(get_global_id(0) == 0). The latter works with CL_DEVICE_TYPE_GPU, but I get garbage with CL_DEVICE_TYPE_CPU. Why is that so?
Also, when the global NDRange is too small (e.g. 2,1,1), the same behavior appears on the GPU.
I use a Radeon HD 7970 and a Core i7 CPU.
Thank you
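barrier() only synchronizes work-items within a single work-group, so a work-item that reads out after the barrier may see elements that other work-groups have not written yet.
You cannot synchronize across work-groups. This is by OpenCL design.
You need to launch a subsequent kernel to process the output data written by the work-groups of the previous kernel.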
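A minimal sketch of such a second kernel, assuming the first kernel is cut off right after the write to out[index] and that both kernels are enqueued on the same in-order command queue. The names compute_mean and count are hypothetical and do not come from the original post.

```c
typedef struct my_float3_
{
    float x, y, z;
} my_float3;

/* Hypothetical second kernel. It is meant to be enqueued after the first
   kernel on the same in-order queue; by the time it starts, every
   work-group of the first kernel has finished writing to out. */
kernel void compute_mean(global const my_float3 *out, uint count, global float3 *mean)
{
    /* Only one work-item performs the serial sum; a global size of 1 is enough. */
    if (get_global_id(0) != 0)
        return;

    float3 sum = (float3)(0.0f);
    for (uint i = 0; i < count; i++)
    {
        sum += (float3)(out[i].x, out[i].y, out[i].z);
    }
    mean[0] = sum / (float)count;
}
```

A single work-item summing serially is the simplest correct version. For large buffers, a per-work-group reduction in local memory followed by a final pass would likely be faster.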
HTH
Best Regards,
Bruhaspati
Hi,
I tested a similar example with both conditions, if(get_global_id(0) == get_global_size(0) - 1) and if(get_global_id(0) == 0).
It works fine for me.
However, I could not reproduce the difference between CL_DEVICE_TYPE_CPU and CL_DEVICE_TYPE_GPU.
Please provide the code you are using to compile.
I had urgent work to do before I could provide you with a runnable sample. But... d'oh! I completely forgot that basic fact about barriers. It has been 2-3 years since I last used OpenCL. Thank you very much, I understand that I must write another kernel to compute the mean.