max_bodycad
Journeyman III

barrier and global memory

Hi,

I am using the following kernel:

typedef struct my_float3_
{
  float x,y,z;
} my_float3;

my_float3 make_my_float3(float3 f)
{
  my_float3 mf3;
  mf3.x = f.x;
  mf3.y = f.y;
  mf3.z = f.z;
  return mf3;
}

kernel void main(constant my_float3 * arg1, constant my_float3 * arg2, float3 normal, global my_float3 * out, global float3* mean)
{
  size_t index = get_global_id(0);
  my_float3 f1 = arg1[index];
  my_float3 f2 = arg2[index];
  float3 result = (float3)(f1.x, f1.y, f1.z) + (float3)(f2.x, f2.y, f2.z) + normal;
  out[index] = make_my_float3(result);

  barrier(CLK_GLOBAL_MEM_FENCE);

  if(get_global_id(0) == get_global_size(0)-1)
  {
    float3 mean_ = (float3)0;
    for(int i = 0; i < get_global_size(0); i++)
    {
      mean_ += (float3)(out[i].x, out[i].y, out[i].z);
    }
    mean[0] = mean_ / get_global_size(0);
  }
}

When I read back mean, I get the correct value on every device only if the guard is if(get_global_id(0) == get_global_size(0)-1) rather than if(get_global_id(0) == 0). The latter works with CL_DEVICE_TYPE_GPU, but I get garbage with CL_DEVICE_TYPE_CPU. Why is that?

Also, when the global NDRange is too small (e.g. 2,1,1), the same behavior appears on the GPU.

I use a Radeon HD 7970 and a Core i7 CPU.

Thank you

1 Solution
himanshu_gautam
Grandmaster


barrier() only synchronizes work-items within a single work-group.

You cannot synchronize across work-groups. This is by design in OpenCL.

You need to launch a subsequent kernel to process the output data from the work-groups of the previous kernel.
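
To make this concrete, here is a minimal host-side sketch in plain C. It is not OpenCL code and not the original poster's program. It models one possible schedule in which work-groups run one after another, so the barrier separates writes from reads only inside each group. The names GLOBAL_SIZE, LOCAL_SIZE, add3, mean_of, fill_inputs, single_kernel_model and two_kernel_model, the input values, and the -1000 sentinel standing in for not-yet-written memory are all assumptions made for illustration. A real runtime is free to schedule work-groups in any order.

```c
#define GLOBAL_SIZE 8  /* hypothetical global work size */
#define LOCAL_SIZE  4  /* hypothetical work-group size  */

typedef struct { float x, y, z; } my_float3;

/* Per-work-item computation of the first kernel: arg1 + arg2 + normal. */
my_float3 add3(my_float3 a, my_float3 b, my_float3 n)
{
    my_float3 r;
    r.x = a.x + b.x + n.x;
    r.y = a.y + b.y + n.y;
    r.z = a.z + b.z + n.z;
    return r;
}

/* Reduction step: component-wise mean of out[0..count). */
my_float3 mean_of(const my_float3 *out, int count)
{
    my_float3 s = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < count; i++) {
        s.x += out[i].x;
        s.y += out[i].y;
        s.z += out[i].z;
    }
    s.x /= (float)count;
    s.y /= (float)count;
    s.z /= (float)count;
    return s;
}

/* Illustrative inputs: a[i] = (i, i, i), b[i] = (0, 0, 0). */
void fill_inputs(my_float3 *a, my_float3 *b)
{
    for (int i = 0; i < GLOBAL_SIZE; i++) {
        a[i].x = a[i].y = a[i].z = (float)i;
        b[i].x = b[i].y = b[i].z = 0.0f;
    }
}

/* Model of the single kernel from the question, assuming work-groups run
   sequentially. 'reader' is the global id of the work-item that computes
   the mean after the barrier. */
my_float3 single_kernel_model(const my_float3 *a, const my_float3 *b,
                              my_float3 n, my_float3 *out, int reader)
{
    my_float3 mean = {0.0f, 0.0f, 0.0f};

    /* Sentinel values stand in for memory no work-item has written yet. */
    for (int i = 0; i < GLOBAL_SIZE; i++)
        out[i].x = out[i].y = out[i].z = -1000.0f;

    for (int g = 0; g < GLOBAL_SIZE / LOCAL_SIZE; g++) {
        int first = g * LOCAL_SIZE;

        /* Every work-item of this group writes its element. */
        for (int i = first; i < first + LOCAL_SIZE; i++)
            out[i] = add3(a[i], b[i], n);

        /* barrier(CLK_GLOBAL_MEM_FENCE) would take effect here,
           for this group only. Later groups have not run yet. */
        if (reader >= first && reader < first + LOCAL_SIZE)
            mean = mean_of(out, GLOBAL_SIZE);
    }
    return mean;
}

/* Model of the fix: the first kernel finishes for every work-item,
   then a second kernel (here: a plain function call) computes the mean. */
my_float3 two_kernel_model(const my_float3 *a, const my_float3 *b,
                           my_float3 n, my_float3 *out)
{
    for (int i = 0; i < GLOBAL_SIZE; i++)
        out[i] = add3(a[i], b[i], n);
    return mean_of(out, GLOBAL_SIZE);
}
```

With inputs from fill_inputs and normal = (1, 1, 1), two_kernel_model returns 4.5 per component, and so does single_kernel_model with reader = GLOBAL_SIZE - 1, because under this schedule the last work-item happens to run after all others have written. With reader = 0 the mean includes the sentinel values left in the second group's slots. That only looks correct by luck of scheduling, which is why the separate kernel is the reliable approach.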

HTH

Best Regards,

Bruhaspati

3 Replies
himanshu_gautam
Grandmaster

Hi,

I have tested a similar example with both conditions, if(get_global_id(0) == get_global_size(0) - 1) and if(get_global_id(0) == 0).

It works fine for me.
However, I could not find where DEVICE_TYPE_CPU/GPU comes in.

Please provide the code you are using to compile it.

max_bodycad
Journeyman III

I had urgent work to do before I could provide you with a runnable sample. But... d'oh! I completely forgot that basic fact about barriers. It has been 2-3 years since I last used OpenCL. Thank you very much, I understand that I need a separate kernel to compute the mean.
