I tested a similar example with the same conditions, if(get_global_id(0) == get_global_size(0) - 1) and if(get_global_id(0) == 0).
It works fine for me.
However, I could not find DEVICE_TYPE_CPU/GPU.
Please post the code you are compiling so we can take a look.
barrier() only synchronizes work-items within a single work-group.
You cannot synchronize across work-groups. This is by design in OpenCL.
You need to launch a subsequent kernel to process the output data from the work-groups of the previous kernel.
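To make the two-kernel idea concrete, here is a rough sketch in plain C that emulates it on the host. It is not OpenCL code and not the original poster's code: the names partial_sums, mean_from_partials and group_size are made up for illustration, and each loop iteration over g simply stands in for one work-group. In a real program, each function would be its own kernel, enqueued one after the other on an in-order command queue so that the second only starts once the first has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Pass 1 (stands in for the first kernel): each "work-group" g sums its own
   slice of the input and writes exactly one partial sum. In a real kernel,
   barrier() would only be used inside this per-group step. */
static void partial_sums(const float *in, size_t n, size_t group_size,
                         float *partial, size_t num_groups)
{
    for (size_t g = 0; g < num_groups; ++g) {
        size_t begin = g * group_size;
        size_t end = (begin + group_size < n) ? begin + group_size : n;
        float sum = 0.0f;
        for (size_t i = begin; i < end; ++i)
            sum += in[i];
        partial[g] = sum;
    }
}

/* Pass 2 (stands in for the second kernel): it runs only after pass 1 has
   completely finished, so every partial sum is guaranteed to be written. */
static float mean_from_partials(const float *partial, size_t num_groups,
                                size_t n)
{
    assert(n > 0); /* the mean of an empty input is undefined */
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        total += partial[g];
    return total / (float)n;
}
```

You would call partial_sums first, then mean_from_partials on its output. The point of the split is that the boundary between the two calls plays the role of the global synchronization that barrier() cannot give you.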
I had urgent work to finish before I could provide you with a runnable sample. But... d'oh! I completely forgot that basic fact about barriers. It has been 2-3 years since I last used OpenCL. Thank you very much, I understand that I need another kernel to compute the mean.