I noticed that it is only possible to use a barrier to synchronize access to local memory. Is it possible to synchronize access to global memory without the need to launch a new kernel?
Short answer is no. There is no defined way to do global synchronization. You can experiment with global atomics, but you can easily end up with a locked GPU or incorrect results.
Thanks. Does this mean that a barrier works only if the kernel is small enough to execute locally? Which parameter returns the number of work-items that can be executed (before the kernel is enqueued) so that kernel execution remains local?
Barriers only work inside a work-group. You can divide any large problem into work-groups (the maximum size can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE) and then achieve synchronization within each work-group using barriers.
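To make that concrete, here is a rough sketch rather than a definitive implementation. The kernel name `partial_sum`, its arguments, and the two host helpers are made up for illustration, and the kernel assumes the local size is a power of two. Each work-group reduces its own slice using barriers, and the host helpers show how a problem of size n would be split once you have the work-group size from clGetKernelWorkGroupInfo. The per-group results still have to be combined in a second kernel launch or on the host, since there is no barrier across work-groups.

```c
#include <stddef.h>

/* OpenCL C kernel source kept as a string, as it would be handed to
   clCreateProgramWithSource. Kernel name and arguments are hypothetical.
   Assumes the local size is a power of two. */
static const char *kPartialSumSrc =
"__kernel void partial_sum(__global const float *in,\n"
"                          __global float *group_sums,\n"
"                          __local float *scratch,\n"
"                          const unsigned int n)\n"
"{\n"
"    size_t gid = get_global_id(0);\n"
"    size_t lid = get_local_id(0);\n"
"    /* Padding work-items (gid >= n) contribute zero. */\n"
"    scratch[lid] = (gid < n) ? in[gid] : 0.0f;\n"
"    barrier(CLK_LOCAL_MEM_FENCE);\n"
"    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {\n"
"        if (lid < s) scratch[lid] += scratch[lid + s];\n"
"        /* Every work-item in the group reaches this barrier. */\n"
"        barrier(CLK_LOCAL_MEM_FENCE);\n"
"    }\n"
"    if (lid == 0) group_sums[get_group_id(0)] = scratch[0];\n"
"}\n";

/* Accessor so the host code can pass the source to the OpenCL runtime. */
const char *partial_sum_source(void)
{
    return kPartialSumSrc;
}

/* Round the problem size up to a multiple of the work-group size, because
   the global size must be divisible by the local size when a local size
   is given (always the case in OpenCL 1.x). */
size_t round_up_global_size(size_t n, size_t local_size)
{
    if (local_size == 0) return 0;
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Number of work-groups, i.e. the number of partial sums produced. */
size_t num_work_groups(size_t n, size_t local_size)
{
    if (local_size == 0) return 0;
    return round_up_global_size(n, local_size) / local_size;
}
```

For example, with n = 1000 and a work-group size of 256 you would enqueue a global size of 1024 and get 4 partial sums back, one per work-group.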