Assume a kernel has n work-groups, and each work-group computes an array in its own local memory (array0, array1, array2, ..., array(n-1), all of the same length, each in a different work-group's local memory). How can I add them up into a single array within the same kernel?
I'm afraid that, because there is no synchronization between different work-groups, I cannot ensure that every array gets added to the sum.
Does async_work_group_copy() help with this problem? If yes, how do I use it?
Thanks!