Raistmer

Question about "min" example from chapter 1, APP manual

Discussion created by Raistmer on Sep 10, 2011
Latest reply on Sep 10, 2011 by LeeHowes
performance-related


Chapter 1 of the APP OpenCL Programming Guide lists a "min" example.
It uses a reduction based on atomic operations, including for the reduction in global memory.
My question is: why is this approach better than simply doing the reduction in the first work-item of each group?
That is, using:
if (get_local_id(0) == 0) {
    for (...) {
        // reading from global memory here
    }
    // writing reduced min value
}

This reads each global memory location only once and writes to global memory only once per group.
With the atomics-based approach there should be a global memory write for each work-item in the group.

And for the reduction in local memory, why is the atomics-based approach better?
It greatly reduces local memory usage (only a single local memory variable is required, instead of
one per work-item in the work-group, as would be needed if the first work-item in the group did the reduction). But will this
increase kernel performance if there are no constraints on the amount of available local memory?
