if(was_signal>0){ dest[threadID][0]=o11; }
Originally posted by: gaurav.garg
The branch granularity in current ATI GPUs is the wavefront (64 threads). If any thread takes the if branch, all other threads of that wavefront must step through the complete if branch as well. One way to speed up if conditions is to make sure there is no branch divergence within a wavefront.
Does that mean such an if statement is effectively ignored and the ordinary stream always gets some value?
An ordinary stream may be initialized with arbitrary values, or it may be left uninitialized, so elements that the kernel does not write hold no guaranteed value.
This immediately sped up the kernel twofold (!).
One way to get better performance with a scatter stream is to use a 1D stream.