

Raistmer
Adept II

Can a conditional write to a scatter stream speed up a kernel?

Or is it better to eliminate the if() altogether?

I need to store a value in GPU memory only if some (fairly rare) condition is true.
In all other cases (the most likely ones) the value computed in a register can be discarded without storing it in the scatter stream.
Will the if() condition speed up the kernel (because the memory operation is required in only a small fraction of threads), or will it just slow the kernel down because the memory write always happens regardless of the condition's value?

And how can such writes (rare writes into a big scatter array) be sped up?

if(was_signal>0){ dest[threadID][0]=o11; }

4 Replies
Raistmer
Adept II

How much more costly is this:
04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R0.x], R1, ELEM_SIZE(3)
than this?
06 EXP_DONE: PIX0, R2

The branch granularity in current ATI GPUs is the wavefront (64 threads). If any thread in a wavefront enters the if condition, all other threads of that wavefront must execute the complete if branch as well. One way to speed up if conditions is to make sure there is no branch divergence within a wavefront.
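To make the wavefront rule concrete, here is a minimal CPU-side sketch in C (not Brook+ or CAL code). It counts how many wavefronts would have to run the if body for a given per-thread condition array. The function name `wavefronts_taking_branch` and the array layout are assumptions made for illustration; only the wavefront size of 64 comes from the reply above.

```c
#include <assert.h>
#include <stddef.h>

#define WAVEFRONT_SIZE 64  /* threads per wavefront, as stated in the reply */

/* Hypothetical helper: count the wavefronts in which at least one thread
   satisfies was_signal > 0. Under wavefront-granular branching, each such
   wavefront pays for the whole if body, however few threads needed it. */
static size_t wavefronts_taking_branch(const int *was_signal, size_t n_threads)
{
    assert(was_signal != NULL);
    size_t count = 0;
    for (size_t base = 0; base < n_threads; base += WAVEFRONT_SIZE) {
        size_t end = base + WAVEFRONT_SIZE;
        if (end > n_threads) {
            end = n_threads;          /* last wavefront may be partial */
        }
        for (size_t t = base; t < end; ++t) {
            if (was_signal[t] > 0) {  /* one hit is enough for this wavefront */
                ++count;
                break;
            }
        }
    }
    return count;
}
```

With the same number of hits, a condition that fires once in every wavefront makes every wavefront pay for the branch, while hits clustered in a single wavefront cost only that one. So how rare the condition is per thread matters less than how the hits are spread across wavefronts.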


Originally posted by: gaurav.garg

The branch granularity in current ATI GPUs is the wavefront (64 threads). If any thread in a wavefront enters the if condition, all other threads of that wavefront must execute the complete if branch as well. One way to speed up if conditions is to make sure there is no branch divergence within a wavefront.



Thanks, that's just what I wanted to know.
I was able to rewrite the kernel so that it no longer uses a scatter stream; it now uses two ordinary streams instead of one scatter stream and one ordinary stream.

This immediately sped up the kernel by a factor of 2 (!).
What if I use an if() condition to switch the output to one of the ordinary streams on and off?
I see no differences in the generated output in SKA.
Does this mean such an if statement is completely ignored and the ordinary stream always gets some value?

Does this mean such an if statement is completely ignored and the ordinary stream always gets some value?


Where the condition is false, the ordinary stream element may hold an arbitrary value, or it may be left uninitialized.
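One way to avoid reading arbitrary values back is to have every thread write a defined value, using a sentinel when the condition is false. The following is a CPU-side sketch in C under that assumption, not the poster's actual kernel. `NO_SIGNAL`, `kernel_model` and `collect_hits` are hypothetical names, and the sentinel only works if `o11` can never legitimately equal it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: must be a value that o11 can never take. */
#define NO_SIGNAL (-1.0f)

/* CPU model of the kernel body: every thread writes its output element,
   so no element of dest is left undefined. */
static void kernel_model(const float *o11, const int *was_signal,
                         float *dest, size_t n_threads)
{
    assert(o11 != NULL && was_signal != NULL && dest != NULL);
    for (size_t t = 0; t < n_threads; ++t) {
        /* A select instead of a guarded store: the write always happens. */
        dest[t] = (was_signal[t] > 0) ? o11[t] : NO_SIGNAL;
    }
}

/* Host-side pass: gather the indices of threads that reported a signal.
   hit_idx must have room for n entries. Returns the number of hits. */
static size_t collect_hits(const float *dest, size_t n, size_t *hit_idx)
{
    assert(dest != NULL && hit_idx != NULL);
    size_t hits = 0;
    for (size_t t = 0; t < n; ++t) {
        if (dest[t] != NO_SIGNAL) {
            hit_idx[hits++] = t;
        }
    }
    return hits;
}
```

The trade-off is that the host has to scan the whole output stream, but the kernel no longer depends on what an unwritten element happens to contain.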

This immediately sped up the kernel by a factor of 2 (!).


One way to get better performance with a scatter stream is to use a 1D stream.
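For illustration only, the index arithmetic behind switching from a 2D destination to a 1D one can be sketched in C as below. This is not Brook+ syntax. The row-major layout, the `width` parameter and the names `flat_index` and `store_if_signal` are assumptions, with the store mirroring the `dest[threadID][0]=o11` line from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Map a 2D element position to its offset in a row-major 1D buffer.
   width is the number of elements per row (assumed layout). */
static size_t flat_index(size_t row, size_t col, size_t width)
{
    assert(col < width);
    return row * width + col;
}

/* Conditional store through a 1D destination: the equivalent of
   dest[threadID][0] = o11 when dest is addressed as a flat array. */
static void store_if_signal(float *dest1d, size_t thread_id, size_t width,
                            int was_signal, float o11)
{
    assert(dest1d != NULL);
    if (was_signal > 0) {
        dest1d[flat_index(thread_id, 0, width)] = o11;
    }
}
```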
