Understanding scatter streams

Discussion created by FangQ on Feb 19, 2009
Will the input stream or the output stream determine the total number of threads for a scatter operation?

I wrote a kernel with scatter output, something like the following. I am trying to understand what is going to happen when running this kernel:

kernel scatterkernel(float val<>, float2 pos<>, out float2 field[][]){



For example, my val<> and pos<> streams are both 1D streams with 100 elements, and field[][] is a 2D stream with 1024x1024 elements. I want to distribute the values in val into the field texture based on the positions in pos.
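To make the intended result concrete, here is a minimal CPU sketch in plain C of the scatter I have in mind. It is not Brook+ code and says nothing about how the runtime schedules threads. The names float2_t and scatter_reference are hypothetical, and so are the choices to read pos as (column, row) texel coordinates, to store the value in the x component of each cell, and to skip out-of-range positions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Brook+'s float2 type. */
typedef struct { float x, y; } float2_t;

/*
 * CPU reference for the intended scatter: one iteration per INPUT element.
 * Assumptions (not taken from the Brook+ kernel, whose body is not shown):
 *   - pos[i] holds (column, row) in texel units, truncated toward zero;
 *   - positions outside the field (or NaN) are skipped;
 *   - the value is written to the x component, y is left untouched;
 *   - if two inputs map to the same cell, the later one wins.
 */
static void scatter_reference(const float *val, const float2_t *pos, size_t n,
                              float2_t *field, size_t width, size_t height)
{
    assert(val != NULL && pos != NULL && field != NULL);

    for (size_t i = 0; i < n; ++i) {
        float px = pos[i].x;
        float py = pos[i].y;

        /* Range check before the cast so the conversion is always defined. */
        if (!(px >= 0.0f && px < (float)width &&
              py >= 0.0f && py < (float)height)) {
            continue;
        }

        size_t col = (size_t)px;
        size_t row = (size_t)py;

        /* Row-major layout: the write address depends on the data in pos. */
        field[row * width + col].x = val[i];
    }
}
```

With the sizes above this would be called with n = 100, width = 1024 and height = 1024, so the loop body of this CPU version runs 100 times and touches at most 100 cells of the field.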

Will the above code loop over the dimensions of the input streams (i.e. 100), or over the dimensions of the output stream (i.e. 1024x1024)?

Is the above scatter kernel as efficient as a gather operation, or does it incur a significant performance penalty because of its non-coalesced memory access pattern?