Memory access performance for 2D streams

Discussion created by Raistmer on Jul 18, 2009
I need a horizontal addition of two float4 elements to form each element of the output stream.
Is there any difference in performance depending on which index is used to access the sequential elements?

I.e.: will 1) and 2) differ in performance?

1)
o.xy = inp[tID][i].xz + inp[tID][i].yw;
o.zw = inp[tID][i+1].xz + inp[tID][i+1].yw;

2)
o.xy = inp[i][tID].xz + inp[i][tID].yw;
o.zw = inp[i+1][tID].xz + inp[i+1][tID].yw;

with

kernel void k(float4 inp[][], out float4 o<>);
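
To pin down what both variants compute, here is a minimal plain-C sketch of the same horizontal addition. It is only a CPU reference, not Brook+ code. The names float4 (as a C struct), hadd_pair and row_major_offset are made up for illustration. The offset helper assumes a simple row-major layout, which may not match how the GPU actually stores a 2D stream, so it shows only how far apart the two reads would be under that assumption and does not answer the performance question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU stand-in for the Brook+ float4 type. */
typedef struct { float x, y, z, w; } float4;

/* Mirrors: o.xy = a.xz + a.yw;  o.zw = b.xz + b.yw;
 * i.e. o = (a.x + a.y, a.z + a.w, b.x + b.y, b.z + b.w). */
static float4 hadd_pair(float4 a, float4 b)
{
    float4 o;
    o.x = a.x + a.y;
    o.y = a.z + a.w;
    o.z = b.x + b.y;
    o.w = b.z + b.w;
    return o;
}

/* Linear element offset of inp[row][col] in a row-major array that is
 * `width` elements wide. ASSUMPTION: row-major storage; real GPU stream
 * memory may be tiled or otherwise laid out differently. */
static size_t row_major_offset(size_t row, size_t col, size_t width)
{
    assert(col < width);
    return row * width + col;
}
```

Under the row-major assumption, variant 1 reads inp[tID][i] and inp[tID][i+1], which are one element apart, while variant 2 reads inp[i][tID] and inp[i+1][tID], which are a full row (width elements) apart.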