I did a small test on FireStream. One version of the code looks like this:
kernel void sum2D(float a[][], float b[][], float out c<>)
{
    float2 idx = indexof(c);
    c = a[idx] + b[idx];
}
sum2D(streama2, streamb2, streamc2);
The other one is:
kernel void sum3D(float a[][][], float b[][][], float out c<>)
{
    float3 idx = indexof(c);
    c = a[idx] + b[idx];
}
sum3D(streama3, streamb3, streamc3);
The problem size is the same in both versions, by which I mean the total length of the output stream is identical. Still, the 2D version performs about 3x better than the 3D one. It seems the 3D version is also translated to a 2D model internally. Why does this cause so much performance degradation?
Thanks in advance.
Thank you for the reply.
When I compare the two versions in ShaderAnalysis, all the results are the same. Isn't the transformation made in the kernel code? If not, when does the address translation happen?
MicahVillmow, does "custom address translation" mean I should declare all the streams as 2D and do the translation myself in the kernel function?