

wgbljl
Journeyman III

Why is computation on a 2D stream so much faster than on 3D and 1D streams?

I ran a small test on FireStream. One version of the code looks like this:

kernel void sum2D(float a[][], float b[][], float out c<>)
{
    float2 idx = indexof(c);
    c = a[idx] + b[idx];
}
sum2D(streama2, streamb2, streamc2);

The other one is:

kernel void sum3D(float a[][][], float b[][][], float out c<>)
{
    float3 idx = indexof(c);
    c = a[idx] + b[idx];
}
sum3D(streama3, streamb3, streamc3);

Both versions run at the same problem scale, meaning the total length of the output stream is the same. The 2D version performs about 3X better than the 3D version. It seems that the 3D version is also translated to a 2D model internally. Why does this cause so much performance degradation?

Thanks in advance.

4 Replies
ryta1203
Journeyman III

1. The hardware is optimized for 2D.
2. The Brook+ compiler is not very good.


wgbljl,
The hardware supports 2D memory resources natively. 3D memory resources are address-translated in software using a generic algorithm, so there is a performance penalty. You should be able to achieve higher performance by using custom address translation.
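
To illustrate where the cost comes from, here is a minimal sketch in plain C of what a generic 3D-to-2D address translation might look like. The layout (flatten to a linear offset, then fold into a 2D resource of width tex_w) and the names translate_generic, Coord2, and tex_w are assumptions for illustration; the thread does not describe the layout the Brook+ runtime actually uses, and a real Brook+ kernel indexes with floats rather than unsigned integers.

```c
#include <assert.h>

/* A coordinate inside the 2D resource that backs the stream. */
typedef struct { unsigned x, y; } Coord2;

/* Hypothetical generic translation: flatten the 3D index (x, y, z) of a
   stream that is w wide and h high into a linear offset, then fold that
   offset into a 2D resource that is tex_w elements wide. This works for
   any sizes, but costs one divide and one modulo per access. */
static Coord2 translate_generic(unsigned x, unsigned y, unsigned z,
                                unsigned w, unsigned h, unsigned tex_w)
{
    assert(w > 0 && h > 0 && tex_w > 0);
    unsigned linear = (z * h + y) * w + x;   /* 3D index -> linear offset */
    Coord2 c = { linear % tex_w,             /* column in the 2D resource */
                 linear / tex_w };           /* row in the 2D resource    */
    return c;
}
```

Because nothing is known about w, h, or tex_w at compile time, the divide and modulo cannot be simplified, and that per-access cost is the kind of penalty described above.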

Thank you for the reply.

When I compare the two versions in ShaderAnalyzer, all the results are the same. Isn't the transformation done in the kernel code? If not, when does the address translation happen?

MicahVillmow, does "custom address translation" mean that I should declare all the streams as 2D and do the transformation in my kernel function?


wgbljl,
That would probably be the best way to get the highest performance. Because you can set up your transformations to use shifts/ands, whereas the generic algorithm always uses divs/mods, you can always outperform the built-in address translation code.
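
As a rough sketch of the shift/and approach, the plain C below assumes that the stream width, stream height, and 2D resource width are all powers of two, which is what makes shifts and masks possible. The names translate_pow2, Addr2, and the log2_* parameters are hypothetical, and the layout is an assumption rather than anything the Brook+ runtime is confirmed to use; a Brook+ kernel would need to express the same arithmetic with its own index types.

```c
#include <assert.h>

/* A coordinate inside the 2D resource that backs the stream. */
typedef struct { unsigned x, y; } Addr2;

/* Hypothetical custom translation for power-of-two sizes. The caller passes
   log2 of the stream width, stream height, and 2D resource width, and must
   keep x < width and y < height so that OR behaves like addition. */
static Addr2 translate_pow2(unsigned x, unsigned y, unsigned z,
                            unsigned log2_w, unsigned log2_h,
                            unsigned log2_tex_w)
{
    assert(log2_w + log2_h < 32 && log2_tex_w < 32);
    /* Same value as (z * h + y) * w + x, using only shifts and ORs. */
    unsigned linear = (((z << log2_h) | y) << log2_w) | x;
    Addr2 a = { linear & ((1u << log2_tex_w) - 1u),  /* linear % tex_w */
                linear >> log2_tex_w };              /* linear / tex_w */
    return a;
}
```

For example, with a stream 4 wide and 8 high stored in a resource 16 wide, index (1, 2, 3) has linear offset 105 and lands at column 9, row 6, the same result a divide and modulo would give.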