Discussion created by Poozon on Aug 26, 2008
Latest reply on Aug 29, 2008 by michael.chu
How short vectors are processed by stream processors


Can someone explain one fairly simple thing about short vectors (like float4)?

I have a very simple kernel:


kernel void sum(float4 a<>, float4 b<>, out float4 c<>)
{
    c = a + b;
}
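
In plain C terms, this is what I understand the kernel to compute for each stream element. It is only a CPU sketch of the semantics: the `float4` struct and the `sum_stream` function are names I made up for illustration, not part of Brook+, and the loop says nothing about how the GPU actually schedules the work.

```c
#include <stddef.h>

/* Hypothetical stand-in for Brook+'s float4: four packed floats. */
typedef struct { float x, y, z, w; } float4;

/* CPU emulation of the kernel body: one component-wise add
 * per stream element, i.e. four scalar adds per element. */
static void sum_stream(const float4 *a, const float4 *b, float4 *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        c[i].x = a[i].x + b[i].x;
        c[i].y = a[i].y + b[i].y;
        c[i].z = a[i].z + b[i].z;
        c[i].w = a[i].w + b[i].w;
    }
}
```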

I also have a Radeon HD 4850, which has 800 stream processors.

I do not understand how many floats one stream processor can handle at a time when the kernel uses float4. One float4 is 4 floats. Does that mean one stream processor can handle 4 floats at a time?

And does it mean that the HD 4850 can operate on 800 float4 variables simultaneously? That would be 800 × 4 = 3200 floats processed at once, which is a lot. I find that hard to believe; 800 floats processed simultaneously seems more plausible.
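
To spell out the two readings I am comparing, here is the arithmetic as a small C sketch. The constant and function names are mine, chosen only for illustration, and neither result is a claim about what the hardware really does; they are just the two possibilities I am asking about.

```c
enum {
    STREAM_PROCESSORS = 800, /* figure quoted for the HD 4850 */
    FLOATS_PER_FLOAT4 = 4    /* one float4 packs four floats */
};

/* Reading 1: every stream processor handles a whole float4 at once,
 * so this is the number of floats in flight. */
static int floats_if_one_float4_per_sp(void)
{
    return STREAM_PROCESSORS * FLOATS_PER_FLOAT4;
}

/* Reading 2: every stream processor handles a single float at once,
 * so this is the number of float4 values in flight. */
static int float4s_if_one_float_per_sp(void)
{
    return STREAM_PROCESSORS / FLOATS_PER_FLOAT4;
}
```

Under the first reading the card would work on 3200 floats at once; under the second it would work on 800 floats, which is 200 float4 values.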

Can anybody shed some light on this?

