    Short vectors

      How short vectors are processed by stream processors


      Could someone explain one simple thing about short vectors (such as float4)?

      I have a very simple kernel:


      kernel void sum(float4 a<>, float4 b<>, out float4 c<>)
      {
          c = a + b;
      }

      I have a Radeon HD 4850, which has 800 stream processors.

      I do not understand how many floats one stream processor can handle at a time when the kernel uses float4. One float4 is 4 floats. Does that mean one stream processor handles 4 floats at once?

      And does it mean that the HD 4850 can operate on 800 float4 variables simultaneously? That would be 800 x 4 = 3200 floats processed at once, which is a lot. I find that hard to believe; processing 800 floats at once looks more credible.
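To make the two readings concrete, here is a rough sketch in plain C (not Brook+). The `float4` struct and the names `float4_add` and `floats_in_flight` are made up for illustration. It only spells out what `c = a + b` computes for one element and the arithmetic behind the two counts; it says nothing about how the hardware actually schedules the work.

```c
#include <assert.h>

/* Stand-in for the Brook+ float4 type (hypothetical; plain C has no such built-in). */
typedef struct { float x, y, z, w; } float4;

/* What `c = a + b` means for one stream element: four independent float additions. */
float4 float4_add(float4 a, float4 b)
{
    float4 c = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return c;
}

/* Floats processed at once IF each of `units` processors handled
   `floats_per_unit` floats at a time (the assumption in question). */
int floats_in_flight(int units, int floats_per_unit)
{
    assert(units >= 0 && floats_per_unit >= 0);
    return units * floats_per_unit;
}
```

With these helpers, `floats_in_flight(800, 4)` gives 3200 and `floats_in_flight(800, 1)` gives 800. Which of the two matches the real hardware is exactly what I am asking.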

      Can anybody shed some light on this?

      Best regards,