

Poozon
Journeyman III

Short vectors

How short vectors are processed by stream processors

Hello,

Can you explain one quite simple thing regarding short vectors (like float4)?

I have a very simple kernel:

kernel void sum(float4 a<>, float4 b<>, out float4 c<>)
{
    c = a + b;
}

And I have a Radeon HD 4850, which has 800 stream processors.

I do not understand how many floats one stream processor can handle at a time when you use float4. One float4 is 4 floats. Does that mean one stream processor can handle 4 floats at a time?

And does it mean that the HD 4850 can operate on 800 float4 variables simultaneously? That would be 3200 floats processed at once, which is a lot. I just cannot believe this; processing 800 floats simultaneously looks more credible.

Can anybody shed some light on this?

Best regards,

Poozon

3 Replies
Ceq
Journeyman III

Those 800 processing elements are grouped into SPs, each composed of 5 units (four simple units and one complex unit), so there are 160 SPs in total.
Anandtech has a good article about RV770 architecture: http://www.anandtech.com/video/showdoc.aspx?i=3341&p=3
For more information you can also have a look at your Brook+ documentation directory: "docs/HWGuide.doc"
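
The grouping described above is simple arithmetic, and a minimal sketch makes it explicit. The figures (800 elements, 5 units per SP) come from this reply; the constant and function names are hypothetical and chosen only for illustration.

```python
# Figures taken from the reply above; all names here are illustrative only.
PROCESSING_ELEMENTS = 800  # advertised "stream processors" on the HD 4850
UNITS_PER_SP = 5           # four simple units plus one complex unit


def sp_count(elements, units_per_sp):
    """Return how many SPs are formed when elements are grouped units_per_sp at a time."""
    return elements // units_per_sp


if __name__ == "__main__":
    print(sp_count(PROCESSING_ELEMENTS, UNITS_PER_SP))
```

This only restates the count given in the reply; it says nothing about how the hardware schedules work onto those units.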
bjang
Journeyman III

Poozon,

I think each of the 800 stream processors operates on a float, not a float4. It is a vector machine, and each vector component is already counted in the number 800.


Yep, bjang has it right.

At peak, the 4850 and 4870 can perform 800 floating-point (32-bit) operations at the same time.
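
To relate that peak figure back to the float4 kernel in the question, here is a rough sketch. It models a float4 addition as four independent scalar additions and divides the 800 scalar operations by the vector width. The result is an idealised upper bound that assumes every scalar unit can be kept busy with one component; it is not a statement about how the compiler or hardware actually packs work. All names are hypothetical.

```python
# Peak figure from the reply above; names are illustrative only.
SCALAR_OPS_AT_PEAK = 800  # simultaneous 32-bit float operations
FLOAT4_WIDTH = 4          # components in one float4


def float4_add(a, b):
    """Add two float4 values component by component: four scalar additions."""
    return tuple(x + y for x, y in zip(a, b))


def max_float4_ops(scalar_ops, width=FLOAT4_WIDTH):
    """Idealised upper bound on float4 operations, assuming perfect packing (an assumption)."""
    return scalar_ops // width


if __name__ == "__main__":
    print(float4_add((1.0, 2.0, 3.0, 4.0), (10.0, 20.0, 30.0, 40.0)))
    print(max_float4_ops(SCALAR_OPS_AT_PEAK))
```

Under this assumption the card handles at most a few hundred float4 additions at once, not 800, which matches the intuition in the original question that 3200 simultaneous floats is too many.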

If you take a look at Figure 8 on page 10 of http://ati.amd.com/technology/...Computing_Overview.pdf, you will see that it is the 800 stream cores that perform all of the floating-point operations.

Michael.