

noxnet
Journeyman III

ATI GPUs / 5D Shader Units

Need help with my Master's thesis

I'm currently working on my Master's thesis and I need some help. I'm writing about possible speed-ups using OpenCL.

Now I want to describe ATI Stream GPUs and Nvidia CUDA cores, but I'm not sure I have understood them correctly.

ATI GPUs:

A SIMD core contains 16 Stream Processors (SP), each of which has 4 Stream Processing Units (SPU) + 1 SFU + a branch unit + some general-purpose registers. The SFU can also act as a normal SPU. Therefore a Cypress GPU has 5 SPUs * 16 SPs * 20 SIMD cores = 1600 SPUs.

Theoretical FLOPS:

FLOPS (SP) = cores * 2 (FMA) * GHz

FLOPS (DP) = cores/5*2 (2 SPUs = 1DP) * GHz

Is FMA possible in DP? If so, FLOPS (DP) would have to be multiplied by 2.
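
The arithmetic above can be written out as a small sketch. The SPU counts and both formulas are taken from the post as written; the 0.85 GHz clock is an assumption (the HD 5870 reference engine clock), since the post does not give one, and the function names are made up for illustration.

```python
# Peak-throughput arithmetic for a Cypress-class GPU, using the post's formulas.
# ASSUMPTION: 0.85 GHz (HD 5870 reference clock) is not stated in the post.

SIMD_CORES = 20    # SIMD cores on Cypress (from the post)
SP_PER_SIMD = 16   # stream processors per SIMD core (from the post)
SPU_PER_SP = 5     # 4 SPUs + 1 SFU acting as an SPU (from the post)
CLOCK_GHZ = 0.85   # assumed engine clock in GHz


def total_spus(simd_cores=SIMD_CORES, sp_per_simd=SP_PER_SIMD, spu_per_sp=SPU_PER_SP):
    """Number of scalar lanes ("cores" in the formulas)."""
    return simd_cores * sp_per_simd * spu_per_sp


def peak_sp_gflops(spus, clock_ghz=CLOCK_GHZ):
    """FLOPS (SP) = cores * 2 (FMA) * GHz, result in GFLOPS."""
    return spus * 2 * clock_ghz


def peak_dp_gflops(spus, clock_ghz=CLOCK_GHZ):
    """FLOPS (DP) = cores / 5 * 2 * GHz, exactly as written in the post."""
    return spus / 5 * 2 * clock_ghz


if __name__ == "__main__":
    spus = total_spus()
    print(spus, peak_sp_gflops(spus), peak_dp_gflops(spus))
```

With the assumed clock this comes to roughly 2720 GFLOPS in single precision and 544 GFLOPS in double precision for 1600 SPUs.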

5D-Shader-Unit:

Why 5D shader? I read that this comes from the GPU's original purpose (graphics visualisation) and is due to the 5D vectors (containing the colour values RGBA and ?) needed for it. Is this true?

Instruction Level Parallelism:

It is up to the compiler to optimize code for 5D shader units. But the more independent instructions a kernel contains, the more SPUs can be used. Can using 5D vectors within the kernel improve performance?

E.g. consider a kernel with the following instructions:

A = 1+1; B = 1+1; C = 1+1; D = 1+1; E = 1+1;

As far as I understand, this is optimal for 5D units: because the instructions are independent of each other, they can be executed by an SP in 1 clock cycle. On a CUDA core these instructions need 5 clock cycles.

If the kernel looks like this:

A = 1+1; B = A+1; C = B+1; ...

Each SP can only execute 1 instruction per clock cycle.

I got most of the information from http://www.anandtech.com/print/2556
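
The difference between the two kernels can be illustrated with a toy packing model. This is only a sketch under simplifying assumptions: it issues instructions in program order into bundles of up to 5 slots and starts a new bundle whenever an instruction reads a value written in the current bundle. It is not the actual ATI shader compiler, and `pack_bundles` is a hypothetical name.

```python
def pack_bundles(instructions, width=5):
    """Greedily pack (dest, sources) instructions into bundles of <= width slots.

    An instruction cannot share a bundle with an earlier instruction whose
    result it reads, so a dependency forces a new bundle.
    """
    bundles = []
    current = []       # destinations issued in the bundle being filled
    written = set()    # values produced inside the current bundle
    for dest, sources in instructions:
        depends = any(src in written for src in sources)
        if current and (depends or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append(dest)
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


# A = 1+1; B = 1+1; C = 1+1; D = 1+1; E = 1+1;
independent = [("A", []), ("B", []), ("C", []), ("D", []), ("E", [])]

# A = 1+1; B = A+1; C = B+1; ... (continued to E for illustration)
chain = [("A", []), ("B", ["A"]), ("C", ["B"]), ("D", ["C"]), ("E", ["D"])]

if __name__ == "__main__":
    print(len(pack_bundles(independent)), "bundle(s) for the independent kernel")
    print(len(pack_bundles(chain)), "bundle(s) for the dependent chain")
```

In this model the five independent instructions fit into one bundle, while the dependent chain needs one bundle per instruction, leaving four of the five slots idle each time.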

3 Replies
noxnet
Journeyman III

I drew the ATI Cypress SIMD core architecture.

An image can be found at

http://noxnet.at/wp-content/ati_cypress_simd_core2.jpg

Could anyone please look at it and correct me if something is wrong?


http://sa09.idav.ucdavis.edu/docs/SA09_AMD_IHV.pdf

One 5D core can do either two DP ADDs or one DP FMA per clock.

Another reason for 5D is that most instructions in shaders are simple ADDs and MULs.
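
As a rough check on the DP question, the per-unit rates in this reply can be turned into peak numbers. This is a sketch under assumptions: 320 five-wide units (1600 / 5, from the original post), a 0.85 GHz clock that is not stated anywhere in the thread, and an FMA counted as 2 floating-point operations.

```python
# ASSUMPTIONS: 320 5D units (1600 SPUs / 5) and a 0.85 GHz engine clock.
UNITS_5D = 320
CLOCK_GHZ = 0.85

# Double-precision issue rates per 5D unit per clock, as given in the reply.
DP_RATES = {
    "add": {"instructions": 2, "flops_each": 1},  # two DP ADDs
    "fma": {"instructions": 1, "flops_each": 2},  # one DP FMA = multiply + add
}


def dp_gflops(op, units=UNITS_5D, clock_ghz=CLOCK_GHZ):
    """Peak double-precision GFLOPS when every unit issues `op` every clock."""
    rate = DP_RATES[op]
    return units * rate["instructions"] * rate["flops_each"] * clock_ghz


if __name__ == "__main__":
    for op in DP_RATES:
        print(op, dp_gflops(op))
```

Under these assumptions both cases come to 2 DP operations per unit per clock, so the ADD and FMA peaks are the same (about 544 GFLOPS) and no extra factor of 2 appears for FMA.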

hazeman
Adept II

Originally posted by: noxnet Why 5D shader? I read that this comes from the GPU's original purpose (graphics visualisation) and is due to the 5D vectors (containing the colour values RGBA and ?) needed for it. Is this true?


I think that was only one of the reasons behind this choice. IMHO the reasoning was something like RGBA + special function (sin, cos, ...).

But the real reason behind the choice is more complex. In general, a processing unit can be divided into two parts: execution control and computing units. Engineers have a limited number of transistors at their disposal, which have to be split between these two categories.

In a typical CPU most of the transistors go into execution control (instruction decoding, branch prediction, out-of-order execution, ...), and the computing units take up a rather small area.

In GPUs most of the transistors are used for computing units, but there are variations. On one side there is Nvidia with a more advanced control unit (simplifying: one "control unit" per scalar unit); on the other there is ATI with 5 CUs (computing units) per thread. So in ATI's case fewer transistors are used for execution control and more for computation.

Such a choice has some advantages: the 5870 has about 2/3 of Fermi's transistor count and almost twice the peak computational power. On the other hand, it is much harder to write a kernel that uses all of the 5xxx's resources. Depending on the problem, ATI GPU utilisation can be as low as 20-30% or as high as 95%.

For Nvidia, which spends more transistors on execution control, it is much easier to write a kernel that achieves high GPU utilisation (but at the cost of lower peak FLOPS).
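
One way to read the utilisation range is in terms of how many of the 5 slots get filled per issued bundle. The sketch below is a simplified model under that assumption only: it ignores memory stalls, branching and everything else that affects real kernels, and the 2720 GFLOPS peak is an assumed figure (1600 lanes * 2 * 0.85 GHz), not a number from this thread.

```python
PEAK_SP_GFLOPS = 2720.0  # ASSUMPTION: 1600 lanes * 2 (FMA) * 0.85 GHz


def utilisation(slots_filled_per_bundle, width=5):
    """Fraction of available slots that do useful work across all bundles."""
    total_slots = len(slots_filled_per_bundle) * width
    return sum(slots_filled_per_bundle) / total_slots


def effective_gflops(slots_filled_per_bundle, peak=PEAK_SP_GFLOPS, width=5):
    """Peak throughput scaled by slot utilisation (toy model)."""
    return peak * utilisation(slots_filled_per_bundle, width)


if __name__ == "__main__":
    serial_kernel = [1, 1, 1, 1, 1]   # fully dependent code: one slot per bundle
    packed_kernel = [5, 5, 4, 5]      # mostly independent code
    print(utilisation(serial_kernel), effective_gflops(serial_kernel))
    print(utilisation(packed_kernel), effective_gflops(packed_kernel))
```

In this model a fully serial kernel fills 1 of 5 slots, i.e. 20% utilisation, while a kernel that fills 19 of 20 slots reaches 95%.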
