# ATI GPUs / 5D Shader Units

Discussion created by noxnet on May 4, 2010
Latest reply on May 4, 2010 by hazeman
Need help with my Master's thesis

I'm currently working on my Master's thesis and I need some help. I'm writing about possible speedups using OpenCL.

Now I want to describe ATI Stream GPUs and NVIDIA CUDA cores, but I'm not sure I have really got it right.

ATI GPUs:

A SIMD core contains 16 Stream Processors (SP), each of which has 4 Stream Processing Units (SPU) + 1 SFU + a branch unit + some general-purpose registers. The SFU can also act like a normal SPU. Therefore a Cypress GPU, with its 20 SIMD cores, has 5 SPUs * 16 SPs * 20 SIMD cores = 1600 SPUs.
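To double-check the arithmetic, here is a minimal Python sketch of the count as I understand it. The constant and function names are my own, purely for illustration; the figures are the ones quoted above.

```python
# Illustrative names only; the figures are the ones quoted in this post.
SIMD_CORES = 20        # SIMD cores on a Cypress GPU
SPS_PER_SIMD = 16      # stream processors (SP) per SIMD core
SPUS_PER_SP = 4 + 1    # 4 SPUs plus the SFU, which can also act as an SPU


def total_spus(simd_cores, sps_per_simd, spus_per_sp):
    """Multiply out the hierarchy: SIMD cores -> SPs -> SPUs."""
    return simd_cores * sps_per_simd * spus_per_sp


print(total_spus(SIMD_CORES, SPS_PER_SIMD, SPUS_PER_SP))  # prints 1600
```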

Theoretical FLOPS:

FLOPS (SP) = cores * 2 (FMA) * GHz

FLOPS (DP) = cores/5*2 (2 SPUs = 1DP) * GHz

Is FMA possible in DP? If so, FLOPS (DP) would have to be multiplied by 2.
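To make the two formulas concrete, here is a small Python sketch that simply evaluates them as written above. The function names are made up for illustration, and the 0.85 GHz clock is an assumed example value, not something taken from my sources. The last line only shows what the "multiplied by 2" case would give; it does not claim that this case is the correct one.

```python
def peak_gflops_sp(cores, clock_ghz):
    """FLOPS (SP) = cores * 2 (FMA) * GHz, exactly as written in the post."""
    return cores * 2 * clock_ghz


def peak_gflops_dp(cores, clock_ghz):
    """FLOPS (DP) = cores / 5 * 2 * GHz, exactly as written in the post."""
    return cores / 5 * 2 * clock_ghz


CORES = 1600       # SPUs on a Cypress GPU (from the count above)
CLOCK_GHZ = 0.85   # assumption: example clock of 850 MHz

sp = peak_gflops_sp(CORES, CLOCK_GHZ)
dp = peak_gflops_dp(CORES, CLOCK_GHZ)

print(f"SP: {sp:.0f} GFLOPS")
print(f"DP: {dp:.0f} GFLOPS")
print(f"DP if it had to be doubled: {2 * dp:.0f} GFLOPS")
```

Because the clock is given in GHz, the results come out in GFLOPS.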

Why 5D shader? I read that this comes from the GPU's original purpose (graphics visualisation) and is due to the 5D vectors (containing the color values RGBA and ?) needed for it. Is this true?

Instruction Level Parallelism:

It is up to the compiler to optimize code for the 5D shader units. But the more independent instructions there are within a kernel, the more SPUs can be used. Can using 5D vectors within the kernel improve performance?

E.g., consider a kernel with the following instructions:

A = 1+1; B = 1+1; C = 1+1; D = 1+1; E = 1+1;

As far as I understand, this is optimal for 5D units because the instructions are independent of each other, so they can all be executed by one SP in 1 clock cycle. On a CUDA core these instructions need 5 clock cycles.

If the kernel looks like this:

A = 1+1; B = A+1; C = B+1; ...

Then each SP can execute only 1 instruction per clock cycle.
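To illustrate what I mean, here is a toy Python model of the two cases. It is only a sketch of my understanding, not a description of the real hardware or compiler: it assumes every instruction takes one cycle, that an instruction can issue only after its inputs were produced in an earlier cycle, and that at most `width` instructions issue per cycle. The function name `cycles_needed` and the `(dest, sources)` encoding are made up for this example.

```python
def cycles_needed(instructions, width):
    """Count issue cycles for a list of (dest, sources) instructions.

    Toy model: an instruction may issue once all of its sources were
    produced in an earlier cycle, at most `width` instructions issue
    per cycle, and every instruction has a latency of one cycle.
    """
    ready = set()                 # values produced in earlier cycles
    pending = list(instructions)  # instructions not yet issued
    cycles = 0
    while pending:
        issued = []
        for dest, sources in pending:
            if len(issued) < width and all(s in ready for s in sources):
                issued.append((dest, sources))
        if not issued:
            raise ValueError("an instruction depends on a value that is never produced")
        for instr in issued:
            pending.remove(instr)
        ready.update(dest for dest, _ in issued)
        cycles += 1
    return cycles


# A = 1+1; B = 1+1; C = 1+1; D = 1+1; E = 1+1;  (all independent)
independent = [("A", []), ("B", []), ("C", []), ("D", []), ("E", [])]

# A = 1+1; B = A+1; C = B+1;  (only the three instructions written above)
dependent = [("A", []), ("B", ["A"]), ("C", ["B"])]

print(cycles_needed(independent, width=5))  # 5-wide unit: 1 cycle
print(cycles_needed(independent, width=1))  # one instruction per cycle: 5 cycles
print(cycles_needed(dependent, width=5))    # chain: 3 cycles despite 5 slots
```

In this model the dependent chain needs one cycle per instruction no matter how wide the unit is, which is the effect I am asking about.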

I got most of my information from http://www.anandtech.com/print/2556