mahadevan
Journeyman III

Are 'Evergreen (GPU family)' GPUs pure SIMD processors?

Hi Gurus,

I have one question: are the HD5750 (or any other HD5xxx series GPUs) pure SIMD processors?

SIMD is typically defined as follows (taken from the OpenCL specification document):

SIMD: Single Instruction Multiple Data. A programming model where a kernel is executed
concurrently on multiple processing elements each with its own data and a shared program
counter. All processing elements execute a strictly identical set of instructions.

 

That means all the cores share the same program counter. Is that right for the HD5xxx series GPUs?

If all the cores share the same program counter, how is it possible to run if-branch code (with varying data input to each core) on the GPU?

 

thanks & regards

mahadevan

5 Replies
Fr4nz
Journeyman III

Originally posted by: mahadevan Hi Gurus,

I have one question: are the HD5750 (or any other HD5xxx series GPUs) pure SIMD processors?

SIMD is typically defined as follows (taken from the OpenCL specification document):

SIMD: Single Instruction Multiple Data. A programming model where a kernel is executed concurrently on multiple processing elements each with its own data and a shared program counter. All processing elements execute a strictly identical set of instructions.

That means all the cores share the same program counter. Is that right for the HD5xxx series GPUs?



Correct: in the 5xxx series, SIMD cores are grouped into SIMD engines (termed "compute units" in OpenCL), and every SIMD engine has its own program counter, shared among its SIMD cores. Things may actually be more complicated at the hardware level, but this is the general idea to keep in mind.

 

If all the cores share the same program counter, how is it possible to run if-branch code (with varying data input to each core) on the GPU?


In theory, if the threads of a thread group (that is, the threads running on one SIMD engine) diverge after a control flow instruction, the hardware executes both possible branches sequentially for every thread of the group. Afterwards, each thread keeps only the result of its "valid" path (i.e., the one that satisfies the condition of the control flow instruction).

This is the main reason why, when you program GPUs, it is very important to avoid, whenever possible, control flow instructions that frequently make threads diverge. Another important thing is to unroll loops, if possible.

Obviously these tasks are often not easy (or even realistic), but they can give you dramatic performance improvements.
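To make the divergence handling above more concrete, here is a minimal sketch in Python. It is only a toy model, not what the hardware literally does: the function name `run_wavefront`, the `passes` counter, and the sample branch bodies are all made up for illustration. The point is that lanes sharing one program counter step through each needed branch together, and a per-lane mask decides which result each lane keeps.

```python
# Toy model of one SIMD engine handling a divergent if/else.
# Assumption: a thread group is modelled as a plain list, one entry per lane.

def run_wavefront(data):
    """Emulate `if x % 2 == 0: x // 2 else: 3 * x + 1` with a shared program counter.

    Returns the per-lane results and the number of branch bodies issued.
    """
    mask = [x % 2 == 0 for x in data]      # per-lane outcome of the condition
    passes = 0
    result = list(data)

    if any(mask):                          # some lane needs the "then" branch
        passes += 1
        then_vals = [x // 2 for x in data]         # all lanes step through it
        result = [t if m else r for t, m, r in zip(then_vals, mask, result)]

    if not all(mask):                      # some lane needs the "else" branch
        passes += 1
        else_vals = [3 * x + 1 for x in data]      # all lanes step through it
        result = [r if m else e for e, m, r in zip(else_vals, mask, result)]

    return result, passes


coherent = run_wavefront([2, 4, 6, 8])     # every lane takes the same branch
divergent = run_wavefront([1, 2, 3, 4])    # lanes disagree, both branches run
```

With coherent input only one branch body is issued; with mixed input both are, which is where the performance cost comes from.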


Fr4nz/nou,

Thanks for your replies.

Originally posted by: Fr4nz This is the main reason why, when you program GPUs, it is very important to avoid, whenever possible, control flow instructions that frequently make threads diverge. Another important thing is to unroll loops, if possible. Obviously these tasks are often not easy (or even realistic), but they can give you dramatic performance improvements.

 

What if the code contains so many control flow statements and loops that the program counter diverges widely across threads? Assume these loops and branches can't be removed because the algorithm is complex.

Will the cores still execute under these conditions (with worse performance), or will they stop executing?

Originally posted by: nou 5xxx GPUs have multiple SIMD cores. For example, the 5870 has 20 SIMD cores, and each core has 16 5D units.

If all threads on one SIMD core take the same branch, it executes only that branch. If not, it executes both branches and picks the right result.

How many SIMD cores are in the HD5750?

How can one map the data from http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units to SIMD cores, SIMD engines, etc.?

AMD's specification docs only list the following (for the HD5750):

720 Stream Processing Units
36 Texture Units
64 Z/Stencil ROP Units
16 Color ROP Units

How can I translate them to SIMD cores and SIMD engines?

thanks & regards

mahadevan

 


The SIMD core count is reported as the number of compute units in OpenCL. IIRC the 5770 has 10 SIMD cores and the 5750 has 9: 9 SIMD cores * 16 units per SIMD core * 5 ALUs per 5D unit = 720.
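The same arithmetic can be run backwards to go from the advertised "Stream Processing Units" figure to a SIMD core count. This is only a sketch under the layout described in this thread (16 units per SIMD core, each 5 wide); the constant and function names are made up for illustration, and other GPU families use different layouts.

```python
# Layout assumed from this thread for HD5xxx (Evergreen) parts:
# each SIMD core has 16 units, and each unit is 5 wide ("5D").
UNITS_PER_SIMD = 16
ALUS_PER_UNIT = 5


def simd_cores(stream_processing_units):
    """Translate a 'Stream Processing Units' figure into a SIMD core count."""
    per_simd = UNITS_PER_SIMD * ALUS_PER_UNIT      # 80 ALUs per SIMD core
    cores, remainder = divmod(stream_processing_units, per_simd)
    if remainder:
        raise ValueError("count is not a multiple of 80; layout assumption fails")
    return cores


print(simd_cores(720))    # HD5750 -> 9
print(simd_cores(1600))   # HD5870 -> 20
```

On a real system, the number an OpenCL program sees is the one returned by the `CL_DEVICE_MAX_COMPUTE_UNITS` device query.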

nou
Exemplar

5xxx GPUs have multiple SIMD cores. For example, the 5870 has 20 SIMD cores, and each core has 16 5D units.

If all threads on one SIMD core take the same branch, it executes only that branch. If not, it executes both branches and picks the right result.


mahadevan,
In the case of complex control flow, the worst-case scenario is that every instruction is executed by every thread.
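A rough way to picture that worst case, as a Python sketch: treat the kernel as a set of basic blocks and charge the thread group for every block that at least one of its threads needs. The function name, block names, and instruction counts below are hypothetical and only illustrate the cost model, which ignores everything except issued instructions.

```python
def wavefront_cost(lane_paths, block_cost):
    """Instructions issued for one thread group sharing a program counter.

    lane_paths: one set of basic-block names per lane (the blocks it needs).
    block_cost: instructions per basic block (hypothetical numbers).
    """
    visited = set().union(*lane_paths)     # blocks needed by at least one lane
    return sum(block_cost[b] for b in visited)


block_cost = {"A": 10, "B": 20, "C": 30, "D": 40}

# All four lanes follow the same path: only block A is issued.
coherent = wavefront_cost([{"A"}] * 4, block_cost)

# Every lane needs a different block: all blocks are issued for all lanes.
worst = wavefront_cost([{"A"}, {"B"}, {"C"}, {"D"}], block_cost)

print(coherent, worst, sum(block_cost.values()))
```

In the fully divergent case the cost equals the sum over all blocks, so the code still runs correctly; it just loses the benefit of running lanes in parallel on different work.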