5 Replies Latest reply on Apr 22, 2010 4:55 PM by MicahVillmow

    Is 'Evergreen (GPU family)' pure SIMD processors

    mahadevan

      Hi Gurus,

      I have one question: is the HD5750 (or any HD5xxx-series GPU) a pure SIMD processor?

      SIMD is typically defined as follows (taken from the OpenCL specification document):

      SIMD: Single Instruction Multiple Data. A programming model where a kernel is executed
      concurrently on multiple processing elements each with its own data and a shared program
      counter. All processing elements execute a strictly identical set of instructions.

       

      That means all the cores share the same program counter. Is that true of the HD5xxx-series GPUs?

      If all the cores share the same program counter, how is it possible to run code with an if-branch (with different input data on each core) on the GPU?

       

      thanks & regards

      mahadevan

       

       

       

       

       

        • Is 'Evergreen (GPU family)' pure SIMD processors
          Fr4nz

           

          Originally posted by: mahadevan

          That means all the cores share the same program counter. Is that true of the HD5xxx-series GPUs?



          Correct: in the 5xxx series, SIMD cores are grouped into SIMD engines (termed "compute units" in OpenCL; the individual cores are the "processing elements"), and every SIMD engine has its own program counter, shared among its SIMD cores. Things may actually be more complicated at the hardware level, but this is the general idea to keep in mind.

           

           

          If all the cores share the same program counter, how is it possible to run code with an if-branch (with different input data on each core) on the GPU?


          In theory, if the threads of a group (that is, the threads running on one SIMD engine) diverge after a control-flow instruction, the hardware executes both possible branches sequentially for every thread of the group. Afterwards, each thread keeps only the result of its "valid" path (i.e., the one that satisfies the condition of the control-flow instruction).

          This is the main reason why, when you program GPUs, it's very important to avoid, whenever possible, control-flow instructions that make threads diverge frequently. Another important thing is to unroll loops, if possible.

          Obviously these tasks are often not easy (and sometimes not realistic), but they can give you dramatic performance improvements.
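To make the "execute both branches, keep the valid one" idea concrete, here is a minimal Python sketch. It is only a software model of the behaviour described above, not how the hardware is actually implemented; the function name `lockstep_if_else` and the sample branch bodies are made up for illustration.

```python
def lockstep_if_else(inputs, cond, then_fn, else_fn):
    """Model one thread group running `if cond(x): then_fn(x) else: else_fn(x)`.

    Every lane shares one program counter, so the whole group steps
    through both branch bodies; a per-lane mask decides which result
    each lane keeps. (Illustrative model only.)
    """
    mask = [cond(x) for x in inputs]            # per-lane branch condition
    then_vals = [then_fn(x) for x in inputs]    # pass 1: group steps through the 'then' body
    else_vals = [else_fn(x) for x in inputs]    # pass 2: group steps through the 'else' body
    # Each lane keeps only the value from its valid path.
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


# Even inputs take the 'then' path (x * 10), odd inputs the 'else' path (x + 1).
print(lockstep_if_else([1, 2, 3, 4],
                       cond=lambda x: x % 2 == 0,
                       then_fn=lambda x: x * 10,
                       else_fn=lambda x: x + 1))
# -> [2, 20, 4, 40]
```

Each lane ends up with the correct result, but the group as a whole has spent time on both branch bodies, which is where the cost of divergence comes from.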

            • Is 'Evergreen (GPU family)' pure SIMD processors
              mahadevan

              Fr4nz/nou,

              Thanks for your replies.

               

              ..

              This is the main reason why, when you program GPUs, it's very important to avoid, whenever possible, control-flow instructions that make threads diverge frequently. Another important thing is to unroll loops, if possible. Obviously these tasks are often not easy (and sometimes not realistic), but they can give you dramatic performance improvements.

               

              What if the code contains so many control-flow statements and loops that the program counter diverges widely across the cores? Assume that these loops and branches can't be removed because the algorithm is complex.

              Will the cores still execute under those conditions (with worse performance), or will they stop executing?

               



              5xxx GPUs have multiple SIMD cores. For example, the 5870 has 20 SIMD cores, and each core has 16 5D units.

              If the branch is coherent across one SIMD core, the core executes only that branch. If not, it executes both branches and picks the right result.

              How many SIMD cores are in the HD5750?

              How can one map the data from http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units to SIMD engines, cores, etc.?

               

              The AMD specification documents only list the following for the HD5750:

              720 Stream Processing Units
              36 Texture Units
              64 Z/Stencil ROP Units
              16 Color ROP Units

              How can I translate these into SIMD cores and SIMD engines?

              thanks & regards

              mahadevan

               

            • Is 'Evergreen (GPU family)' pure SIMD processors
              nou

              5xxx GPUs have multiple SIMD cores. For example, the 5870 has 20 SIMD cores, and each core has 16 5D units.

              If the branch is coherent across one SIMD core, the core executes only that branch. If not, it executes both branches and picks the right result.
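The figures above also suggest a way to turn a "Stream Processing Units" count into a SIMD count. The sketch below assumes that each 5D unit is counted as 5 stream processing units and that the rest of the 5xxx series uses the same 16-units-per-SIMD layout as the 5870 (1600 stream processing units); both are assumptions made for the arithmetic, and the function name `simd_core_count` is made up for illustration.

```python
LANES_PER_5D_UNIT = 5   # assumption: one "5D" unit is counted as 5 stream processing units
UNITS_PER_SIMD = 16     # 16 5D units per SIMD core, as stated for the 5870


def simd_core_count(stream_processing_units):
    """Estimate the number of SIMD cores from the advertised unit count."""
    per_simd = UNITS_PER_SIMD * LANES_PER_5D_UNIT   # 16 * 5 = 80 units per SIMD core
    count, remainder = divmod(stream_processing_units, per_simd)
    if remainder:
        raise ValueError("unit count is not a multiple of %d" % per_simd)
    return count


print(simd_core_count(1600))  # HD5870 -> 20, matching the figure quoted above
print(simd_core_count(720))   # HD5750 -> 9
```

Under those assumptions the HD5750's 720 stream processing units work out to 9 SIMD cores of 16 5D units each.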

              • Is 'Evergreen (GPU family)' pure SIMD processors
                MicahVillmow
                mahadevan,
                In the case of complex control flow, the worst-case scenario is that every instruction is executed by every thread.
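As a rough illustration of that worst case, the following sketch counts how many instructions one thread group has to step through for a sequence of if/else statements. It is a simplified cost model, not a description of the real scheduler; the function name `issued_instructions` and the instruction counts are hypothetical.

```python
def issued_instructions(branches, lane_choices):
    """Count instructions a thread group steps through.

    branches:     list of (then_len, else_len) instruction counts, one per if/else
    lane_choices: one list of booleans per thread; entry b is True if that
                  thread takes the 'then' side of branch b
    """
    total = 0
    for b, (then_len, else_len) in enumerate(branches):
        sides_taken = {choices[b] for choices in lane_choices}
        if True in sides_taken:     # at least one thread needs the 'then' body
            total += then_len
        if False in sides_taken:    # at least one thread needs the 'else' body
            total += else_len
    return total


branches = [(4, 6), (3, 3)]                   # hypothetical instruction counts

coherent = [[True, False], [True, False]]     # all threads agree at every branch
divergent = [[True, False], [False, True]]    # threads disagree at every branch

print(issued_instructions(branches, coherent))   # -> 10 (4 + 3)
print(issued_instructions(branches, divergent))  # -> 16 (4 + 6 + 3 + 3, every instruction)
```

In this model the kernel never stops executing under heavy divergence; it simply degrades toward the cost of running every instruction for the whole group.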