11 Replies Latest reply on Jun 21, 2010 3:25 PM by MicahVillmow

    SPMD on 5770

    smatovic
      Do the processing elements have their own instruction counters?

      Heyho,

       

      I developed on an NV 8800GT and found that every if-else doubled the runtime.

       

      What about the ATI 5770? Does it provide "real" SPMD, with a separate instruction counter per processing element, so that one thread executes only the if and another executes only the else?

       

      The OpenCL documentation mentions only that if the local work-group size is set to 1, then the GPU should work in SPMD "mode".

       

      Regards,

      Srdja

        • SPMD on 5770
          nou

          No. An ATI GPU executes both branches of an if/else unless the branch is coherent across the whole workgroup.

            • SPMD on 5770
              smatovic

              Do you mean that if I put all threads into one work-group, then they have their own instruction counters and do not execute every branch of a condition?

                • SPMD on 5770
                  LeeHowes

                  Like the G80, Fermi, Larrabee or an SSE CPU, it is a SIMD machine. That's how you get a high density of compute units and hence high arithmetic throughput in hardware. It executes one instruction across the entire wavefront at a time. Even if (as on Fermi, I think) they have their own instruction counters, this is still the case. I can't imagine that changing at any point in the near future given that the trend (look at AVX and Larrabee native instructions) is for wider SIMD, not narrower.

                  It is SPMD, but it's still SIMD within a wave. The waves can go out of sync, as each has its own program counter. The hardware will interleave waves from multiple workgroups (or multiple kernels in DX and hopefully OpenCL soon) on one core, and can run a different group/kernel on a different core.
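To make the "SIMD within a wave" point concrete, here is a minimal Python sketch of one wavefront running an if/else under an execution mask. It is only a toy model, not how the hardware or any driver is implemented: the wavefront size of 64, the branch condition, and the two branch bodies are assumptions chosen for illustration.

```python
# Toy model of one SIMD wavefront running:
#     if x is even: y = x * 2   else: y = x + 100
# WAVEFRONT_SIZE and both branch bodies are illustrative assumptions.
WAVEFRONT_SIZE = 64

def run_wavefront(values):
    """Return (results, instruction_passes) for one wavefront."""
    mask = [v % 2 == 0 for v in values]   # per-lane branch condition
    results = [None] * len(values)
    passes = 0

    if any(mask):                         # "if" side: issued once for the whole wave
        passes += 1
        for lane, v in enumerate(values):
            if mask[lane]:                # lanes with a false mask sit idle
                results[lane] = v * 2

    if not all(mask):                     # "else" side: issued only if some lane needs it
        passes += 1
        for lane, v in enumerate(values):
            if not mask[lane]:
                results[lane] = v + 100

    return results, passes

coherent = [2 * i for i in range(WAVEFRONT_SIZE)]   # every lane takes the "if" side
divergent = list(range(WAVEFRONT_SIZE))             # lanes alternate between sides

coherent_results, coherent_passes = run_wavefront(coherent)
divergent_results, divergent_passes = run_wavefront(divergent)
print(coherent_passes, divergent_passes)
```

In this model the coherent wavefront pays for one side of the branch, while the divergent one pays for both, which matches the doubled runtime described in the first post.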

                    • SPMD on 5770
                      hazeman

                      In this thread there is a small discussion about GPU threading. Maybe it will help.

                       

                       

                        • SPMD on 5770
                          smatovic

                          Thanks for the link.

                          So all work-items inside a workgroup execute SIMD-like, and these SIMD units run autonomously, SPMD-like. Right?

                            • SPMD on 5770
                              LeeHowes

                              Not quite. Only a single wavefront is really SIMD (and even then it's only logically SIMD). A work group is made of multiple wavefronts, each of which has its own program counter and can drift apart in SPMD fashion - the only thing is that within a workgroup you can synchronise on barriers and exchange data in LDS between wavefronts. Once you hit a barrier you have pulled your wavefronts close together again and hence it is a bit like being SIMD across an entire workgroup. It need not be the case if you never barrier, though.

                              Different work groups from the same kernel are then SPMD whether on the same SIMD core (where it's only logical of course, they're not executing on the same cycle) or multiple cores. Different work groups from different kernels are not even necessarily SPMD across the device.
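The drift-then-barrier behaviour described above can be sketched with a small Python simulation. This is a toy model under stated assumptions: four wavefronts per work group, a ten-instruction program with a barrier at instruction 5, and a random scheduler are all made up for illustration and do not describe the real hardware scheduler.

```python
import random

# Toy model: a work group of 4 wavefronts, each with its own program counter (pc).
# Program length, barrier position, and the random scheduler are illustrative assumptions.
NUM_WAVEFRONTS = 4
BARRIER_PC = 5        # instruction index at which the kernel hits a barrier
PROGRAM_LENGTH = 10

def simulate(seed=0):
    rng = random.Random(seed)
    pcs = [0] * NUM_WAVEFRONTS
    max_spread = 0            # largest pc gap observed between wavefronts
    pcs_at_release = None     # snapshot taken when the barrier releases

    while any(pc < PROGRAM_LENGTH for pc in pcs):
        # A wavefront is runnable if it is unfinished and not waiting at the barrier.
        runnable = [w for w, pc in enumerate(pcs)
                    if pc < PROGRAM_LENGTH and pc != BARRIER_PC]
        if not runnable:
            # Every wavefront has reached the barrier: release them together.
            pcs_at_release = list(pcs)
            pcs = [pc + 1 for pc in pcs]
            continue
        w = rng.choice(runnable)      # scheduler advances any runnable wavefront
        pcs[w] += 1
        max_spread = max(max_spread, max(pcs) - min(pcs))

    return max_spread, pcs_at_release

max_spread, pcs_at_release = simulate()
print(max_spread, pcs_at_release)
```

The wavefronts' program counters spread apart between barriers, but the snapshot at the barrier shows all of them at the same instruction, which is the sense in which a barrier pulls a work group back together.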

                                • SPMD on 5770
                                  moozoo

                                  Does each SIMD have a program stack?

                                  i.e. to push/pop a return address when calling a subroutine/function.

                                  Larrabee, having Pentium P54C processors, obviously does.

                                  But I had the impression that ATI 5xxx and Fermi do not and that all function calls are inlined.

                                   

                                   

                                  • SPMD on 5770
                                    smatovic

                                    >> Not quite.

                                     

                                    So one wavefront executes 16 threads in one cycle, and these 16 threads are SIMD?

                                      • SPMD on 5770
                                        LeeHowes

                                        One wavefront executes 16 work items every second cycle. Each individual set of 16 items is directly SIMD. The entire wavefront is logically SIMD in that the same instruction will be issued 4 times to cover the entire wavefront. Hence branch divergence is at the wavefront level, not at the physical SIMD level.
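As a rough illustration of why divergence is counted per wavefront rather than per 16-lane group, the following Python sketch counts how many wavefronts contain work items that disagree on a branch. The sizes (64 work items per wavefront, 16 physical lanes) come from the reply above; the function name and the three branch patterns are hypothetical.

```python
# Sizes taken from the description above: 64 work items per wavefront and
# 16 physical SIMD lanes, so each instruction is issued 64 / 16 = 4 times.
WAVEFRONT_SIZE = 64
SIMD_LANES = 16
ISSUES_PER_INSTRUCTION = WAVEFRONT_SIZE // SIMD_LANES

def divergent_wavefronts(num_items, takes_if):
    """Count wavefronts in which work items disagree on the branch (hypothetical helper)."""
    count = 0
    for start in range(0, num_items, WAVEFRONT_SIZE):
        end = min(start + WAVEFRONT_SIZE, num_items)
        sides = {takes_if(i) for i in range(start, end)}
        if len(sides) > 1:    # both sides of the branch are needed in this wavefront
            count += 1
    return count

N = 256  # four wavefronts
per_item = divergent_wavefronts(N, lambda i: i % 2 == 0)          # alternates per work item
per_16 = divergent_wavefronts(N, lambda i: (i // 16) % 2 == 0)    # uniform per 16-lane group
per_wave = divergent_wavefronts(N, lambda i: (i // 64) % 2 == 0)  # uniform per wavefront
print(ISSUES_PER_INSTRUCTION, per_item, per_16, per_wave)
```

In this model a branch that is uniform within each group of 16 work items still diverges in every wavefront. Only a branch that is uniform across all 64 work items of a wavefront avoids executing both sides.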

                          • SPMD on 5770
                            MicahVillmow
                            Functions are inlined most of the time. AMD hardware can support function calls, but doing so incurs a fairly large performance hit, so it is not preferable to inlining everything.