2 Replies Latest reply on Jan 28, 2014 3:02 AM by sszymczy

    Performance penalty of mixing vector and scalar computation on GCN

    sszymczy

      In my kernel I have vector and scalar computation intermixed like this:

       

      ...

      vector operations on whole vectors (e.g. int4)

      scalar operations on individual vector elements (int4.x, int4.y, etc)

      vector operation on whole vectors

      scalar operations on individual vector elements

      vector operation on whole vectors

      ...

       

      I wonder whether there is any performance penalty for doing scalar operations on vector elements, compared with the situation where only scalar variables are used. Does it take any time to extract a vector element? Does it help if I copy the vector elements to scalar variables first?

        • Re: Performance penalty of mixing vector and scalar computation on GCN
          realhet

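          On GCN the physical vector type is not int4; it is a 64-lane vector of 32-bit values (an "int64" in OpenCL vector-type notation), with one lane per work-item of the wavefront.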

          Scalar instructions do not work on individual vector elements. They have a separate 64-bit register space, and they operate on it separately from (and in parallel with) the vector ALU.
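          To make the register layout concrete, here is a rough model in plain C (not GCN code). It assumes, as is typical for how OpenCL vector types get compiled on GCN, that each component of a per-work-item int4 ends up in its own 64-lane vector register. The names vgpr_t, wave_int4_t, wave_add_int4 and wave_scale_x are made up for this sketch.

```c
#include <stdint.h>

#define WAVE_SIZE 64  /* work-items per GCN wavefront */

/* One vector register: a 32-bit value for each of the 64 lanes. */
typedef struct { int32_t lane[WAVE_SIZE]; } vgpr_t;

/* Assumed layout of an OpenCL int4 held by every work-item of a wavefront:
   four independent vector registers, one per component. */
typedef struct { vgpr_t x, y, z, w; } wave_int4_t;

/* "r = a + b" on int4 values: four vector adds, one per component register. */
static void wave_add_int4(wave_int4_t *r, const wave_int4_t *a,
                          const wave_int4_t *b) {
    for (int i = 0; i < WAVE_SIZE; ++i) {
        r->x.lane[i] = a->x.lane[i] + b->x.lane[i];
        r->y.lane[i] = a->y.lane[i] + b->y.lane[i];
        r->z.lane[i] = a->z.lane[i] + b->z.lane[i];
        r->w.lane[i] = a->w.lane[i] + b->w.lane[i];
    }
}

/* "v.x *= k" in the kernel: a single vector multiply on the x register.
   The other three registers are not touched and nothing is extracted. */
static void wave_scale_x(wave_int4_t *v, int32_t k) {
    for (int i = 0; i < WAVE_SIZE; ++i) {
        v->x.lane[i] *= k;
    }
}
```

          Under this model, a "scalar" operation on int4.x in the kernel source is still a vector ALU instruction across all 64 work-items. It just names one of the four registers, so copying components into separate scalar variables first would not be expected to change anything.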

          There are instructions to extract a specific element from a vector register into a scalar register: v_readlane_b32 and v_readfirstlane_b32. They take 1 cycle.
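          As a rough illustration of what these two instructions do, here is a behavioural model in plain C. It is a sketch based on my understanding of the ISA, not a cycle-accurate or authoritative description. The names wave_reg_t, model_readlane and model_readfirstlane are hypothetical, and wrapping the lane index to the wavefront size is an assumption of the model.

```c
#include <stdint.h>

#define LANES 64  /* lanes in one GCN wavefront */

/* One vector register: a 32-bit value per lane. */
typedef struct { uint32_t lane[LANES]; } wave_reg_t;

/* Model of v_readlane_b32: copy the value of one chosen lane into a
   scalar. The lane index is wrapped to the wavefront size (assumption). */
static uint32_t model_readlane(const wave_reg_t *src, unsigned lane_select) {
    return src->lane[lane_select % LANES];
}

/* Model of v_readfirstlane_b32: copy the value of the lowest-numbered
   active lane (per the 64-bit execution mask) into a scalar. If no lane
   is active, this model falls back to lane 0. */
static uint32_t model_readfirstlane(const wave_reg_t *src, uint64_t exec) {
    for (unsigned i = 0; i < LANES; ++i) {
        if (exec & (UINT64_C(1) << i)) {
            return src->lane[i];
        }
    }
    return src->lane[0];
}
```

          Note that the "element" here is a lane, i.e. the value belonging to one work-item of the wavefront, not a component such as .x of an int4.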

           

          "Does it help if I copy vector elements to scalar variables first?"

          Why? The vector ALU performs 64 times as many operations as the scalar ALU. The scalar unit is there for program control, address calculation, computing temporary results that are common to all 64 lanes of the wavefront, and some miscellaneous things.