4 Replies Latest reply on Jun 11, 2012 10:25 PM by sayantandatta

    GPR Usage for private arrays

    sayantandatta

      Suppose I have a private array say:

      uint A[80];

      Now any decent high-end AMD GPU (5870/6970/7970) can store this number of uints in its GPRs. But I suppose the GPRs don't have any indirect addressing modes (I am not sure, though), so is it still possible to store the array in GPRs? Would it make any difference if the array indices are random (like A[tmp], where the value of tmp is not predictable) or fixed constants (like A[0], A[1], ... or something entirely predictable)?

       

      Thanks,

      Sayantan

        • Re: GPR Usage for private arrays
          gautam.himanshu

          In my experience, such arrays are not stored in GPRs, but we never know what optimizations they are planning.

          I would suggest checking your GPR count in the profiler with and without this array.

          I guess whether something is put in GPRs depends on the compiler and the application it is compiling.

           

          Another suggestion would be to try the constant cache (if the values are constant and known at compile time).

          • Re: GPR Usage for private arrays
            realhet

            There is hw support for indirect access of registers. Check the IL disassembly, there should be a dcl_indexed_temp_array instruction.

            • Re: GPR Usage for private arrays
              notzed

              Only arrays where [the index of] every access is known at compile time can be registerised, i.e. only constant indices (although realhet's comment suggests some hardware can support indirect access, I'd still be surprised if they did).
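
To make the distinction concrete, here is a minimal sketch of the two access patterns. It is written as plain C so it can be compiled on a host; in an OpenCL kernel the same bodies would sit inside a `__kernel` function with `uint` in place of `uint32_t`. The function names and the array size `N` are made up for illustration (the original question uses 80 elements), and whether a given compiler actually keeps either array in registers is up to that compiler, so check the generated IL/ISA rather than trusting this sketch.

```c
#include <stdint.h>

#define N 8  /* hypothetical size, kept small; the thread's example uses 80 */

/* Static pattern: both loops have compile-time bounds, so after full
   unrolling every access is a[0], a[1], ... with a literal index.
   An array used only this way is a candidate for being split into scalars. */
uint32_t sum_static(uint32_t seed)
{
    uint32_t a[N];
    for (int i = 0; i < N; ++i)
        a[i] = seed + (uint32_t)i;

    uint32_t s = 0;
    for (int i = 0; i < N; ++i)
        s += a[i];
    return s;
}

/* Dynamic pattern: the read index depends on runtime data (tmp), so no
   single register can be named at compile time. The compiler needs some
   form of indexed access (indexed temporaries, scratch, or memory). */
uint32_t pick_dynamic(uint32_t seed, uint32_t tmp)
{
    uint32_t a[N];
    for (int i = 0; i < N; ++i)
        a[i] = seed + (uint32_t)i;

    return a[tmp % N];  /* modulo keeps the index in bounds */
}
```

For example, `sum_static(10)` adds 10 through 17 and returns 108, while `pick_dynamic(10, 13)` reads element 13 % 8 = 5 and returns 15. Comparing the GPR/scratch usage reported for kernels built around each pattern is one way to see what the compiler did with the array.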

               

              I use local memory for read/write cases where the indices aren't constant, or constant memory for read-only data.

               

              Usually local memory is required anyway, because once arrays reach that sort of size, serial single-thread algorithms that use them get too slow. There is usually some parallelism that can be exploited, and the gains can be extreme.
