2 Replies Latest reply on Jul 19, 2010 3:19 PM by MicahVillmow

    Using all 256 registers?

    iya

      Hello,

      I've written a generator for .il code, and would like to use as many registers as possible. My hardware is a 4850.

      As I understand it, limiting the group size to the wavefront size of 64 should be the only requirement, but neither in OpenCL nor in IL have I ever succeeded in getting the compiler to allocate more than 122 GPRs. A group size of 256 can get up to 63.

      Am I forgetting something or is it a current compiler limitation?

      NumWavefrontPerSIMD = 2 seems to be the problem. Is there a way to limit this to 1?

      ; ----------------- CS Data ------------------------
      ; Input Semantic Mappings
      ;    No input mappings
      GprPoolSize = 0
      CodeLen = 11808;Bytes
      PGM_END_CF = 0; words(64 bit)
      PGM_END_ALU = 0; words(64 bit)
      PGM_END_FETCH = 0; words(64 bit)
      MaxScratchRegsNeeded = 3
      ;AluPacking = 0.0
      ;AluClauses = 0
      ;PowerThrottleRate = 0.0
      ; texResourceUsage[0] = 0x00000000
      ; texResourceUsage[1] = 0x00000000
      ; texResourceUsage[2] = 0x00000000
      ; texResourceUsage[3] = 0x00000000
      ; fetch4ResourceUsage[0] = 0x00000000
      ; fetch4ResourceUsage[1] = 0x00000000
      ; fetch4ResourceUsage[2] = 0x00000000
      ; fetch4ResourceUsage[3] = 0x00000000
      ; texSamplerUsage = 0x00000000
      ; constBufUsage = 0x00000000
      ResourcesAffectAlphaOutput[0] = 0x00000000
      ResourcesAffectAlphaOutput[1] = 0x00000000
      ResourcesAffectAlphaOutput[2] = 0x00000000
      ResourcesAffectAlphaOutput[3] = 0x00000000
      ;SQ_PGM_RESOURCES = 0x3000027A
      SQ_PGM_RESOURCES:NUM_GPRS = 122
      SQ_PGM_RESOURCES:STACK_SIZE = 2
      SQ_PGM_RESOURCES:FETCH_CACHE_LINES = 0
      SQ_PGM_RESOURCES:PRIME_CACHE_ENABLE = 1
      ; CS Setup Mode = Fast (i.e setup R0.x)
      ; NumThreadPerGroup = 64
      ; NumWavefrontPerSIMD = 2
      ; IsMaxNumWavePerSIMD = true
      ; SetBufferForNumGroup = false
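
For anyone cross-checking the dump above, here is a minimal Python sketch that unpacks the raw SQ_PGM_RESOURCES word. The bit layout is an assumption inferred from the dump itself (NUM_GPRS in bits 0-7, STACK_SIZE in bits 8-15), not taken from a register specification, and `extract_field` is a hypothetical helper name.

```python
# Raw value copied from the compiler dump.
SQ_PGM_RESOURCES = 0x3000027A


def extract_field(word, shift, width):
    """Return the `width`-bit field of `word` that starts at bit `shift`."""
    return (word >> shift) & ((1 << width) - 1)


# ASSUMPTION: field positions are inferred from the dump, where
# NUM_GPRS = 122 (0x7A) and STACK_SIZE = 2 (0x02) match the two low bytes.
num_gprs = extract_field(SQ_PGM_RESOURCES, 0, 8)
stack_size = extract_field(SQ_PGM_RESOURCES, 8, 8)

print(num_gprs, stack_size)
```

The two low bytes agree with the NUM_GPRS and STACK_SIZE lines the compiler prints; the upper bits are left undecoded here because their layout is not stated in the thread.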

        • Using all 256 registers?
          the729

          Hi, iya

          AFAIK, although each thread processor has 256 registers, the maximum number of private GPRs that a single thread can use is 123. This is because the hardware ISA uses only 7 bits for GPR addressing, and (according to the documentation) at least 4 GPRs are used as cluster temporary registers.

          Therefore, if not limited by the LDS usage, you will get NumWaveFrontPerSIMD > 1.

          • Using all 256 registers?
            MicahVillmow
            In order to get full utilization of the GPU, two wavefronts need to execute in parallel. The compiler thus is limited to allocating half of the registers available for a single wavefront so that at least two wavefronts can always be executed.
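
The register budget described in this thread can be sketched with simple integer division. This is only a rough model under stated assumptions: 256 registers per thread processor (from the replies above), a wavefront size of 64, and no allowance for the few registers the hardware or compiler reserves. The function names are hypothetical.

```python
REGISTERS_PER_THREAD = 256  # per thread processor, as stated in the thread
WAVEFRONT_SIZE = 64         # wavefront size quoted in the original question


def wavefronts_per_simd(gprs_per_thread):
    """How many wavefronts fit if each thread uses `gprs_per_thread` GPRs."""
    return REGISTERS_PER_THREAD // gprs_per_thread


def gpr_ceiling(resident_wavefronts):
    """Upper bound on GPRs per thread when this many wavefronts must fit."""
    return REGISTERS_PER_THREAD // resident_wavefronts


# The dump reports NUM_GPRS = 122 and NumWavefrontPerSIMD = 2.
print(wavefronts_per_simd(122))

# Two resident wavefronts cap the budget at half the register file.
print(gpr_ceiling(2))

# A group of 256 threads is 256 // 64 = 4 wavefronts.
print(gpr_ceiling(256 // WAVEFRONT_SIZE))
```

The ceilings this model gives (128 for two wavefronts, 64 for a 256-thread group) sit slightly above the 122 and 63 reported in the question, which is consistent with a handful of registers being reserved.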