3 Replies Latest reply on Apr 23, 2014 1:38 AM by ekondis

    Controling scalar and vector register allocation

    ekondis

      Is there any way to control via OpenCL which variables will allocate scalar instead of vector registers on GCN devices? Using expressions like get_global_id(0)/64, which should contain the same value for all workitems within a wavefront, seem to be allocated on vector registers.

        • Re: Controling scalar and vector register allocation
          sudarshan

          OpenCL programmers do not have a direct control over which registers will be used for _private allocations. vGPRs are used for private memory as sGPRs are reserved for internal purposes.

           

          Proper (and also portable) way to share a value across a wave-front is to put it in local memory with _local qualifier, and leave lower level tools (compiler/linker/driver) take care of their mapping to exact hardware resources.

          1 of 1 people found this helpful
          • Re: Controling scalar and vector register allocation
            gopal

            Hi,

             

            Register allocation is the job of compiler. During the compilation, the compiler decides how to allocate these variables to small, finite set of registers with aim to allocate as many variables to register as possible. The compiler tries to map private memory allocations to the pool of GPRs in the GPU. In the event GPRs are not available, private memory is mapped to the “scratch” region, GPRs have some restrictions about which register ports can be read on each cycle; but generally, these are not exposed to the OpenCL programmer.

             

            In GCN devices, there are two types of GPRs: scalar GPRs (SGPRs) and vector GPRs (VGPRs). Each CU has four vector units and one scalar unit and each vector unit has its own SGPR and VGPR pool. There are 512 SGPRs and 256 VGPRs per vector unit. The vector unit handles all vector instructions (any instruction that is handled per thread). And SGPRs are used for scalar instructions: any instruction that is executed once per wavefront, such as a branch, a scalar ALU instruction and constant cache fetches. SGPRs are also used for constants, all buffer/texture definitions, and sampler definitions; some kernel arguments are stored, at least temporarily, in SGPRs.

             

            So if a programmer wants to use scalar registers instead of vector registers for variables, he should try to use scalar instructions which can be coded using branches, computation on constant memory and etc as mentioned above.

            1 of 1 people found this helpful