4 Replies Latest reply on Aug 5, 2014 5:47 AM by dipak

    convert kernel from local memory to registers


      I have a kernel that uses local memory for its calculations.

      In order to make things faster, I would like to use registers instead.


      How do I go about doing this?



        • Re: convert kernel from local memory to registers


          In OpenCL, local and private memory are selected with the address space qualifiers __local (or local) and __private (or private), respectively. An object declared without an address space qualifier is allocated in the generic address space, and up to OpenCL 1.2 the generic address space for function arguments and for variables local to a function is __private.
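
As a rough illustration of the qualifiers described above, here is a minimal sketch in plain C that mimics a kernel body before and after the conversion. Everything named here is hypothetical: the `global`, `local` and `private` macros are empty stand-ins so the code compiles on a host compiler (in OpenCL C they are built-in qualifiers), and `WG_SIZE`, `TAPS`, the two kernel bodies and `run_work_group` are made up for the example. It assumes each work-item only ever touches its own slice of the local array. If work-items exchange data through local memory, that data cannot simply be moved to private memory.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins (assumption): in OpenCL C these are built-in
   address space qualifiers; here they expand to nothing. */
#define global
#define local
#define private

#define WG_SIZE 4   /* hypothetical work-group size */
#define TAPS    3   /* hypothetical scratch values per work-item */

/* Before: each work-item keeps its scratch values in its own slice
   of an array that lives in local memory. */
static void sum_taps_local(size_t lid, global const float *in,
                           global float *out, local float *scratch)
{
    for (size_t t = 0; t < TAPS; ++t)
        scratch[lid * TAPS + t] = in[lid * TAPS + t] * 2.0f;

    float acc = 0.0f;
    for (size_t t = 0; t < TAPS; ++t)
        acc += scratch[lid * TAPS + t];
    out[lid] = acc;
}

/* After: the same scratch values in a small private array. A small,
   fixed-size array like this is a candidate for register allocation. */
static void sum_taps_private(size_t lid, global const float *in,
                             global float *out)
{
    private float scratch[TAPS];

    for (size_t t = 0; t < TAPS; ++t)
        scratch[t] = in[lid * TAPS + t] * 2.0f;

    float acc = 0.0f;
    for (size_t t = 0; t < TAPS; ++t)
        acc += scratch[t];
    out[lid] = acc;
}

/* Emulates one work-group by running the kernel body once per
   work-item; `lid` plays the role of get_local_id(0). */
void run_work_group(int use_private, const float *in, float *out)
{
    float lds[WG_SIZE * TAPS];   /* stands in for the LDS allocation */

    assert(in != NULL && out != NULL);
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        if (use_private)
            sum_taps_private(lid, in, out);
        else
            sum_taps_local(lid, in, out, lds);
    }
}
```

Both variants produce the same result for every work-item. The only change is where the scratch array is declared and how it is indexed. Whether the private array actually ends up in registers is decided by the compiler.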

          Because private memory (held in registers) offers higher bandwidth than local memory (held in the LDS), converting from local to private memory can improve performance. The conversion works well as long as the kernel's register usage stays low, or at least within a certain limit. Otherwise, it may hurt kernel or overall program performance for the following reasons:

          1) During compilation, the OpenCL compiler tries to map private memory allocations to the GPU's pool of general-purpose registers (GPRs). If not enough GPRs are available, private memory is mapped to the “scratch” region, which has the same performance as global memory, so performance may degrade significantly.

          2) There is a limit on the number of registers per compute unit (CU) and per SIMD, which depends on the GPU architecture (see Appendix D, "Device Parameters", in the AMD APP Programming Guide for device-specific limits). Excessive register usage can limit the number of active wavefronts per SIMD and/or CU (see the section "Resource Limits on Active Wavefronts" in Chapters 5 and 6 of the AMD APP Programming Guide) and reduce overall GPU occupancy.
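
The register limit in point 2 comes down to simple arithmetic, sketched below in C. The function name `max_waves_per_simd` is hypothetical, and the figures in the usage note (256 vector registers available to each work-item lane of a SIMD, at most 10 wavefronts per SIMD) are assumptions modelled on GCN-class hardware. Check Appendix D of the programming guide for the values of your device. The sketch also ignores register allocation granularity, so real hardware may allow slightly fewer wavefronts than it reports.

```c
#include <assert.h>

/* Rough upper bound on active wavefronts per SIMD when vector
   registers are the limiting resource.
   vgprs_per_work_item: registers the compiled kernel uses
   vgprs_per_lane:      registers available per lane (device-specific)
   wave_cap:            hardware cap on wavefronts per SIMD */
int max_waves_per_simd(int vgprs_per_work_item, int vgprs_per_lane,
                       int wave_cap)
{
    assert(vgprs_per_work_item > 0 && vgprs_per_lane > 0 && wave_cap > 0);

    /* Each resident wavefront needs its own copy of the registers. */
    int by_registers = vgprs_per_lane / vgprs_per_work_item;

    return by_registers < wave_cap ? by_registers : wave_cap;
}
```

Under the assumed figures, `max_waves_per_simd(24, 256, 10)` gives 10, while a kernel that needs 128 registers drops to 2. A result of 0 means the kernel does not fit in the register file at all, which is the case where the compiler falls back to scratch memory as described in point 1.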