2 Replies Latest reply on Apr 15, 2015 10:47 AM by dns.on.gpu

    Absence of read coalescing




      I am doing double-precision floating-point computations on a 280X. According to the AMD Programming Guide, the SI chips do not do 64-bit read coalescing, and I am getting very low vector and scalar unit occupancy (3-4% according to CodeXL), which also indicates a lot of waiting for memory. Is it at all possible to alleviate this problem?




        • Re: Absence of read coalescing

          With occupancy 3% (I didn't even know this was possible) you are going to be extremely slow, read coalescing or not. SI devices don't have it because they don't need it: given appropriate memory access patterns they naturally produce "packed" writes.
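To make "appropriate memory access patterns" a bit more concrete, here is a rough host-side sketch in plain C (not kernel code) that counts how many memory segments a single wavefront touches when each work-item reads one double at a given stride. The 64-byte segment size, the aligned base address, and the helper name `segments_touched` are assumptions made for illustration; the real hardware behaviour is more involved than this model.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: 64 work-items per wavefront and
 * 64-byte memory segments (cache lines), base address aligned. */
enum { WAVEFRONT_SIZE = 64, SEGMENT_BYTES = 64 };

/* Hypothetical helper: number of distinct segments touched when lane i
 * reads the double at element index i * stride_elems. */
static int segments_touched(size_t stride_elems)
{
    assert(stride_elems > 0);

    int count = 0;
    size_t last_segment = (size_t)-1; /* sentinel: no segment seen yet */

    /* Byte offsets grow with the lane index, so a new segment shows up
     * exactly when the segment index changes. */
    for (int lane = 0; lane < WAVEFRONT_SIZE; ++lane) {
        size_t byte_offset = (size_t)lane * stride_elems * sizeof(double);
        size_t segment = byte_offset / SEGMENT_BYTES;
        if (segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}
```

Under these assumptions, consecutive doubles (stride 1) span 8 segments per wavefront, while a stride of 8 elements or more puts every work-item in its own segment (64 segments), so the same amount of useful data costs roughly eight times the memory traffic.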


          You have probably taken a CPU thread and slapped it into a work-item (WI). This is not what a WI is supposed to do, especially for complex problems. Check the VGPR usage, SGPR usage, ScratchRegs and ISA size (you can find these at the end of the disassembly tab).
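As a rough guide to why VGPR usage matters, the sketch below estimates how many wavefronts can be resident on one SIMD when VGPRs are the only limit. The figures used (256 VGPRs per work-item lane, at most 10 wavefronts per SIMD, allocation in groups of 4) are what I understand to apply to GCN/SI parts and should be treated as assumptions to check against the programming guide. The function names are hypothetical, and SGPR, LDS and scratch limits are ignored.

```c
#include <assert.h>

/* Assumed GCN (Southern Islands) limits; verify against the guide. */
enum {
    VGPRS_PER_LANE     = 256, /* vector registers available per work-item lane */
    MAX_WAVES_PER_SIMD = 10,  /* hardware cap on resident wavefronts per SIMD */
    VGPR_GRANULE       = 4    /* registers are allocated in groups of 4 */
};

/* Wavefronts per SIMD allowed by VGPR usage alone. */
static int waves_limited_by_vgprs(int vgprs_per_work_item)
{
    assert(vgprs_per_work_item > 0 && vgprs_per_work_item <= VGPRS_PER_LANE);

    /* Round the request up to the allocation granule. */
    int allocated = (vgprs_per_work_item + VGPR_GRANULE - 1)
                    / VGPR_GRANULE * VGPR_GRANULE;
    int waves = VGPRS_PER_LANE / allocated;
    return waves < MAX_WAVES_PER_SIMD ? waves : MAX_WAVES_PER_SIMD;
}

/* The same limit expressed as a percentage of the maximum. */
static int occupancy_percent(int vgprs_per_work_item)
{
    return 100 * waves_limited_by_vgprs(vgprs_per_work_item)
           / MAX_WAVES_PER_SIMD;
}
```

With these numbers, a kernel using 24 VGPRs or fewer can reach the full 10 wavefronts, whereas one using 84 VGPRs is limited to 3 wavefronts (30%), which leaves far fewer wavefronts available to hide memory latency.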

            • Re: Absence of read coalescing

              Thank you.


              I wrote "low vector and scalar unit occupancy" to refer to VALUBusy and SALUBusy, which are low, and not to kernel occupancy, which is ~30%.


              I have inserted numerical values  in the array index calculations, and there was a marked