dns_on_gpu
Adept II

Absence of read coalescing

Hello,

I am doing double-precision floating-point computations on a 280X. According to the AMD Programming Guide, the SI chips do not coalesce 64-bit reads, and I am getting very low vector and scalar unit occupancy, between 3% and 4% according to CodeXL, which also indicates a lot of time spent waiting for memory. Is there any way to alleviate this problem?

Thanks.

2 Replies
maxdz8
Elite

With 3% occupancy (I didn't even know this was possible) you are going to be extremely slow, read coalescing or not. SI devices don't have it because they don't need it: given appropriate memory access patterns, they naturally produce "packed" writes.

You have probably taken a CPU thread and slapped it into a work-item (WI). This is not what a WI is supposed to do, especially for complex problems. Check the VGPR usage, SGPR usage, ScratchRegs and ISA size (find this at the end of the disassembly tab).
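To illustrate the point about access patterns, here is a minimal host-side C sketch (not from the thread) that counts how many memory segments one 64-work-item wavefront touches when each work-item reads a double at a given element stride. The 64-byte segment size and the name `segments_touched` are assumptions chosen for illustration. A stride of 1 models one element per work-item; a large stride models each work-item walking its own contiguous chunk, as a CPU thread would.

```c
#include <stddef.h>

#define WAVEFRONT_SIZE 64   /* work-items per wavefront on GCN hardware */
#define SEGMENT_BYTES  64   /* assumed segment size, for illustration only */

/* Count the distinct SEGMENT_BYTES-sized segments touched when work-item i
   reads one double at element index i * stride_elems. */
static size_t segments_touched(size_t stride_elems)
{
    size_t count = 0;
    size_t last_segment = (size_t)-1;   /* sentinel: no segment seen yet */

    for (size_t i = 0; i < WAVEFRONT_SIZE; ++i) {
        size_t byte_addr = i * stride_elems * sizeof(double);
        size_t segment = byte_addr / SEGMENT_BYTES;

        /* Addresses never decrease, so a change of segment means a new one. */
        if (segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}
```

Under these assumptions, `segments_touched(1)` gives 8 while `segments_touched(64)` gives 64, one segment per work-item. The real cost depends on the hardware, but the contrast shows why the mapping of work-items to array elements matters more than the element width.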

Thank you.

By "low vector and scalar unit occupancy" I meant VALUBusy and SALUBusy, which are low, and not the kernel occupancy, which is ~30%.

I have inserted numerical values in the array index calculations, and there was a marked improvement.
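As a rough sketch of what putting numerical values into index calculations can look like, the C fragment below contrasts a row length passed at run time with one fixed at compile time. The names `NX`, `index_runtime` and `index_constant` and the value 128 are hypothetical and not taken from the kernel discussed here. In an OpenCL program, such a constant could be supplied through the `clBuildProgram` options string (for example `-D NX=128`) rather than edited into the source.

```c
#include <stddef.h>

/* NX is a hypothetical row length, fixed at compile time. */
#ifndef NX
#define NX 128
#endif

/* Row length known only at run time: the compiler must keep nx in a
   register and emit a general multiply. */
static size_t index_runtime(size_t row, size_t col, size_t nx)
{
    return row * nx + col;
}

/* Row length is a literal: the compiler is free to fold it into the
   address arithmetic (for a power of two, typically a shift). */
static size_t index_constant(size_t row, size_t col)
{
    return row * NX + col;
}
```

Both functions return the same index when `nx` equals `NX`; the difference is only in how much the compiler knows when it generates the address arithmetic, which may reduce register pressure and instruction count.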
