I am doing double-precision floating-point computations on a 280X. According to the AMD Programming
Guide, the SI (Southern Islands) chips do not coalesce 64-bit reads, and I am getting very low vector and
scalar unit occupancy: between 3% and 4% according to CodeXL, which also indicates a lot of time spent
waiting on memory. Is there any way to alleviate this problem?