bursting global reads and global memory bandwidth?

Discussion created by sgratton on Jul 5, 2008
Latest reply on Oct 10, 2008 by sgratton

Hi there,

Does anybody know if the hardware is able to "burst" global memory reads as well as writes (if this is a meaningful idea) and if so how to write IL to do this? My

mov r20,g[r1.x]
mov r21,g[r1.x+1]
mov r22,g[r1.x+2]
mov r23,g[r1.x+3]

seems to generate 4 MEM_GLOBAL_READ_IND GPU ISA instructions, whereas the same code with the sources and destinations interchanged generates a single MEM_GLOBAL_WRITE_IND with a BRSTCNT(3). I am concerned about memory bandwidth.

Relatedly, can I check that the theoretical memory bandwidth of, say, a 3870 is about 70GB/s? Is "all" of this accessible for global buffer reads only, for writes only, and for reads and writes together? If not, I am worried that any code I write that mainly uses a global buffer will be doomed to be slow from the start, especially as some of the SDK examples seem to give numbers of the order of only 9GB/s (e.g. bursting_IL). Or will this change for the new cards?
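
For reference, this is roughly where a figure of about 70GB/s comes from. It is only a back-of-the-envelope sketch in Python, assuming the commonly quoted 3870 specs (256-bit memory bus, GDDR4 at 1125 MHz, two transfers per clock). Those numbers and the function name are assumptions for illustration, not something taken from the SDK.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed 3870 figures: 256-bit bus, 1125 MHz memory clock, double data rate.
peak = peak_bandwidth_gb_per_s(bus_width_bits=256, mem_clock_mhz=1125)

# Fraction of that peak represented by the roughly 9GB/s seen in the SDK example.
sdk_fraction = 9.0 / peak

print(f"theoretical peak: {peak:.1f} GB/s")
print(f"9 GB/s is {sdk_fraction:.1%} of peak")
```

If those assumed specs are right, the SDK number would be only about an eighth of the theoretical peak, which is why I am asking.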

Are there any other tips for achieving maximum global buffer bandwidth? (One thing I have mooted, for example, is having "tall and thin" domains, e.g. (2,512), so that if a buffer is basically accessed by vObjIndex0.x, each quad should be accessing sequential memory. I haven't had a chance to test this in any way. Does it make sense, though, and might it help?)
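
To make the "tall and thin" idea concrete, here is a small Python sketch of the reasoning. It assumes that threads are processed in 2x2 quads and that the buffer index is the row-major position y*width + x; both are assumptions for illustration (as are the helper names), and I do not know that the hardware actually behaves this way.

```python
def quad_indices(width, quad_x, quad_y):
    """Linear buffer indices touched by the 2x2 quad at quad position (quad_x, quad_y).

    Assumes a row-major index of y * width + x (hypothetical mapping).
    """
    xs = (2 * quad_x, 2 * quad_x + 1)
    ys = (2 * quad_y, 2 * quad_y + 1)
    return sorted(y * width + x for y in ys for x in xs)


def is_sequential(indices):
    """True if the sorted indices form one contiguous run."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


# Tall and thin domain (2,512): each quad spans the full width.
tall_thin = quad_indices(width=2, quad_x=0, quad_y=3)

# Short and wide domain (512,2): each quad straddles two distant rows.
short_wide = quad_indices(width=512, quad_x=3, quad_y=0)

print(tall_thin, is_sequential(tall_thin))
print(short_wide, is_sequential(short_wide))
```

Under these assumptions, a quad in the (2,512) domain touches four consecutive addresses, while a quad in a (512,2) domain touches two pairs that are a full row apart.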

Thanks a lot,