1 Reply Latest reply on Mar 30, 2009 2:14 PM by MicahVillmow

    Questions about R700 ISA


      Based on the document posted in another thread: http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=111092&enterthread=y


      1) Is the read throughput of LDS and global memory the same? I'm asking because, if so, I can give up on trying to use LDS for broadcasting...


      2) Is it possible to use burst reads and broadcasts directly from global memory, or only from LDS?


      3) What's the throughput of the burst mode?


      4) Can Brook+ or CAL generate non-waterfall or broadcast reads indexed by a "wavefront-id"? If not, please consider this a feature request.


      5) In your opinion, if I need broadcasts in a pattern similar to matrix multiplication, would it work to load the data into several registers and then have each thread select one of them?


      Thank you in advance.
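One possible reading of question 5 can be modeled on the host in plain Python. This is not GPU code and not a CAL or Brook+ API. Each simulated "thread" computes one element c[i][j] of a matrix product. Instead of receiving a[i][k] through an LDS broadcast, it loads a short segment of row i into a small list that stands in for registers, then selects one entry per inner-loop step. The block width `REGS`, the function names, and the one-thread-per-output layout are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]

REGS = 4  # hypothetical register-block width (one float4's worth of values)


def thread_compute(a: Matrix, b: Matrix, i: int, j: int) -> float:
    """Model of one GPU thread computing c[i][j] (assumed layout).

    Rather than getting a[i][k] from an LDS broadcast, the thread loads up to
    REGS consecutive values of row i into a local list ("registers") and then
    selects one of them on each step of the inner loop.
    """
    n_k = len(b)  # shared dimension: columns of a == rows of b
    acc = 0.0
    for k0 in range(0, n_k, REGS):
        # Load a row segment into "registers"; the final slice may be shorter.
        regs = a[i][k0:k0 + REGS]
        for r, a_val in enumerate(regs):
            # Select register r and pair it with the matching element of b.
            acc += a_val * b[k0 + r][j]
    return acc


def matmul_model(a: Matrix, b: Matrix) -> Matrix:
    """Run the per-thread model once for every output element."""
    return [[thread_compute(a, b, i, j) for j in range(len(b[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    print(matmul_model([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this model, every thread in row i redundantly reads the same segment of A, so the approach trades extra global-memory reads for the broadcast. Whether that is a good trade depends on the bandwidth the card actually achieves for each path.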



          1) LDS is on-chip and global memory is off-chip, so their throughput differs.

          2) Neither global memory nor LDS currently performs burst reads correctly. This is a software issue and should be fixed in a future driver update.

          3) The peak bandwidth of global memory and LDS is card-specific. You can measure the peaks with the CAL samples export_burst_perf (for global memory) and ldsread/ldswrite (for LDS), located in the samples/runtime directory of the CAL SDK.

          4) This could be possible but is not currently supported. Currently, the only way to turn off waterfalling is to use the _neighborExch flag, but this performs a 4x4 transpose on reads.

          5) Some applications benefit from using LDS and some do not. The best way to decide is to run the simple performance samples, see the peaks your card achieves, and determine which approach is optimal. For example, if you are reading/writing 4 sequential float4's, global memory can reach almost peak bandwidth, and there is no performance reason to use LDS.
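The comparison suggested in answers 3 and 5 reduces to simple arithmetic: achieved bandwidth is bytes moved divided by elapsed time, compared against the peak that export_burst_perf or ldsread/ldswrite report for your card. The Python sketch below is illustrative only. The function names and the 90% threshold are hypothetical, and the timings and peaks would come from your own runs of those samples, not from this code.

```python
FLOAT4_BYTES = 16  # a float4 is four 32-bit floats


def bytes_moved(num_threads: int, float4s_per_thread: int) -> int:
    """Total bytes read (or written) when each thread touches N float4 values."""
    return num_threads * float4s_per_thread * FLOAT4_BYTES


def achieved_gb_per_s(num_bytes: int, seconds: float) -> float:
    """Effective bandwidth in GB/s, using 10**9 bytes per GB."""
    return num_bytes / seconds / 1e9


def fraction_of_peak(achieved_gbps: float, peak_gbps: float) -> float:
    """Achieved bandwidth as a fraction of a measured peak."""
    return achieved_gbps / peak_gbps


def lds_worth_trying(global_gbps: float, global_peak_gbps: float,
                     threshold: float = 0.9) -> bool:
    """Hypothetical rule of thumb: if the global-memory version already reaches
    `threshold` of its measured peak, staging data through LDS has little
    bandwidth headroom left to recover."""
    return fraction_of_peak(global_gbps, global_peak_gbps) < threshold
```

For example, 1,048,576 threads that each read 4 float4's move 67,108,864 bytes (64 MiB). The threshold is a judgment call: the closer the global-memory kernel already is to its measured peak, the less there is to gain from LDS.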


          On a side note, there are known performance issues when using LDS. These are being worked on and, if we can fix them, should bring a 4x speed improvement.