3 Replies Latest reply on Oct 9, 2011 5:26 AM by tanq

    Detailed data on new HD products

      Need help for upgrading

      My HD4850 is quite obsolete now, and I am looking for an adequate upgrade. The main use of the card is OpenCL, not games, so I need some details on the OpenCL performance of recent chips, including HD5XXX, 6XXX and 7XXX.

      Is there a comparison table?

      The data I am looking for:

      1) number of compute units

      2) total teraflops

      3) size of constant memory and LDS (I mean true LDS)

      4) LDS bandwidth, constant bandwidth, global memory bandwidth

      5) PCIe bandwidth and latency. Is DMA working at all? It is still broken on my Radeon

      7) latency of kernel launching: any improvements since HD4850?

      8) how fast are global atomics?

      9) new features of HD7XXX vs HD6XXX


        • Detailed data on new HD products

          Most of the information on the 5xxx and 6xxx series cards is out there. There are a lot of models in each range, so I suggest checking through them yourself. Details like the number of compute units are listed on AMD's product pages. At the end of the day it comes down to how much you're willing to spend.

          If you want a good idea of the relative performance of each product, pop your OpenCL code into AMD's KernelAnalyzer and see what performance figures it reports for each card. This will give you the most realistic figures you can get without actually running your code on the different devices.

          In a nutshell: the top-of-the-range single-GPU products in the 5xxx series (5870) and the 6xxx series (6970) are very similar. They have similar theoretical flops, but the 6970 has more compute units (24 vs 20), so you're going to get better real-world performance for your 'typical' code, particularly where branches diverge more frequently. The 6970 also has slightly more memory bandwidth (176 GB/s vs 153 GB/s). In the 6970 the T-unit has been done away with and its functions merged into the other stream processors, which gives a massive improvement on specific operations like floating point to integer conversion.
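The "similar theoretical flops" point can be checked with quick arithmetic. The sketch below assumes the usual peak formula (stream processors x 2 operations per clock for a multiply-add x engine clock). The stream processor counts and clocks are taken from the published card specifications, not from this thread, and the function name is made up for illustration.

```python
def peak_gflops(stream_processors, clock_mhz, ops_per_clock=2):
    """Theoretical single-precision peak in GFLOPS (multiply-add counted as 2 ops)."""
    return stream_processors * ops_per_clock * clock_mhz / 1000.0


# Assumed specs: compute unit counts are from this reply, the stream processor
# counts and engine clocks are from the published specifications.
cards = {
    "HD 5870": {"compute_units": 20, "stream_processors": 1600, "clock_mhz": 850},
    "HD 6970": {"compute_units": 24, "stream_processors": 1536, "clock_mhz": 880},
}

for name, spec in cards.items():
    gflops = peak_gflops(spec["stream_processors"], spec["clock_mhz"])
    per_cu = spec["stream_processors"] // spec["compute_units"]
    print(f"{name}: {gflops:.0f} GFLOPS peak, {per_cu} stream processors per compute unit")
```

Both cards land at roughly 2.7 TFLOPS under these assumptions, which is why the compute unit count and memory bandwidth matter more than the headline flops figure when choosing between them.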

          The 7xxx isn't out, so no-one here knows how it'll compare.

          With regard to all your other questions (3-9), things are very similar between the 5xxx and 6xxx, and both have improved significantly over the 4xxx. Both support local atomics and feature hardware local shared memory (32 KB for the 5870 and 64 KB for the 6970). Global atomics are horrendously slow on any card, as in most cases all memory access has to be serialised. I don't know if asynchronous DMA has been implemented yet. The 5870 features 2560 bytes/clk of LDS bandwidth; I'm not sure about the 6970, but I assume it will be similar. I don't think anyone has compared kernel launch latencies across the different series, but whatever you're programming for, your code should be written so that launch latency is small compared to execution time, so I wouldn't concern yourself with this. Real-world PCIe bandwidth is around the 4 GB/s mark, whatever your card.
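The advice about kernel latency and PCIe bandwidth can be turned into a back-of-the-envelope estimate. This is only a sketch: it uses the roughly 4 GB/s real-world PCIe figure quoted in this reply, while the launch latency, buffer size and kernel execution time are hypothetical placeholders to be replaced with your own measurements. The function names are invented for illustration.

```python
PCIE_BYTES_PER_S = 4e9  # roughly 4 GB/s, the real-world figure quoted in this reply


def transfer_seconds(num_bytes, bandwidth=PCIE_BYTES_PER_S):
    """Time to move num_bytes across PCIe at the given sustained bandwidth."""
    return num_bytes / bandwidth


def overhead_fraction(launch_latency_s, transfer_s, kernel_s):
    """Share of total wall time spent on launch latency plus host/device transfer."""
    overhead = launch_latency_s + transfer_s
    return overhead / (overhead + kernel_s)


# Hypothetical workload: 64 MiB buffer, 50 us launch latency, 100 ms kernel.
buffer_bytes = 64 * 1024 * 1024
t_transfer = transfer_seconds(buffer_bytes)
frac = overhead_fraction(50e-6, t_transfer, 100e-3)
print(f"transfer: {t_transfer * 1e3:.1f} ms, overhead share: {frac:.1%}")
```

If the overhead share comes out large for your own numbers, batching more work per kernel launch or keeping data resident on the card usually matters more than which card you buy.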

          As I mentioned, aside from actually testing your code, try AMD's KernelAnalyzer for a rough performance comparison.


          • Detailed data on new HD products

            Try searching for magazine articles about the hardware details if you want some pretty diagrams. Some of the better reviews also cover potentially important details like power and heat.

            Other than that, I could say read the manual ...

            Pretty much all the important stuff is in the AMD APP OpenCL Programming Guide.

            Appendix D has tables with most of the numbers you're asking for, covering each device up to the HD 6970, and chapter 4 has lots of other details about memory.

            PS I went looking for an update recently and the documentation index seems somewhat ... muddled.  You want: http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf