AMD hardware, OpenCL and CUDA

Discussion created by houyunqing on May 25, 2011
Latest reply on Jun 1, 2011 by houyunqing
Questions from a CUDA developer


I'm starting to look at AMD's hardware and I'm surprised by the peak GFLOPS numbers (for the HD 6970: 384*4*2*880/1024 = 2640 GFLOPS). Shouldn't AMD cards be significantly faster than NVIDIA cards for arithmetic-intensive kernels, since the GTX 580 only reaches 1544 GFLOPS by the same calculation?
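The peak figures above come from ALU lanes × 2 FLOPs per clock (a multiply-add counts as two operations) × shader clock. A minimal sketch of that arithmetic, using only the lane counts and clocks quoted in the post. The divisor is a parameter: dividing by 1024 reproduces the post's numbers, while the usual decimal convention (1 GFLOPS = 10^9 FLOP/s) divides by 1000. The ratio between the two cards, about 1.71, does not depend on the divisor.

```python
def peak_gflops(lanes: int, flops_per_clock: int, clock_mhz: float,
                divisor: float = 1000.0) -> float:
    """Theoretical peak = lanes * FLOPs per lane per clock * clock (MHz) / divisor.

    divisor=1000 gives decimal GFLOPS; divisor=1024 reproduces the post's figures.
    """
    return lanes * flops_per_clock * clock_mhz / divisor


# HD 6970: 384 VLIW4 units * 4 lanes, 880 MHz.
# GTX 580: 512 CUDA cores at the 1544 MHz shader clock.
# 2 FLOPs per lane per clock = one multiply-add.
hd6970 = peak_gflops(384 * 4, 2, 880)
gtx580 = peak_gflops(512, 2, 1544)

print(f"HD 6970: {hd6970:.0f} GFLOPS (/1000), {peak_gflops(384 * 4, 2, 880, 1024):.0f} (/1024)")
print(f"GTX 580: {gtx580:.0f} GFLOPS (/1000), {peak_gflops(512, 2, 1544, 1024):.0f} (/1024)")
print(f"ratio: {hd6970 / gtx580:.2f}")
```

These are theoretical peaks; how close a kernel gets depends on how well it keeps every lane busy.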

Why do the fastest supercomputers in China, Japan and the States all use NVIDIA's Tesla cards? Those Teslas are many times more expensive than their AMD counterparts. Does it have anything to do with IEEE floating-point compliance?

Apart from that, I have a few other questions. Thanks in advance for any help.

1. NVIDIA's native ISA doesn't have a proper public name (I call the current generation the Fermi ISA) because NVIDIA discloses very little about it. CUDA developers only get PTX, a high-level, assembly-like language that I think plays the same role as AMD IL. Does AMD give developers comprehensive documentation of its native ISA?

Also, NVIDIA provides developers with cuobjdump, which disassembles cubins; this tool is the only source of information about their native ISA. Does AMD provide a similar disassembler, and perhaps an assembler as well?

2. How long is a native VLIW instruction? Can the stream cores sustain 5 (or 4) scalar operations per clock? If the VLIW words are long, I'd expect them to put heavy pressure on the instruction cache.

3. How many immediate values can a native VLIW instruction contain? In practice, do the stream cores often issue 4 or 5 operations in parallel, given that many instructions may carry 8-, 16- or even 32-bit immediate values?

4. How much information does AMD provide about its hardware, e.g. caching behaviour (replacement policy, cache-line size, associativity), load/store latency, arithmetic latency, memory channel width and so on?

5. More generally, I'd also like to know the main differences between CUDA C and OpenCL C. I'm already familiar with the basic OpenCL terminology.
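On question 2: how full the VLIW slots stay depends on how much independent work the compiler can find within each work-item. The toy list scheduler below is a hypothetical model, not AMD's compiler. It assumes a VLIW4 width, unit latency, and that any operation can go in any slot (real VLIW5 parts restrict some operations, such as transcendentals, to a single slot). The names pack_bundles and slot_utilization are made up for illustration.

```python
from typing import Dict, List, Set


def pack_bundles(deps: Dict[str, List[str]], width: int) -> List[List[str]]:
    """Greedily pack scalar ops into VLIW bundles of at most `width` slots.

    Toy model: an op may issue once all of its dependencies were issued in an
    earlier bundle, and every slot accepts every op (no slot restrictions).
    """
    done: Set[str] = set()
    remaining = list(deps)  # dicts keep insertion order, so packing is deterministic
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if all(d in done for d in deps[op])][:width]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundles.append(ready)
        done.update(ready)
        remaining = [op for op in remaining if op not in ready]
    return bundles


def slot_utilization(bundles: List[List[str]], width: int) -> float:
    """Fraction of the available ALU slots that hold a real operation."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Four independent multiply-adds, e.g. one per float4 component.
vec4 = {"x": [], "y": [], "z": [], "w": []}
# A serial dependency chain: each op needs the previous result.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

for name, graph in (("float4", vec4), ("chain", chain)):
    bundles = pack_bundles(graph, width=4)
    print(name, len(bundles), "bundle(s), slot utilization", slot_utilization(bundles, 4))
```

In this model the float4-style code fills all four slots of one bundle, while the dependent chain needs four bundles and leaves three of every four slots idle. This is why vectorising with float4 tends to help on VLIW hardware.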
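On question 3, and the instruction-cache concern in question 2, a back-of-the-envelope size calculation follows. The encoding parameters are assumptions based on AMD's R600/Evergreen-family ISA documents as I understand them, and they should be checked against the ISA reference for the specific chip. The assumed encoding uses 64 bits per ALU operation and up to four 32-bit literal constants per instruction group, stored in 64-bit pairs after the operations. Under that assumption, literals add bytes to the group but do not occupy ALU slots. The helper group_bytes is hypothetical.

```python
import math

# ASSUMED encoding, modelled on AMD's R600/Evergreen-family ISA documents;
# verify these numbers against the ISA reference for your chip.
SLOT_BITS = 64      # each ALU operation in a group is encoded as a 64-bit word
LITERAL_BITS = 32   # each literal (immediate) constant is 32 bits
MAX_LITERALS = 4    # literal constants allowed per instruction group


def group_bytes(slots_used: int, literals: int) -> int:
    """Encoded size of one ALU instruction group under the assumed encoding."""
    if not 0 <= literals <= MAX_LITERALS:
        raise ValueError("literal count out of range for one group")
    literal_pairs = math.ceil(literals / 2)  # literals padded to 64-bit pairs
    return (slots_used * SLOT_BITS + literal_pairs * 2 * LITERAL_BITS) // 8


for slots, lits in ((4, 0), (4, 4), (5, 4), (1, 1)):
    size = group_bytes(slots, lits)
    print(f"{slots} ops + {lits} literals: {size} bytes, {size / slots:.0f} bytes per op")
```

Under these assumptions a full VLIW4 group costs 32 to 48 bytes, and a lone operation with one literal costs 16 bytes. The per-operation cost is therefore highest when bundles are poorly packed.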
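On question 5, the two execution models map almost one to one. A CUDA __global__ kernel corresponds to an OpenCL __kernel, a thread to a work-item, and a block to a work-group. Within a kernel, threadIdx.x corresponds to get_local_id(0), blockIdx.x to get_group_id(0), and blockDim.x to get_local_size(0). Likewise, __shared__ maps to __local and __syncthreads() to barrier(CLK_LOCAL_MEM_FENCE). One practical difference is on the host side. OpenCL's clEnqueueNDRangeKernel takes the total number of work-items as the global size, whereas CUDA's <<<grid, block>>> launch takes the number of blocks. The sketch below is a hypothetical CPU emulation (the helper names are made up) that checks that the index arithmetic on both sides agrees for a 1-D launch.

```python
from typing import Iterator, Tuple


def cuda_global_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    # CUDA: blockIdx.x * blockDim.x + threadIdx.x
    return block_idx * block_dim + thread_idx


def opencl_global_id(group_id: int, local_size: int, local_id: int,
                     global_offset: int = 0) -> int:
    # OpenCL: get_global_id(0) == get_global_offset(0)
    #         + get_group_id(0) * get_local_size(0) + get_local_id(0)
    return global_offset + group_id * local_size + local_id


def enumerate_1d(global_size: int, local_size: int) -> Iterator[Tuple[int, int]]:
    """CPU walk over a 1-D NDRange, yielding (OpenCL id, CUDA index) pairs."""
    if global_size % local_size:
        raise ValueError("OpenCL 1.x requires global size to be a multiple of local size")
    num_groups = global_size // local_size  # what CUDA calls gridDim.x
    for group in range(num_groups):
        for lid in range(local_size):
            yield (opencl_global_id(group, local_size, lid),
                   cuda_global_index(group, local_size, lid))


# OpenCL host code passes the TOTAL work-item count (1024); a CUDA
# <<<grid, block>>> launch would instead take the block count (1024 / 256 = 4).
pairs = list(enumerate_1d(global_size=1024, local_size=256))
print(len(pairs), all(cl == cu for cl, cu in pairs))
```

With a zero global offset (the only case CUDA has), both formulas visit every index from 0 to 1023 exactly once.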

Again, thanks for your time and help!