Archives Discussions

rotor · ‎05-11-2010

58xx vs. 48xx

Hi folks,

So I am planning to buy an ATI 5870 for my OpenCL work and I may need your help, collaborative minds. Strangely enough, I could not find any official detailed technical specification of this card from ATI/AMD. I am looking for specs such as how much constant cache, L2 cache, shared memory, and private memory it has per core. Is there any improvement over the 48xx series in terms of resources per core, or is it about the same? Can you direct me to a source of detailed architecture spec documents for this card as well as other cards? In general, where does ATI put them?

Additionally, I read some reviews that said that with a vector size of 4 you cannot have full 32-bit operations. E.g., Tom's Hardware said: "Now the four cores are capable of performing a multiplication or addition per cycle, but only on 24-bit integers". So what is the truth? If Tom's is right, how does the 5870 handle operations on a datatype like int4?

Last but not least, is there much difference in terms of architecture between the 5870 and the 4890? I am spending a huge amount of $, so I just want to be sure I pick the right one.

Thank you very much,

Roto

nou · ‎05-11-2010

http://sa09.idav.ucdavis.edu/docs/SA09_AMD_IHV.pdf

you can perform full 32-bit int operations, but only on the T unit. the other 4 units can perform only 24-bit int operations. each unit can do 1 ADD/MUL/MAD SP.
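To make the 24-bit restriction concrete, here is a minimal sketch in plain Python (not OpenCL, and not taken from any ATI document). It assumes a 24-bit multiplier simply uses the low 24 bits of each operand; real hardware may behave differently, and the OpenCL spec leaves the result of its `mul24` built-in implementation-defined for out-of-range inputs. The helper names `mul24`, `mul32` and `fits_in_24_bits` are made up for this illustration.

```python
MASK24 = (1 << 24) - 1  # selects the low 24 bits
MASK32 = (1 << 32) - 1  # selects the low 32 bits


def mul32(x, y):
    """Full 32-bit unsigned multiply: keep the low 32 bits of the product."""
    return ((x & MASK32) * (y & MASK32)) & MASK32


def mul24(x, y):
    """ASSUMED 24-bit multiply: only the low 24 bits of each operand are used."""
    return ((x & MASK24) * (y & MASK24)) & MASK32


def fits_in_24_bits(x):
    """True if an unsigned value is representable in 24 bits."""
    return 0 <= x <= MASK24


if __name__ == "__main__":
    # Each pair is multiplied both ways; the results agree only while
    # both operands fit in 24 bits.
    for x, y in [(3, 5), (MASK24, 2), (MASK24 + 2, 2)]:
        same = mul24(x, y) == mul32(x, y)
        print(hex(x), hex(y), "in range:", fits_in_24_bits(x), "match:", same)
```

Under that assumption, the two multiplies agree as long as both operands stay below 2^24, so an int4 multiply would only need the full 32-bit path when a component can exceed that range.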

and yes, there is a huge difference between the 4xxx and 5xxx series.

if you are choosing between them, then definitely choose a 5xxx card.

MicahVillmow · ‎05-11-2010

Rotor,
The 5XXX architecture is derived from, but highly improved over, the 4XXX architecture, so there are major similarities. The major difference when it comes to compute performance, outside of more SIMDs, is a high-performance local data share per SIMD. There are also a few new instructions and a more flexible IO system that allows byte-addressable stores to be supported.

rotor · ‎05-12-2010

Thanks, Nou and Micah. The SIGGRAPH slides helped me a lot. I still wonder why ATI does not make this kind of information officially public.

@Micah: I know that 5xxx has lots of improvements compared to 4xxx. However, I am still concerned about how cache/local memory is allocated for local and private variables. In particular, the issue that local arrays are spilled to global memory: has it been solved in SDK 2.1 on 5xxx? Is there still any memory emulation here?

Thanks,

Rt

MicahVillmow · ‎05-12-2010

rotor,
SDK 2.1, and I thought 2.0/2.01 as well, does not push local memory into global memory on 5XXX cards. In 2.1, private memory is now represented by scratch buffers rather than being emulated in global memory.

Raistmer · ‎05-12-2010

Where are scratch buffers located on 5xxx and 4xxx GPUs? Not in global memory? In shared memory (that is, will register spilling cut into the 16k of shared memory on 5xxx GPUs)?
And what about shared memory on the 4870? It is not as versatile as the new one on 5xxx, but maybe it could somehow be exposed in OpenCL too? Or is a good piece of fast memory just lost completely for the future on 4xxx GPUs?

MicahVillmow · ‎05-12-2010

There is currently no plan to expose 4XXX hardware local memory in OpenCL.
Scratch buffers are stored in global memory, but unlike the emulated global memory, they can be optimized away by the CAL compiler in some cases and use register indexing in others.

rotor · ‎05-18-2010

Hi Micah,

It's strange to me that private memory is now represented by scratch buffers that are stored in global memory. Even if it is "not emulated" and ATI has a very special strategy for managing this chunk of scratch buffer in global memory (a.k.a. VRAM), it is still much slower than on-chip memory. So what is the reason? From a programming perspective we expect private memory or scratch buffers to be fast, but they are not allocated in on-chip memory! Why does ATI now have physical on-chip local memory (according to you) but still put private memory in global memory?

I have gone through the ATI 5xxx architecture slides from SIGGRAPH (http://sa09.idav.ucdavis.edu/docs/SA09_AMD_IHV.pdf) that nou recommended, and I came up with a question: from slide 16 to 20, ATI talks about two different types of shared memory, Local Shared and Global Shared memory. Why do you need two types of shared memory, and at execution time, how and where does SDK 2.1 place the shared memory? And does the physical "shared memory" here correspond to the logical "local memory" of the OpenCL spec?

And in summary, could you please specify where the following OpenCL logical memories are PHYSICALLY located on the 5xxx series?

-Local memory

-Constant memory

-Private memory

Many thanks,

Roto
