Now that the embargo has been lifted on the Radeon 7970s, I'm curious about some of the features in GCN.
Firstly, GCN is supposed to have a unified address space with the host CPU, 64-bit addressing, and virtual memory. Are these present in the 7970?
With a unified address space, I no longer see a reason for the current 128MB per-allocation limit on buffers. It would be glorious to no longer have to pack larger data sets into textures, which makes code annoying to debug and hard to keep implementation-independent.
Furthermore, it would seem that pointers can now persist across kernel calls, removing the need to treat everything as an index.
Secondly, GCN is supposed to have a fully coherent L1/L2 cache hierarchy. Is this present? No longer having to use textures to get caching (other than by declaring a buffer as const restrict) would be pretty nice. Also, this would seem to significantly reduce the impact of non-coalesced reads and writes (supposing you can at least use the cache efficiently).
Finally, I hear that virtual memory won't appear in discrete GPUs until 2014. Is this a hardware or a driver issue? I.e., does the 7970 hardware already support it, with driver support coming in 2014, or do we have to wait two hardware generations?
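To make the texture-packing chore concrete, here is a minimal sketch of the index arithmetic it forces on you. It assumes RGBA float texels and an image width of 8192 texels; both are assumptions for illustration (the real limit is whatever the device reports for CL_DEVICE_IMAGE2D_MAX_WIDTH), and the helper names are made up.

```python
# Hypothetical helpers: address a flat float array stored in a 2D RGBA float image.
FLOATS_PER_TEXEL = 4  # assumption: RGBA float texels

def texel_coords(i, width):
    """Map linear float index i to (x, y, channel) in an image `width` texels wide."""
    texel, channel = divmod(i, FLOATS_PER_TEXEL)
    y, x = divmod(texel, width)
    return x, y, channel

def linear_index(x, y, channel, width):
    """Inverse of texel_coords."""
    return (y * width + x) * FLOATS_PER_TEXEL + channel

def image_rows_needed(n_floats, width):
    """Rows needed to hold n_floats, rounding up at both texel and row level."""
    texels = -(-n_floats // FLOATS_PER_TEXEL)  # ceiling division
    return -(-texels // width)

WIDTH = 8192  # assumed image width in texels
floats_in_128mb = 128 * 2**20 // 4   # floats that fit in one 128MB buffer
floats_in_1gb = 2**30 // 4           # floats in a 1GB data set
print(floats_in_128mb, image_rows_needed(floats_in_1gb, WIDTH))
```

Every access in the kernel then goes through this (x, y, channel) mapping instead of a plain array index, which is exactly the part that is painful to debug.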
I would also like to add to the questions: is there hardware-accelerated encoding (accessible through built-in kernels) on the HD7970?
Edit: And one more. I just read in a Hungarian article about GCN that double precision will be degraded in software, so only the new generation of FirePro cards will show what GCN is really capable of in DP. Please tell me that this is a misunderstanding. NV has made a really disgusting habit of degrading GeForce DP relative to the Tesla product line, only to push buyers toward their more expensive products. Pre-7000 Radeons had all of the DP capacity the hardware offered. Please tell me that this will not change. (Taking into account that there are no dual-GPU FirePros, this would roughly shoot GCN in the leg when it comes to making its way into the HPC segment.)
Edit2: OK, I found some information about the VCE engine in an article, so that answers the first question. But let me ask another one, then: will Partially Resident Textures be available in OpenCL? As far as I understood, megatexture streaming will be available as an OpenGL and DX11.1 extension. The reason I ask is that I would like to visualize an extremely large surface growth simulation on the fly via OpenGL interop. The problem is that the system is bit-coded due to its size, so the only way to visualize it is with a Geometry Shader that can read the bit-coded data and spawn quads based on it, plus an auxiliary vertex array that holds explicit x,y,z values at some arbitrary points. Thus the render input becomes this auxiliary vertex array, with the actual data as an attribute array.
This doubling of data on the display device is not a good thing, and I am curious whether I could use megatexture streaming from the host to make use of the heaps of RAM in the host. If some efficient streaming method were available, I could use all GPUs in the node as if the VRAM limit were the same as the host's (192GB).
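To illustrate why the bit-coded form is worth keeping, here is a rough sketch of one possible encoding. The scheme is an assumption for illustration only, not necessarily the one used in the simulation: one bit per lattice site, where 1 means the surface steps up by one unit and 0 means it steps down, starting from an explicitly stored anchor height. The function names are hypothetical.

```python
def decode_heights(anchor_height, words, n_sites, bits_per_word=32):
    """Rebuild heights from an anchor plus one up/down bit per site (assumed encoding)."""
    heights = []
    h = anchor_height
    for site in range(n_sites):
        word = words[site // bits_per_word]
        bit = (word >> (site % bits_per_word)) & 1  # least significant bit first
        h += 1 if bit else -1
        heights.append(h)
    return heights

def storage_bytes(n_sites):
    """Return (bit-coded bytes, explicit bytes) for n_sites lattice sites."""
    bitcoded = -(-n_sites // 8)   # 1 bit per site, rounded up to whole bytes
    explicit = n_sites * 3 * 4    # x, y, z as 32-bit floats
    return bitcoded, explicit

print(decode_heights(0, [0b1011], 4))
print(storage_bytes(2**30))
```

Under these assumptions, explicit x,y,z floats cost 96 times more memory per site than the bit-coded form, which is why expanding everything into a regular vertex array is not an option and the decoding has to happen in the shader.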
Some of your questions are already answered in the latest AMD APP SDK programming guide (1.3g), which was released today. It has a nice new GCN section.
To my unpleasant surprise it did not document any of the new 2.6 features.
The reviews I read stated that the DP rate is 1/4 on the new Radeon 7970. Also, in some GCN review I read that the DP rate should be tunable in HW from 1/16 up to 1/2, and that all GCN GPUs will support DP.
Could someone explain to me the idea behind HW scaling of DP power? The transistors are there (so the chip size is increased, which the customer has already paid for) and then they have it artificially reduced??
There are heaps of forums and blogs talking about the backward and deprecated notion of DP being an HPC necessity that regular consumer products would not utilize even if it were present in the HW.
AMD earned a lot of sympathy for not following NV's practice of artificially holding GPU DP performance at bay. If they also start going down this road, that will be an enormous shame. If they really want to distinguish professional equipment from regular cards, AMD should put 6GB of GDDR5 per GPU, or 3GB of ECC GDDR5 per GPU, in place of the regular 3GB on the HD7970. Perhaps strictly front-to-back cooling, or passive cooling for blade servers... there are many ways to distinguish HW classes besides the extremely lazy and profit-oriented act of implementing HW limitations that have no apparent reason.
It would be really nice if some official information were given on the topic.
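For reference, here is what those DP rates work out to as peak numbers. This is a back-of-the-envelope sketch that assumes the publicly listed HD 7970 launch specs (2048 stream processors at 925 MHz), which are not stated in this thread, and counts one fused multiply-add per lane per cycle as 2 FLOPs.

```python
def peak_gflops(stream_processors, clock_mhz, dp_rate=1.0):
    """Theoretical peak in GFLOPS; dp_rate is the DP:SP throughput ratio."""
    # 2 FLOPs per lane per cycle: a fused multiply-add counts as two operations
    return stream_processors * 2 * clock_mhz * dp_rate / 1000.0

SP_COUNT = 2048   # assumption: HD 7970 launch spec
CLOCK_MHZ = 925   # assumption: HD 7970 launch engine clock

print("SP peak: %.1f GFLOPS" % peak_gflops(SP_COUNT, CLOCK_MHZ))
for label, rate in (("1/16", 1 / 16), ("1/4", 1 / 4), ("1/2", 1 / 2)):
    print("DP at %s rate: %.1f GFLOPS" % (label, peak_gflops(SP_COUNT, CLOCK_MHZ, rate)))
```

At the 1/4 rate this comes to roughly 947 GFLOPS DP, and the tunable range of 1/16 to 1/2 would span roughly 237 to 1894 GFLOPS at the same clock.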
Originally posted by: moozoo
So close, I was hoping for 1 TFLOP DP. Perhaps third-party boards and overclocking will get there.
A FirePro with the 1/2 DP switch enabled should get around 1.5-1.7 TFLOP (I'm assuming it will be a little bit downclocked).
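As a quick sanity check on both numbers, the required engine clock can be worked backwards from the target throughput. The sketch below assumes 2048 stream processors (the HD 7970 launch spec, and an assumption for any future FirePro part) and 2 FLOPs per lane per cycle; the 925 MHz stock clock is likewise the launch spec, not something stated in this thread.

```python
def clock_for_target(target_gflops, stream_processors, dp_rate):
    """Engine clock in MHz needed to hit target_gflops at a given DP:SP ratio."""
    # 2 FLOPs per lane per cycle: a fused multiply-add counts as two operations
    return target_gflops * 1000.0 / (stream_processors * 2 * dp_rate)

SP_COUNT = 2048        # assumption: same shader count as the HD 7970
STOCK_CLOCK_MHZ = 925  # assumption: HD 7970 launch engine clock

one_tflop = clock_for_target(1000, SP_COUNT, 1 / 4)
print("1 TFLOP DP at 1/4 rate needs %.1f MHz (%.1f%% over stock)"
      % (one_tflop, 100 * (one_tflop / STOCK_CLOCK_MHZ - 1)))
for target in (1500, 1700):
    print("%d GFLOPS DP at 1/2 rate needs %.1f MHz"
          % (target, clock_for_target(target, SP_COUNT, 1 / 2)))
```

Under these assumptions, 1 TFLOP DP at the 1/4 rate needs about 977 MHz, so a modest overclock of under 6% would get there, and 1.5-1.7 TFLOP at the 1/2 rate corresponds to clocks of roughly 732-830 MHz, which fits the "a little bit downclocked" guess.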