rick_weber
Adept II

Radeon 7970 questions

Now that the embargo has been lifted on the Radeon 7970s, I'm curious about some of the features in GCN.

Firstly, GCN is supposed to have a unified address space with the host CPU, 64-bit addressing, and virtual memory. Are these present in the 7970?

With a unified address space, I no longer see a reason for the current 128MB per-allocation limit on buffers. It will be glorious if we no longer have to pack larger data sets into textures, which is annoying to debug and makes it hard to write implementation-independent code.
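As a rough illustration of the workaround this limit forces today, here is a minimal sketch that splits a large data set into byte ranges that each fit under a per-allocation limit. The function name `split_into_chunks` and the 300 MiB data set are made up for the example; on a real device the limit would come from the `CL_DEVICE_MAX_MEM_ALLOC_SIZE` device query rather than a hard-coded 128 MiB.

```python
def split_into_chunks(total_bytes, max_alloc_bytes, element_bytes=4):
    """Return (offset, size) byte ranges covering total_bytes, each <= max_alloc_bytes."""
    if max_alloc_bytes < element_bytes:
        raise ValueError("allocation limit is smaller than one element")
    # Round the chunk size down to a whole number of elements so no element is split.
    chunk = (max_alloc_bytes // element_bytes) * element_bytes
    ranges = []
    offset = 0
    while offset < total_bytes:
        size = min(chunk, total_bytes - offset)
        ranges.append((offset, size))
        offset += size
    return ranges


MiB = 1024 * 1024
# Hypothetical 300 MiB data set against the 128 MiB limit mentioned above.
chunks = split_into_chunks(300 * MiB, 128 * MiB)
for offset, size in chunks:
    print(f"buffer at offset {offset // MiB} MiB, size {size // MiB} MiB")
```

Each range would become its own buffer, and every kernel then has to translate a global index into a (buffer, local index) pair, which is exactly the bookkeeping a larger allocation limit would remove.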

Furthermore, it would seem that pointers can now persist across kernel calls, removing the need to treat everything as an index.

Secondly, GCN is supposed to have a fully coherent L1/L2 cache hierarchy. Is this present? No longer having to use textures to get caching (other than by declaring a buffer as const restrict) would be pretty nice. It would also seem to significantly reduce the impact of non-coalesced reads and writes (assuming you can at least use the cache efficiently).
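To make the coalescing point concrete, here is a small sketch that counts how many distinct cache lines one wavefront touches for a given access stride. The 64 work-items per wavefront matches AMD hardware, but the 64-byte line size and 4-byte element size are assumptions for the example, and `cache_lines_touched` is a made-up helper, not anything from the SDK.

```python
def cache_lines_touched(num_work_items, stride_elements, element_bytes=4, line_bytes=64):
    """Count distinct cache lines read when work-item i loads element i * stride_elements."""
    lines = {
        (i * stride_elements * element_bytes) // line_bytes
        for i in range(num_work_items)
    }
    return len(lines)


# One 64-wide wavefront reading 4-byte values with increasing stride.
for stride in (1, 2, 16, 32):
    print(f"stride {stride:2d}: {cache_lines_touched(64, stride)} cache lines")
```

With unit stride the wavefront pulls in only a handful of lines; once the stride reaches a full line per work-item, every load lands in its own line, so a cache only helps if neighbouring data in those lines gets reused later.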

Finally, I hear that virtual memory won't appear in discrete GPUs until 2014. Is this a hardware or a driver issue? That is, does the 7970 hardware already support it, with driver support arriving in 2014, or do we have to wait two hardware generations?

0 Likes
28 Replies
Meteorhead
Challenger

I would also like to add to the questions: is hardware-accelerated encoding (accessible through built-in kernels) present on the HD7970?

Edit: And one more. I just read in a Hungarian article about GCN that double precision will be software-degraded, so only the new generation of FirePro cards will show what GCN is really capable of in DP. Please tell me that this is some misunderstanding. NV has made a really disgusting habit of degrading GeForce DP relative to the Tesla product line solely to push buyers toward their more expensive products. Pre-7000 Radeons exposed all the DP capacity that was available. Please tell me that this will not change. (Given that there are no dual-GPU FirePros, this would more or less shoot GCN in the leg on its way into the HPC segment.)

Edit2: OK, I found some information about the VCE engine in an article, so that answers the first question. But let me ask another one then: will Partially Resident Textures be available in OpenCL? As far as I understand, megatexture streaming will be available as an OpenGL and DX11.1 extension. The reason I ask is that I would like to visualize an extremely large surface growth simulation on the fly via OpenGL interop. The problem is that the system is bit-coded due to its size, so the only way to visualize it is via a Geometry Shader that can read the bit-coded data and spawn quads based on it, plus an auxiliary vertex array that holds explicit x, y, z values at some arbitrary points. Thus the render input becomes this auxiliary vertex array, with the actual data as an attribute array.

This duplication of data on the display device is not a good thing, and I am curious whether I could use megatexture streaming from the host to make use of the heaps of RAM in the host. If some efficient streaming method were available, I could use all GPUs inside the node as if the VRAM limit were the same as the host's (192GB).

0 Likes

Some of your questions are already answered in the latest AMD APP SDK programming guide (1.3g), which was released today. It has a nice new GCN section.

To my unpleasant surprise, it did not document any of the new 2.6 features.

0 Likes

Gat3way, could you upload the 1.3g version somewhere? The current 1.3f doc contains nothing about GCN.

Added:

Sorry, I have found it.

0 Likes

In the reviews I read, they stated that the DP rate is 1/4 for the new Radeon 7970. I also read in some GCN review that the DP rate should be tunable in hardware from 1/16 up to 1/2, and that all GCN GPUs will support DP.

0 Likes

Could someone explain to me the idea behind hardware scaling of DP power? The transistors are there (so the chip size is increased, which the customer has already paid for), and then they have it artificially reduced??

There are heaps of forums and blogs repeating the outdated notion that DP is only an HPC necessity and that regular consumer products would not use it even if it were present in the hardware.

AMD earned a lot of sympathy for not following NV's practice of artificially holding GPU DP performance at bay. If they also start going down this road, that will be an enormous shame. If they really want to distinguish professional equipment from regular cards, AMD should place 6GB of GDDR5 per GPU, or 3GB of ECC GDDR5 per GPU, instead of the regular 3GB on the HD7970. Perhaps strictly front-to-back cooling, or passive cooling for blade servers... There are many ways to distinguish hardware classes besides the extremely lazy and profit-oriented act of implementing hardware limitations that have no apparent reason.

It would be really nice if some official information were given on the topic.

0 Likes

947 GFLOPs Double Precision compute power

http://www.amd.com/us/products/desktop/graphics/7000/7970/Pages/radeon-7970.aspx#3
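For reference, the 947 figure can be reproduced with simple arithmetic. The sketch below assumes the commonly quoted HD 7970 numbers (2048 stream processors at 925 MHz), one fused multiply-add (2 FLOPs) per stream processor per cycle, and the 1/4 DP rate mentioned earlier in the thread; `peak_gflops` is just a helper name for this example.

```python
def peak_gflops(stream_processors, clock_ghz, dp_rate=1.0):
    """Theoretical peak: one FMA (2 FLOPs) per stream processor per cycle, scaled by dp_rate."""
    return stream_processors * 2 * clock_ghz * dp_rate


# Assumed HD 7970 specs: 2048 stream processors at 0.925 GHz.
sp_peak = peak_gflops(2048, 0.925)
dp_peak = peak_gflops(2048, 0.925, dp_rate=1 / 4)
print(f"single precision: {sp_peak:.1f} GFLOPS")
print(f"double precision: {dp_peak:.1f} GFLOPS")
```

Under those assumptions the double precision result comes out at about 947 GFLOPS, which matches the number on the product page.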

0 Likes

Originally posted by: sh
"947 GFLOPs Double Precision compute power"

So close; I was hoping for 1 TFLOP DP. Perhaps third-party boards and overclocking will get there.

0 Likes

Originally posted by: moozoo
"So close, I was hoping for 1 TFLOP DP , Perhaps third party boards and overclocking will get there."

FirePro with 1/2 DP switch enabled should get around 1.5-1.7 TFLOP ( I'm assuming it will be a little bit downclocked )
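The estimate can be sanity-checked by solving the peak-rate formula for the clock. This is only a back-of-the-envelope sketch: it assumes a hypothetical FirePro with the same 2048 stream processors as the HD 7970, one FMA (2 FLOPs) per stream processor per cycle, and a 1/2 DP rate. The helper `clock_for_target_ghz` is invented for this example.

```python
def clock_for_target_ghz(target_gflops, stream_processors, dp_rate):
    """Clock (GHz) needed to reach target_gflops at 2 FLOPs per stream processor per cycle."""
    return target_gflops / (stream_processors * 2 * dp_rate)


# Hypothetical 2048-SP part running DP at half rate.
low = clock_for_target_ghz(1500, 2048, 1 / 2)
high = clock_for_target_ghz(1700, 2048, 1 / 2)
print(f"1.5 TFLOP DP needs about {low * 1000:.0f} MHz")
print(f"1.7 TFLOP DP needs about {high * 1000:.0f} MHz")

# The same helper answers the overclocking question for the 1/4-rate HD 7970.
one_tflop = clock_for_target_ghz(1000, 2048, 1 / 4)
print(f"1.0 TFLOP DP at 1/4 rate needs about {one_tflop * 1000:.0f} MHz")
```

Under these assumptions the 1.5-1.7 TFLOP range corresponds to clocks somewhat below the HD 7970's 925 MHz, consistent with "a little bit downclocked", while 1 TFLOP at 1/4 rate needs only a modest overclock.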

0 Likes

Originally posted by: hazeman
"FirePro with 1/2 DP switch enabled should get around 1.5-1.7 TFLOP ( I'm assuming it will be a little bit downclocked )"

As far as I know, the "switch" doesn't exist. 1/2-rate DP and the 128kB L2 cache are reserved for the next-gen chips (not Southern Islands).

0 Likes

Gat3way and sh,

Can you upload the document (1.3g) or point to the address where you found it?

0 Likes

Originally posted by: aymankh
"Gat3way and sh, can you upload the document (1.3g) or point to the address where you found it?"

I don't have 1.3g, but the 1.3f manual contains a GCN section (4.15.6) too.

0 Likes

All GCN hardware will support double precision, whereas before only the high-end cards supported it. However, the lower-end cards will support it at 1/8th or 1/16th rate instead of at 1/4th rate.
0 Likes

I want to buy an HD7970, so here is a question for AMD:
Will the HD7970 support cl_khr_fp64 right away, or, as with the HD5800/6900, only after 2 years?

0 Likes

Support for cl_khr_fp64 is coming on all devices that support double precision except for RV7XX based devices.
0 Likes

When? The HD5800 was announced in September 2009, but cl_khr_fp64 support was only implemented in July 2011 with SDK 2.5, and the HD6900, released a year ago, still has no support.
It is like buying a truck that can already drive but is only allowed to carry goods 2 years later.
You have good hardware, but by the time it is fully supported it is already obsolete (2 generations old).

0 Likes

We don't give out exact release dates, but it should be in one of the upcoming Catalyst releases as long as nothing comes up at the last minute.
0 Likes

Speaking of which, Micah: is there a way to get the top 16 bits of a 24x24-bit multiplication? I had hoped it would be in SDK 2.6, but I looked for it and didn't find it.

Is there a way, method, or trick to somehow generate that hardware instruction, which OpenCL 1.2 doesn't seem to support either?

Will it be in an SDK one day?
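One portable way to get the value (though not necessarily the dedicated instruction) is OpenCL's built-in `mul_hi`: for 32-bit unsigned operands it returns the upper 32 bits of the 64-bit product, and when both inputs fit in 24 bits the product is below 2^48, so that result is exactly the top 16 bits of the 48-bit product. The sketch below only emulates this arithmetic in Python to show the identity holds. The helper names are invented, and whether the compiler lowers `mul_hi` on 24-bit inputs to a 24-bit hardware instruction is not something this shows.

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1


def mul_hi_u32(a, b):
    """Emulate mul_hi on 32-bit unsigned operands: upper 32 bits of the 64-bit product."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def top16_of_mul24(a, b):
    """Top 16 bits of the 48-bit product of two 24-bit unsigned values."""
    a &= MASK24
    b &= MASK24
    # The product is below 2**48, so the upper 32 bits hold at most 16 significant bits.
    return mul_hi_u32(a, b)


a, b = 0xFFFFFF, 0xFFFFFF
result = top16_of_mul24(a, b)
print(hex(result))
# Cross-check against the full-width product shifted down by 32 bits.
print(result == (a * b) >> 32)
```

In a kernel the equivalent would be masking both operands to 24 bits and calling `mul_hi` on them as `uint` values.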

0 Likes

OK, and where is the Software Developer Guide for GCN?

I really need this guide for my work...

I will get a 7970 in a few weeks and want to write some versions of n-body with elastic collisions and a heat-plate simulation, perhaps also a liquid simulation. All without using BLAS or anything like it, to see how far you can go with typical code. Of course with shared memory, structs of arrays instead of arrays of structs, and all those things. When I have time, perhaps I will also use BLAS etc. in one implementation.

I have talked with some people in my lectures. They will perhaps use these programs for their next lectures, because I will use OpenGL for visualisation. But for this I REALLY need the developer guide and as much information as I can get...
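For anyone unfamiliar with the "structs of arrays instead of arrays of structs" layout mentioned above, here is a minimal sketch of the idea in plain Python. The field names (`x`, `vx`, `mass`) and helpers are invented for the example; on the GPU the point is that neighbouring work-items then read neighbouring elements of one flat array.

```python
def aos_to_soa(records, fields):
    """Convert a list of per-body dicts (array of structs) into one list per field."""
    return {name: [rec[name] for rec in records] for name in fields}


def step_positions_soa(soa, dt):
    """Advance every x coordinate by vx * dt, walking two flat arrays in lockstep."""
    soa["x"] = [x + vx * dt for x, vx in zip(soa["x"], soa["vx"])]
    return soa


# Array of structs: all fields of one body sit together.
bodies_aos = [
    {"x": 0.0, "vx": 1.0, "mass": 2.0},
    {"x": 1.0, "vx": -1.0, "mass": 3.0},
    {"x": 2.0, "vx": 0.5, "mass": 1.0},
]

# Struct of arrays: all values of one field sit together.
bodies_soa = aos_to_soa(bodies_aos, ["x", "vx", "mass"])
bodies_soa = step_positions_soa(bodies_soa, dt=2.0)
print(bodies_soa["x"])
```

The position update never touches `mass`, so in the struct-of-arrays layout that field is simply not loaded, whereas an array of structs would drag it through the memory system with every body.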

0 Likes

Thanks for the hint.

Not very much, but better than nothing.

Is there anything else? There is really little information about GCN.

0 Likes

Do you have specific questions?  The documentation will take a while to be updated.

Jeff

0 Likes

Yeah, I have some questions.

What about the GDS buffer? What is its job, and how can I use it?

Also, is the SP:DP ratio now 1:2 or 1:4 for the HD7k series, and, much more important, is it limited or unlimited? That is, will the SP:DP ratio be different for FirePro and Radeon? For example, Radeon 1:4 and FirePro 1:2.

One more very important part is the x86 address space. How can I use it with the GCN architecture?

At the FDS you said that we would have a unified address space.

0 Likes

SP:DP ratio is 1:4 for 79xx cards and 1:16 for 77xx and 78xx cards.

0 Likes

Hardware or driver limitation?

I already know that the HD7970 has only 1:4, but there is no clear answer on whether this is a hardware limitation or a "driver" limitation, as with the GeForce GTX580, where you get 1:8, while the same chip on a Tesla gives you 1:2.

0 Likes

As I understood the presentation on GCN at last year's Fusion, this is a chip design-time decision. This would also match AMD's past policy of building different chips for different use cases, which was always reflected in the actually different code names (as opposed to NVIDIA's single code name per generation).

0 Likes

Micah,

I've finally got Catalyst 12.1 for Linux with 7970 support. But it looks like there is no more global buffer (aka g[]) access in IL kernels for the 7970. Is that true? Why?

0 Likes

I don't know whether the new cards lack g[], but as a workaround you can now attach a 2D image to the UAV (raw or struct; I've noticed it working since driver 11.12). You just need to add the CAL_RESALLOC_GLOBAL_BUFFER flag (the same as with g[]).

I know it's not the same as having g[] (compatibility with older cards, compute in pixel shader kernels, ...).

0 Likes

The g[] buffer has been deprecated for a few years now. Please use UAVs instead, which are more flexible than the g[] buffer (32-bit vs 128-bit alignment, 8-bit vs 32-bit reads/writes). While g[] worked on Evergreen and NI hardware, it was suboptimal in many cases, and the hardware for it does not exist on SI.

0 Likes