25 Replies. Latest reply on Nov 13, 2012 12:08 AM by hsaigol

    OpenCL with quad watercooled Radeon 7990s

    Starglider

      The cards finally arrived:

      [image: dual-tahiti-cards.jpg]

      These are the 'workstation edition' of the Powercolor 7990 Devil13; to the best of my knowledge we are the only customer to have these. We are currently testing them with the original air coolers, before fitting the custom nickel-copper waterblocks and integrating them all into the same machine:

      [image: dual-tahiti-testing.jpg]

      The host machine has dual overclocked 8-core Xeons, 128 GB DDR3, quad SSDs, 10-gigabit Ethernet, dual PSUs, quad triple-fan radiators, etc. I can happily confirm that the 'impossible to disable CrossFire on dual-GPU cards' bug that made the 5970 and 6990 unusable for GPGPU has been fixed for the 7990. There was a lot of flickering, and Catalyst Control Center crashed when we disabled CrossFireX, but after we reopened it CrossFireX was finally disabled, and so far dual-device tests are working fine. We will be running 8-GPU scaling and overclocking tests next week; if it works well, we will make this spec our standard developer workstation for the next project.

        • Re: OpenCL with quad watercooled Radeon 7990s
          dmeiser

          That looks really sweet. Looking forward to your scaling and overclocking results.

          What types of applications are you building these machines for? Do you sell them?

          Cheers,

          Dominic

            • Re: OpenCL with quad watercooled Radeon 7990s
              Starglider

              The project is a production deployment of a C++ to OpenCL auto-parallelising cross-compiler, based on supercompilation techniques. These workstations are used to run the supercompiler itself (which is extremely compute-intensive) and to test the application code, which consists of many different workflows and codebases currently running on several thousand 12-core Xeon blades. The production servers are currently planned as standard 4U rackmount boxes with 8 x (next year's) FireStream cards, but we do have a team looking at the possibility of liquid cooling for those as well.

              I don't know if they'll be available as stand-alone machines (currently they come as part of a GPGPU consulting / software license / dev hardware package), but I'll certainly let you know if they go on sale anywhere. Either way, the series production version will be a bit tidier inside than the engineering prototype in the photo!

            • Re: OpenCL with quad watercooled Radeon 7990s
              Starglider

              Build in progress. We will clearly have to truncate the brackets with the bandsaw in order to get four cards onto the motherboard (the 10GBASE-T NIC will have to go on a flexible PCIe riser, mounted elsewhere in the case):

              http://i36.photobucket.com/albums/e6/stargliderx/4x7990.jpg

              http://i36.photobucket.com/albums/e6/stargliderx/7990stripped.jpg

              Had to remove the backplates to physically fit the top card into the slot; the clearance on the ASUS Z9PE-D8 WS is very tight. Will mount individual passive heatsinks on the rear memory chips. It seems a shame to throw away all this carefully engineered copper and aluminium, but alas Powercolor didn't want to ship bare cards:

              http://i36.photobucket.com/albums/e6/stargliderx/7990coolers.jpg

              • Re: OpenCL with quad watercooled Radeon 7990s
                pwvdendr

                Out of curiosity, how are you watercooling 1500+ watts? I'm running a similar machine with 8x HD7970 on an 8-slot motherboard, but I had to go for extender cables and air cooling, because I couldn't find a reliable water-cooling setup that wouldn't make the water boil under full load.

                  • Re: OpenCL with quad watercooled Radeon 7990s
                    Starglider

                    Quad radiators, each with triple 120mm fans:

                    http://i36.photobucket.com/albums/e6/stargliderx/7990s-watercooled.jpg

                    http://i36.photobucket.com/albums/e6/stargliderx/7990s-installed.jpg

                    This is the loop configuration:

                    http://i36.photobucket.com/albums/e6/stargliderx/loop.png

                    Temperatures and noise are both quite low so far. 8 x 7970 is the expected production config, /if/ we can demonstrate good enough reliability with the AMD drivers (looking dubious at the moment); the 4 x 7990 is specifically to fit into a desktop machine.
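
                    As a rough check on the cooling capacity, take pwvdendr's figure of 1500 W for the four cards and assume, purely for illustration, that the load is shared evenly across the four triple-fan radiators:

```latex
\frac{1500\ \mathrm{W}}{4\ \text{radiators}} = 375\ \mathrm{W}\ \text{per radiator},
\qquad
\frac{375\ \mathrm{W}}{3\ \text{fans}} = 125\ \mathrm{W}\ \text{per 120 mm fan}
```

                    Since the figure was "1500+", these are lower bounds, and anything else sharing the loop would add to them.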

                      • Re: OpenCL with quad watercooled Radeon 7990s
                        dmeiser

                        Nice progress. Do you mind sharing what kind of problems you have encountered with the drivers? What version of the drivers and AMD APP SDK are you planning to use? What OS?

                          • Re: OpenCL with quad watercooled Radeon 7990s
                            pwvdendr

                            I'd guess the same problems I experienced. Booting works fine, but the AMD drivers misbehave. Under Windows they either fail to recognize all the cards or BSOD with an error in atikmdag.sys. Under Linux they can be made to work; I wrote a short tutorial on getting it working under Ubuntu here: http://devgurus.amd.com/thread/159019#1280510

                              • Re: OpenCL with quad watercooled Radeon 7990s
                                Starglider

                                The AMD Windows OpenCL driver seems to be completely useless; it is unstable with 3 GPUs and always fails with 4 GPUs. The problem is always the process crashing on the call to getPlatformIDs, i.e. OpenCL initialisation. With 3 GPUs there is no obvious causal factor: some benchmarks always crash on startup, some crash on startup only 50% of the time, others run fine. 4 GPUs is hopeless. On Windows 7 Professional with 4 7970s the display driver doesn't crash, but all of the different benchmark processes always crash as soon as they try to initialise OpenCL (before they even get to enumerating devices, compiling kernels or creating buffers). The same crash occurs using OpenCL directly, or through Cloo (C#) or JOCL (Java). I'd stress that this crash is not due to attempting to use more than one GPU in a process: even the least challenging case, one process per GPU with each process using a context containing only that one GPU, cannot scale past two GPUs on Windows.

                                We are having more success with Linux, and will get some 8-GPU scaling numbers from that, but they're fairly academic, as Windows support is really mandatory for large-scale adoption. We might get a couple more Linux dev machines installed, but I strongly suspect this will be another failed attempt (by us) to introduce AMD into the enterprise, and another >$1M of FireStream sales lost (to Nvidia) because the drivers fundamentally do not work. We're trying Windows Server 2012 with the Windows 8 drivers next week, but I'm not optimistic.

                                  • Re: OpenCL with quad watercooled Radeon 7990s
                                    pwvdendr

                                    > With 3 GPUs there is no obvious causal factor; some benchmarks always crash on startup, some crash on startup only 50% of the time, others run fine. 4 GPUs is hopeless; on Windows 7 Professional with 4 7970s the display driver doesn't crash, but all of the different benchmark processes always crash as soon as they try to initialise OpenCL (before they even get to enumerating devices, compiling kernels or creating buffers).

                                    I presume that you are talking about 3/4 7990s, not 7970s? A 7990 consists of 2 GPUs.

                                    For me, everything worked fine up to 5 7970s, including OpenCL load testing. Above 5, your story sounds familiar.

                                    I don't know if it helps, but I *did* get it working with 3x6990s + 1xHD5450 (that's 7 GPUs; I didn't try 4x6990 as I only had 3 back then). The AMD Catalyst drivers had exactly the same problem, but the display driver installed by Windows Update, together with just the OpenCL driver from AMD (no graphics driver or Catalyst!), did the trick. Alas, when I tried this, Windows Update had no drivers for the 7000 series, so I could not repeat this trick for my 7970s.

                                    So if a machine with 4x6990 is of any value to you, that should be possible under Windows. Also, if you could manage to look into why the Windows Update drivers for the 6990 work and the regular ones don't, that could shed new light on this, and possibly provide a solution for 4x7990 as well. I'm only doing this as a side project for my thesis, so I didn't dig very deep into this, but it seems you have a lot more budget available for this project.

                                      • Re: OpenCL with quad watercooled Radeon 7990s
                                        Starglider

                                        pwvdendr wrote:

                                        > > With 3 GPUs there is no obvious causal factor; some benchmarks always crash on startup, some crash on startup only 50% of the time, others run fine. 4 GPUs is hopeless; on Windows 7 Professional with 4 7970s the display driver doesn't crash, but all of the different benchmark processes always crash as soon as they try to initialise OpenCL (before they even get to enumerating devices, compiling kernels or creating buffers).
                                        >
                                        > I presume that you are talking about 3/4 7990s, not 7970s? A 7990 consists of 2 GPUs.
                                        >
                                        > For me, everything worked fine up to 5 7970s, including OpenCL load testing. Above 5 your story sounds familiar.

                                        No, I am talking about 7970s; we have other workstations with various numbers and makes of 7970 in them. Rigorously enforcing 64-bit-only compilation and always disabling 'Randomized Base Address' in the VS linker has gotten 3 7970s to work reliably in Windows 7, for all benchmarks and workflows we are currently testing. However, having four or more Tahitis in the same machine still crashes AMD APP on the first call (usually getPlatformIds) on almost every run (occasionally we manage to get a trivial test case to run on one GPU, but nothing more). We have tried current, previous and beta versions of Catalyst on multiple different motherboards and CPUs, with no change in outcome. Windows 8 / Server 2012 testing is still pending.
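
                                        To make sure every test binary really matches that configuration, a build can enforce both settings itself. The sketch below is an assumption about how one might do it, not the setup described in this post: a preprocessor check refuses non-64-bit builds, and on Windows a runtime check reads the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag from the running executable's PE header, which the MSVC linker sets for /DYNAMICBASE and clears for /DYNAMICBASE:NO (the 'Randomized Base Address: No' project setting). It does not explain why the workaround helps; it only guards against a mis-built binary slipping into the test matrix. The function name exe_uses_dynamic_base is hypothetical.

```c
/* Hypothetical build guard for the "64-bit only, no ASLR" configuration. */
#include <stdint.h>

/* Fail the build outright if pointers are not 64 bits wide. */
#if !defined(UINTPTR_MAX) || UINTPTR_MAX != 0xFFFFFFFFFFFFFFFFu
#error "Build the OpenCL host code as 64-bit only."
#endif

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if the running .exe was linked with /DYNAMICBASE (ASLR on),
   0 if it was linked with /DYNAMICBASE:NO, and -1 on non-Windows builds,
   where the check does not apply. */
int exe_uses_dynamic_base(void)
{
#ifdef _WIN32
    const BYTE *base = (const BYTE *)GetModuleHandleA(NULL);  /* image base of the .exe */
    const IMAGE_DOS_HEADER *dos = (const IMAGE_DOS_HEADER *)base;
    const IMAGE_NT_HEADERS *nt = (const IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    return (nt->OptionalHeader.DllCharacteristics
            & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) != 0;
#else
    return -1;
#endif
}
```

                                        Calling it at startup and aborting when it returns 1 would catch a binary that was accidentally built with the default linker settings.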

                                         

                                        The 7990s seem to behave almost like a pair of 7970s, i.e. fine in Linux, but if you have more than one plugged in under Windows 7, all OpenCL apps always crash. The only noticeable difference is that having two 7990s in the machine crashes GPUTweak (loading hangs on the splash screen), whereas four 7970s do not. MSI Afterburner still works.

                                        > I don't know if it helps, but I *did* get it working with 3x6990s + 1xHD5450 (that's 7 GPUs; didn't try 4x6990 as I only had 3 back then). The AMD Catalyst drivers had exactly the same problem, but the display driver installed by Windows Update, together with just the OpenCL driver from AMD (no graphics driver or Catalyst!), did the trick.

                                        We haven't observed any problems with graphics output; plugging two monitors into two different 7990s works fine. We haven't tried quad CrossFire, but I assume Powercolor tested it rigorously, as it is an officially supported config. It seems to be just AMD APP that cannot handle more than 3 7970s. Unfortunately I don't have any 6990s around to test right now, but that wouldn't help with the sale anyway, as the 69xx series cannot compete with Nvidia Teslas / 580s in these non-vector workloads.

                                        > I'm only doing this as a side project for my thesis, so I didn't dig very deep into this, but it seems you have a lot more budget available for this project.

                                        We have a fair amount of resources and are prepared to run any tests or build any hardware configs AMD ask for, but if they aren't willing to address the issue there is nothing we can do (other than switch back to Nvidia). This is actually the first time we've been asked to support Windows for large-scale GPGPU grid compute (most people prefer Linux); unfortunately it coincides with trying to use AMD GPUs, and thus with the nasty surprise of finding that the AMD Windows drivers don't work.

                                          • Re: OpenCL with quad watercooled Radeon 7990s
                                            hsaigol

                                            Hi Starglider,
                                            Can you share an executable for Windows 7 64-bit on which I can try to reproduce the failure? Please provide the source code as well.
                                            I'm going to set up with 2 7990s, so the simplest app/code that crashes would be best.

                                            • Re: OpenCL with quad watercooled Radeon 7990s
                                              pwvdendr

                                              > Rigorously enforcing 64-bit compilation only and always disabling 'randomise base address' in the VS linker has gotten 3 7970s to work reliably in Windows 7, for all benchmarks and workflows we are currently testing. However, having four or more Tahitis in the same machine is still crashing AMD APP on the first call (usually getPlatformIds) on almost every run (occasionally we manage to get a trivial test case to run on one GPU, but nothing more). We have tried current, previous and beta versions of Catalyst on multiple different motherboards and CPUs, with no change in outcome. Windows 8 / Server 2012 testing is still pending.

                                              Strange. My first machine has 8x HD7970, an MSI Big Bang Marshall B3 motherboard and a Core i7-2600K CPU. When I connected all 8 cards under Windows and installed the drivers back then (I don't remember whether from the CD or the website; they were different, though: one worked and one didn't), OpenCL ran flawlessly, but only detected 5 out of 8 GPUs (so it only ran at 62.5% of full capacity) under Windows 7 (64-bit, Pro).

                                              > We haven't observed any problems with graphics output; plugging two monitors into two different 7990s works fine, haven't tried quad crossfire but I assume that was rigorously tested by Powercolor as it is an officially supported config. It seems to be just AMD APP that cannot handle more than 3 7970s.

                                              Is this still true if you have 4x HD7990 and plug a monitor into each? Perhaps the fact that the HD7990 has internal CrossFire avoided the problems that I encountered. For me, 3 out of 8 devices were simply not recognized by Windows, even after installing AMD's graphics drivers, which of course ruled out any chance of getting them to run OpenCL code. But other than that, I never saw any "additional" trouble caused by the OpenCL drivers / AMD APP SDK.

                                          • Re: OpenCL with quad watercooled Radeon 7990s
                                            captian-n

                                            Nice work. I am looking forward to your results. I have the same problems as you. See here: http://devgurus.amd.com/message/1283720#1283720 and here: http://devgurus.amd.com/thread/159653.

                                            So I'm also really interested in whether you find a solution. Maybe if more and more people report the same issues, AMD will do something. For me the best version is Catalyst 12.6, but it only supports up to 5 GPUs, and the clGetPlatformID problem still arises sometimes. Every later driver is worse for multi-GPU. I'll try installing without Catalyst, as pwvdendr describes; maybe that is a solution.

                                      • Re: OpenCL with quad watercooled Radeon 7990s
                                        pwvdendr

                                        Using 4 radiators, wow. Yes, that can definitely keep it cool, even when overclocking beyond the 1125MHz that air cooling allows, and probably without sounding like a vacuum cleaner like mine does.

                                        Out of curiosity, how much did the entire watercooling system cost? Looking up some watercooling parts suggests figures beyond the cost of the cards.

                                        • Re: OpenCL with quad watercooled Radeon 7990s
                                          Skysnake

                                          Very nice.

                                          I hope AMD sees that more and more people want to do something like this, and starts to do something about it.

                                          BTW: the W10000 seems to be a dual-Tahiti FirePro card.

                                      • Re: OpenCL with quad watercooled Radeon 7990s
                                        dmeiser

                                        BTW, have you seen the work by these guys?

                                        http://devgurus.amd.com/thread/159457

                                        They seem to have four 6990s, and they claim they can run under Windows 7 and Ubuntu (it doesn't say which version). They're using the AMD Catalyst 12.3 driver.