25 Replies Latest reply on Apr 11, 2016 12:20 PM by jstefanop

    Remove 8 GPU limit on OpenCL driver

    jstefanop

      Will AMD ever remove the 8 GPU limit imposed by their Catalyst Linux drivers? We want to build a compute platform and have the hardware to support up to 32 GPUs per system, but AMD's drivers have a hard limit of 8, and anything beyond that is not recognized. Our only option now is to go with Nvidia, since their drivers have no such limit. This is discouraging, as AMD's compute units are way better than Nvidia's.

        • Re: Remove 8 GPU limit on OpenCL driver
          boxerab

          What sort of motherboard would support more than 8 GPUs?

          • Re: Remove 8 GPU limit on OpenCL driver
            gstoner

            With the Catalyst 15.201 driver and FirePro S9150 GPUs running Ubuntu 14.04 Linux, I have a test system with up to 16 GPUs; this was on an ASUS Z10PE-D16 WS workstation/server motherboard, and the work was done in conjunction with Cirascale. We recently started testing a Supermicro SYS-4028 with 16 ASICs; right now the issue is that the motherboard system BIOS does not support more than 13 GPUs, and I am working with Supermicro to address this. We are also testing systems with the new ROCm driver (Boltzmann Initiative), which supports a larger number of ASICs, using a One Stop Systems 3U PCIe breakout box, but we are running into motherboard system BIOS issues with the server we are working with, which we are working with the vendor to address.

             

            The system you're showing uses Xeon E7 v3 class CPUs (Xeon E7-4870 2.40 GHz processors are about $3200 each) in a four-way configuration. I have not yet tested this system to see whether its BIOS has the resources for more than 13 GPUs. The server has 160 PCIe lanes combined, which is only enough for the ten x16 links needed for directly attached GPUs (160 / 16 = 10). Note that you will want to reserve some of these lanes for NVMe drives when you get to this class of system.
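The lane arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming every GPU gets a dedicated x16 link (no PCIe switches) and that each NVMe drive takes an x4 link; `max_direct_gpus` is a hypothetical helper, not part of any AMD tool.

```python
GPU_LANES = 16   # one x16 link per directly attached GPU
NVME_LANES = 4   # assumption: each NVMe drive uses an x4 link


def max_direct_gpus(total_lanes, nvme_drives=0):
    """Number of x16 GPUs that fit after reserving x4 links for NVMe drives."""
    free_lanes = total_lanes - nvme_drives * NVME_LANES
    return max(free_lanes, 0) // GPU_LANES


print(max_direct_gpus(160))                 # 10
print(max_direct_gpus(160, nvme_drives=4))  # 9
```

Reserving even four x4 NVMe links already costs one GPU slot, which is the trade-off the post warns about.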

             

            Best Regards

              • Re: Remove 8 GPU limit on OpenCL driver
                jstefanop

                Looks like that mobo is dual socket. Were you running two instances of Ubuntu, one per CPU, with each instance controlling 8 GPUs? If that's the case, then that still does not get around the 8 GPU per-OS limit in the Catalyst drivers. If you had 16 running under a single instance, then that's a different story.

                  • Re: Remove 8 GPU limit on OpenCL driver
                    gstoner

                    You need to run the driver in headless mode; if you try to run X11/OpenGL, you will run into the 8 GPU limitation. Also, the S9150 is a passively cooled server GPU card that is optimized for headless operation.

                    It does not matter whether you have all 16 GPUs on a single CPU (you will need multilevel PLX switches) or 8 GPUs on each of two CPUs. On a 2P system the critical issue is that the system BIOS needs enough PCIe resources for the doorbell BAR, I/O BAR, MMIO BAR, and expansion ROM. One thing: beyond 16 GPUs you are heading into new territory for both AMD and NVIDIA. We are just getting out of the era of 8 GPUs in a single server.
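One concrete example of how PCIe resources run out is I/O port space. The sketch below assumes the standard x86 layout: a 64 KiB I/O port space, PCI-to-PCI bridge I/O windows allocated in 4 KiB units (the 280X bridge window in the dmesg excerpt further down, `0xb000-0xbfff`, is exactly one such unit), and the lowest 4 KiB left to legacy devices. Real firmware also spends windows on chipset devices and root ports, so treat the result as an upper bound, not a prediction for any particular board.

```python
IO_SPACE = 0x10000          # x86 I/O port space: ports 0x0000-0xFFFF (64 KiB)
BRIDGE_IO_GRANULE = 0x1000  # PCI-to-PCI bridge I/O windows come in 4 KiB units
LEGACY_RESERVED = 0x1000    # assumption: lowest 4 KiB kept for legacy devices


def max_io_windows(io_space=IO_SPACE, reserved=LEGACY_RESERVED,
                   granule=BRIDGE_IO_GRANULE):
    """Upper bound on bridges (one per GPU) that can each get an I/O window."""
    return (io_space - reserved) // granule


print(max_io_windows())  # 15
```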

                     

                    Also, cleaning this up will allow you to support peer-to-peer. One thing I will caution you about is not to go two deep on the number of PLX switches; you add 120 to 140 ns of latency per PCIe switch layer.
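To put the per-layer figure in perspective, here is a rough sketch that scales it linearly. The 120 to 140 ns range is taken from the post above; whether it is one-way or round-trip isn't stated, and the hop count assumes a tree in which two GPUs on different branches only meet at the top-level switch.

```python
PER_SWITCH_NS = (120, 140)  # latency added per PCIe switch layer, as quoted above


def added_latency_ns(switch_hops):
    """(low, high) extra latency in ns for a path crossing `switch_hops` switches."""
    low, high = PER_SWITCH_NS
    return switch_hops * low, switch_hops * high


def worst_case_p2p_hops(levels):
    """Switches crossed between two GPUs whose paths only meet at the top switch
    of a `levels`-deep tree: up through every level, then down all but the top."""
    return 2 * levels - 1


for levels in (1, 2):
    hops = worst_case_p2p_hops(levels)
    print(levels, hops, added_latency_ns(hops))
# 1 1 (120, 140)
# 2 3 (360, 420)
```

Going two deep triples the switch count on the worst peer-to-peer path, which is why the extra layer hurts more than the per-switch number suggests.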

                     

                    The numbers below are for a single GPU. What we find is that system vendors never optimized their system BIOS for more than 13 GPUs.

                     

                    Here are the typical AMD GPU PCIe BAR ranges:

                     

                     

                    11:00.0 Display controller: Advanced Micro Devices, Inc. Fiji (rev c1)
                            Subsystem: Advanced Micro Devices, Inc. Device 0b35
                            Flags: bus master, fast devsel, latency 0, IRQ 119
                            Memory at bf40000000 (64-bit, prefetchable)
                            Memory at bf50000000 (64-bit, prefetchable)
                            I/O ports at 3000
                            Memory at c7400000 (32-bit, non-prefetchable)
                            Expansion ROM at c7440000
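As a sketch of how to read such output, the following Python sorts each resource line into the categories discussed above and flags whether it sits above or below 4 GiB (32-bit BARs and the expansion ROM can only ever be placed below 4 GiB). It handles only the line shapes in this excerpt; real `lspci -v` output also carries `[size=...]` and other flags, which the regex simply ignores.

```python
import re

# Resource lines from the lspci -v excerpt above (the excerpt has no [size=...] fields).
LSPCI_SAMPLE = """\
        Memory at bf40000000 (64-bit, prefetchable)
        Memory at bf50000000 (64-bit, prefetchable)
        I/O ports at 3000
        Memory at c7400000 (32-bit, non-prefetchable)
        Expansion ROM at c7440000
"""

RESOURCE_RE = re.compile(
    r"^\s*(?P<kind>Memory|I/O ports|Expansion ROM) at (?P<addr>[0-9a-f]+)"
    r"(?: \((?P<width>32|64)-bit, (?P<pref>prefetchable|non-prefetchable)\))?",
    re.MULTILINE,
)


def classify(text):
    """Return (resource kind, address, placement) for each resource line."""
    rows = []
    for m in RESOURCE_RE.finditer(text):
        kind, addr = m["kind"], int(m["addr"], 16)
        if kind == "I/O ports":
            placement = "I/O port space"
        elif addr >= 1 << 32:
            placement = "above 4 GiB"
        else:
            placement = "below 4 GiB"
        if m["width"]:
            kind = f"{kind} {m['width']}-bit {m['pref']}"
        rows.append((kind, hex(addr), placement))
    return rows


for row in classify(LSPCI_SAMPLE):
    print(row)
```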

                     


                      • Re: Remove 8 GPU limit on OpenCL driver
                        jstefanop

                        How would you run the driver in headless mode? I think the issue we ran into is that aticonfig requires X11 to be running (we can get around this by setting frequencies etc. directly in the card BIOS instead of using aticonfig... we would just lose temperature monitoring), but if we can get up to 12 GPUs per system, that would be great.

                          • Re: Remove 8 GPU limit on OpenCL driver
                            gstoner

                            You need this document: AMD Catalyst™ Graphics Driver Installer Notes for Linux® Operating Systems, http://www2.ati.com/relnotes/amd-catalyst-graphics-driver-installer-notes-for-linux-operating-systems.pdf

                            Go to section 5.

                             

                             

                             

                            With graphics enabled, i.e. OpenGL or (on Windows) DirectX 10/11, you hit the 8 GPU limit.

                              • Re: Remove 8 GPU limit on OpenCL driver
                                jstefanop

                                Looks like that would only apply to specific server-class GPU device IDs (DIDs). The Catalyst drivers would probably not recognize consumer-grade GPUs (specifically the R9 280X) and run them in headless mode... unless there is a way to force the Catalyst drivers to recognize consumer DIDs and run them headless as well?

                                  • Re: Remove 8 GPU limit on OpenCL driver
                                    gstoner

                                    Direct GUI Installation via “NoAMDXorg” Parameter (AMD Supported Distros)

                                     

                                    In cases when your GPU does not appear in the AMD Server GPU or headless GPU list above but you wish to run your system for computational uses only, you can pass the following command-line parameter to the installer as explained below. Doing so tells the installer not to copy the AMD OpenGL™ libraries (the X server runs on non-AMD OpenGL™) and to use the system only for computational purposes.

                                     

                                    - Download the AMD Catalyst™ build to your system.
                                    - Unzip the AMD Catalyst™ build:
                                      $ sudo unzip *.zip
                                    - Make the binary executable:
                                      $ chmod +x amd-driver-installer-x86.x86_64.run
                                    - Launch the installer:
                                      $ sudo sh ./amd-driver-installer-x86.x86_64.run --NoAMDXorg
                                    - Follow the steps described in Section 2.2 (Automatic Driver Installation via GUI) of this document.

                                     

                                     

                                     

                                     

                                    Note: This command-line parameter can also be used on headed/generic AMD Radeon™ or AMD FirePro™ GPUs. In this case, your system (X server) will be running on the non-AMD OpenGL™ libraries.

                                      • Re: Remove 8 GPU limit on OpenCL driver
                                        jstefanop

                                        Hmm, missed that part... awesome, we'll give this a shot and see what happens. I wonder if we can still use an X11 GUI via Intel's iGPU if we set up the Catalyst drivers this way?

                                         

                                        Also, I'm assuming aticonfig won't even be installed with this setup?

                                          • Re: Remove 8 GPU limit on OpenCL driver
                                            gstoner

                                            Now on the system BIOS configuration: you need to turn on above-4GB address decoding in your system BIOS. Once you do this, if you want to use DirectGMA, you need to make sure you have a system BIOS that lets you control where the high MMIO region is placed, e.g. MMIOH Base = 256G or 512G and MMIO High Size = 128G or 256G. The BAR addresses need to be located between 32 bits and 40 bits (32-bit < BAR < 40-bit, i.e. above 4 GB and below 1 TB). Note that if you're using an i5, i7, or Xeon E3, you can only address 39 bits of virtual address space; you can turn on above-4GB decoding on these systems, but you have limited control over where the memory is placed. On Xeon E5 v3 systems the BIOS has more controls over where to place your BAR range. There is a second reason we need the BARs below the 40-bit VA range: it is so we can see our DMA engine on the GPU when using DirectGMA.
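The address arithmetic above can be sanity-checked as follows. A minimal sketch: `in_directgma_range` is a hypothetical helper that tests whether a whole BAR lies above 4 GiB (2^32) and below 1 TiB (2^40). The size used for the `bf40000000` BAR from the earlier lspci excerpt is an assumption (the next BAR starts 256 MiB later, so 256 MiB is the most it can be).

```python
GIB = 1 << 30
LOW, HIGH = 1 << 32, 1 << 40  # the 32-bit < BAR < 40-bit range quoted above


def in_directgma_range(base, size):
    """True if the whole BAR [base, base + size) lies between 4 GiB and 1 TiB."""
    return base >= LOW and base + size <= HIGH


# MMIOH base/size combinations from the post: the whole aperture stays below 2**40.
for base_g in (256, 512):
    for size_g in (128, 256):
        print(f"base {base_g}G + size {size_g}G:",
              in_directgma_range(base_g * GIB, size_g * GIB))

# 64-bit BAR from the lspci excerpt: 0xbf40000000 is 765 GiB; size assumed 256 MiB.
print(in_directgma_range(0xbf40000000, 256 << 20))  # True

# The 39-bit limit mentioned for i5/i7/Xeon E3 is 2**39 bytes = 512 GiB,
# so an MMIOH base of 512G would already sit at that boundary.
print((1 << 39) // GIB)  # 512
```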

                                              • Re: Remove 8 GPU limit on OpenCL driver
                                                gstoner

                                                We are also working on a purpose-built GPU compute platform for the headless server market, which we call the ROCm Platform (aka the Boltzmann Initiative). We have this running in-house on 16 GPUs today. We have a PCIe breakout box in-house to test 32 GPUs, but I ran into server system BIOS issues, as I mentioned above. For anything above 8-12 GPUs, you really need to work closely with a system integrator or the motherboard vendor to make sure you have a system BIOS that works.

                                                 

                                                ROCm is a highly optimized driver and runtime built to support HPC and ultrascale computing needs. ROCm needs a Haswell or newer CPU (Broadwell, Skylake; so Core i7, Xeon E3 v5, Xeon E5 v3, Xeon E7 v3; it has to be a CPU that supports PCIe Gen3 with platform atomics) and one of our Fiji-based GPU products: R9 Fury Nano, R9 Fury X, or R9 Fury. We are currently testing Ubuntu 14.04 and Fedora 23, since we need Linux kernel version 4.1 or newer. Once we hit version 1, we will move to other Linux distros.
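The kernel-version requirement is easy to check programmatically. A sketch, assuming the usual `major.minor.patch-suffix` release string that `uname -r` (and Python's `platform.release()`) report on Linux; the `(4, 1)` minimum comes from the post above, and the release strings in the examples are illustrative.

```python
import platform
import re

MIN_KERNEL = (4, 1)  # "Linux kernel version 4.1 or newer", per the post above


def kernel_version(release):
    """Parse (major, minor) from a release string such as '4.2.0-27-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_ok(release, minimum=MIN_KERNEL):
    """Tuple comparison, so 4.10 correctly counts as newer than 4.1."""
    return kernel_version(release) >= minimum


print(kernel_ok("3.13.0-79-generic"))  # False: Ubuntu 14.04's original kernel series
print(kernel_ok("4.2.0-27-generic"))   # True

if platform.system() == "Linux":  # other OSes report unrelated version numbers
    print(platform.release(), kernel_ok(platform.release()))
```

Comparing tuples rather than strings avoids the classic trap where `"4.10" < "4.9"` lexically.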

                                                 

                                                You can find more information at www.gpuopen.com and on GitHub at the following links:

                                                https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/dev

                                                https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/dev

                                                 

                                                ROCm currently supports the following capabilities. Before you ask: we are looking at running OpenCL on this stack as well.

                                            • Re: Remove 8 GPU limit on OpenCL driver
                                              jstefanop

                                              Unfortunately the NoAMDXorg option does not work: fglrx still takes over Xorg and writes its own .conf file, and it also breaks the non-AMD GPU. I'm pretty sure the option is still installing the OpenGL drivers, or it's simply not working (at least not with 15.10... maybe I'll try with 14 if I get around to it).
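One quick way to confirm what the installer did to X is to look at which drivers the Xorg config names. A minimal sketch, assuming the config lives at the standard `/etc/X11/xorg.conf` path; the sample section is made up for illustration, and the regex catches every `Driver` line, including any in `InputDevice` sections.

```python
import re
from pathlib import Path

XORG_CONF = Path("/etc/X11/xorg.conf")  # standard location; assumed, may differ
DRIVER_RE = re.compile(r'^\s*Driver\s+"([^"]+)"', re.MULTILINE | re.IGNORECASE)


def xorg_drivers(text):
    """Every driver named on a `Driver` line (Device and InputDevice sections alike)."""
    return DRIVER_RE.findall(text)


def fglrx_configured(text):
    return any(d.lower() == "fglrx" for d in xorg_drivers(text))


# Hypothetical section of the kind the installer might write.
SAMPLE = '''
Section "Device"
    Identifier  "Device0"
    Driver      "fglrx"
EndSection
'''
print(xorg_drivers(SAMPLE))      # ['fglrx']
print(fglrx_configured(SAMPLE))  # True

if XORG_CONF.exists():
    print(XORG_CONF, xorg_drivers(XORG_CONF.read_text()))
```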

                                               

                                              I did try producing my own packages with the -NoXServer option, and that DID work in terms of not breaking the current Intel GPU X server, but both the clinfo command and our OpenCL compute program crashed... so I'm assuming the OpenCL drivers were not properly installed when building the custom package.

                                               

                                              If you know how to install the headless drivers WITH the OpenCL drivers properly, please let me know.

                                                • Re: Remove 8 GPU limit on OpenCL driver
                                                  gstoner

                                                  Try running clinfo with sudo.

                                                   

                                                  Greg

                                                    • Re: Remove 8 GPU limit on OpenCL driver
                                                      jstefanop

                                                      Looks like it was an issue with 15.10... I did a fresh install of 14.04 Server and got OpenCL running headless with the core driver. The issue I am running into now is that 7 GPUs are listed fine by clinfo, but as soon as I add the 8th one, clinfo crashes. I'm pretty sure it's not a BIOS MMIO issue, since lspci returns all 8 GPUs correctly (I'm assuming that if the 8th GPU were outside the address space, it would not be listed by lspci?).
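On the lspci question: lspci lists every function it finds in PCI configuration space, even when the firmware could not assign its BARs (in `lspci -v` such BARs typically show up as `<unassigned>` or `<ignored>`), so a full device count does not by itself rule out an address-space problem. A sketch for counting AMD GPU functions in `lspci -nn` output, to compare against the number of devices clinfo reports; it assumes the `[class]` and `[vendor:device]` codes that `-nn` prints, with `1002` as AMD's PCI vendor ID. The Intel and audio lines in the sample are hypothetical.

```python
import re

# `lspci -nn` prints [class] and [vendor:device] codes; 1002 is AMD's PCI vendor ID.
# Classes 0300 (VGA) and 0380 (other display controller) cover the GPUs in this thread.
AMD_GPU_RE = re.compile(
    r"^(\S+) [^\n]*\[(?:0300|0380)\]: [^\n]*\[1002:[0-9a-f]{4}\]", re.MULTILINE
)


def amd_gpu_addresses(lspci_nn_text):
    """Bus addresses of AMD VGA/display functions in `lspci -nn` output."""
    return AMD_GPU_RE.findall(lspci_nn_text)


# The AMD VGA line is from this thread; the other two lines are hypothetical.
SAMPLE = (
    "00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:0402] (rev 06)\n"
    "03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] "
    "Amethyst XT [Radeon R9 M295X Mac Edition] [1002:6938] (rev f1)\n"
    "03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aad8]\n"
)
print(amd_gpu_addresses(SAMPLE))  # ['03:00.0']
```

The HDMI audio function that each AMD card also exposes is filtered out by its `[0403]` class, so the count reflects GPUs rather than PCI functions.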

                                                       

                                                      Either way, here is the kernel output of the error:

                                                       

                                                      [ 170.005741] <3>[fglrx:firegl_pplib_update_display_for_ocl] *ERROR* PPlib was - Pastebin.com

                                                        • Re: Remove 8 GPU limit on OpenCL driver
                                                          gstoner

                                                          If you're running with a head, there is an 8 GPU limit with OpenGL. The PCIe resource issue in the BIOS happens at around 12 GPUs on today's systems.

                                                           

                                                          One thing to try: turn on above-4GB addressing and see what happens.

                                                           

                                                          Greg

                                                            • Re: Remove 8 GPU limit on OpenCL driver
                                                              jstefanop

                                                              Yeah, the BIOS recognizes all 8 fine; it's most likely an issue with the driver/clinfo enumerating all 8 devices properly. It's running completely headless with zero X server. Could you pass that kernel output to the AMD engineers and see whether it's something on the driver or hardware end?
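If the BIOS did place every GPU but the kernel later could not, the Linux PCI core usually logs it. A sketch that scans `dmesg` text for `BAR n: no space for` and `BAR n: failed to assign` messages; the exact wording differs between kernel versions, so treat the pattern as an assumption, and the sample lines are illustrative, not taken from this system.

```python
import re

# Messages the Linux PCI core logs when it cannot place a BAR. The wording is an
# assumption: it has changed across kernel versions.
BAR_FAIL_RE = re.compile(
    r"pci (\S+): BAR (\d+): (no space for|failed to assign) (\[[^\]]*\])"
)


def bar_failures(dmesg_text):
    """Return (device, BAR index, reason, resource) for each failed assignment."""
    return [(dev, int(bar), reason, res)
            for dev, bar, reason, res in BAR_FAIL_RE.findall(dmesg_text)]


# Illustrative lines, not taken from the system in this thread.
SAMPLE = """\
[    0.412345] pci 0000:09:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.412350] pci 0000:09:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.412400] pci 0000:09:00.0: BAR 5: assigned [mem 0xf7e00000-0xf7e3ffff]
"""
for failure in bar_failures(SAMPLE):
    print(failure)
```

To run it against a live system, pass it the text of `dmesg`, for example `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; reading the kernel log may require root on newer distributions.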

                                                                • Re: Remove 8 GPU limit on OpenCL driver
                                                                  gstoner

                                                                  I need a little more info.

                                                                  What processor, what motherboard, which Catalyst driver version, and what GPU hardware are you using? I see you're using Ubuntu 14.04; it would help to know which version you're on (i.e., did you update or upgrade it?).

                                                                   

                                                                  Greg

                                                                    • Re: Remove 8 GPU limit on OpenCL driver
                                                                      jstefanop

                                                                      Ubuntu 14.04.4 Server, fresh install

                                                                      15.12 Catalyst fglrx-core (non-X support) version from Desktop

                                                                      [    3.770497] <6>[fglrx] module loaded - fglrx 15.30.3 [Dec 17 2015] with 7 minors

                                                                      03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Amethyst XT [Radeon R9 M295X Mac Edition] [1002:6938] (rev f1)

                                                                        Subsystem: XFX Pine Group Inc. Device [1682:9385]

                                                                        Kernel driver in use: fglrx_pci

                                                                       

                                                                      Intel Pentium G3258

                                                                      ASROCK H97 Anniversary  

                                                                      XFX R9 380x

                                                                        • Re: Remove 8 GPU limit on OpenCL driver
                                                                          gstoner

                                                                          What motherboard are you using, and are you using a breakout box?

                                                                           

                                                                          greg

                                                                            • Re: Remove 8 GPU limit on OpenCL driver
                                                                              jstefanop

                                                                              ASRock H97 Anniversary with 6 PCIe slots; we are using a 4-lane PLX PCIe bridge to connect all 8 GPUs.

                                                                              • Re: Remove 8 GPU limit on OpenCL driver
                                                                                jstefanop

                                                                                 So I finally got the system to run properly headless with no X server on Ubuntu 14.04, but have now run into another problem. The system with 280X GPUs runs fine with 8 GPUs, but we are currently using 380X GPUs, and none of them will POST as soon as the 7th GPU is connected (it works fine with up to 6 GPUs).

                                                                                 

                                                                                 This is odd, as the two cards are not that different from each other, and I don't see why the 380X would run out of address space sooner than the 280X (if that is what is going on).

                                                                                 

                                                                                The exact same behavior is happening across multiple motherboards/chipsets (Tested H81, H87, and Z97).

                                                                                 

                                                                                 Below are snapshots of the kernel's address assignments for the 280X vs. the 380X, and they look similar. So I'm really curious about the root cause of this issue and what differs in how the BIOS loads the two cards to cause this.

                                                                                 

                                                                                280x

                                                                                 [    0.210403] pci 0000:05:01.0: PCI bridge to [bus 06]
                                                                                 [    0.210405] pci 0000:05:01.0:   bridge window [io  0xb000-0xbfff]
                                                                                 [    0.210410] pci 0000:05:01.0:   bridge window [mem 0xf7b00000-0xf7bfffff]
                                                                                 [    0.210413] pci 0000:05:01.0:   bridge window [mem 0xc0000000-0xcfffffff 64bit pref]

                                                                                 

                                                                                380x

                                                                                 [    0.189688] pci 0000:02:03.0: PCI bridge to [bus 05]
                                                                                 [    0.189689] pci 0000:02:03.0:   bridge window [io  0x9000-0x9fff]
                                                                                 [    0.189693] pci 0000:02:03.0:   bridge window [mem 0xf7800000-0xf78fffff]
                                                                                 [    0.189696] pci 0000:02:03.0:   bridge window [mem 0x40000000-0x501fffff 64bit pref]
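The two bridge windows above differ slightly in size, which is easy to quantify. A sketch that parses the two dmesg lines: the 280X window is 256 MiB while the 380X window is 258 MiB, and both sit below 4 GiB despite being 64-bit prefetchable. If the firmware aligns each window to its largest BAR (256 MiB here), which the 380X window starting at `0x40000000` is consistent with, a 258 MiB window consumes a 512 MiB slot. That is one possible reason the 380X cards exhaust the space below 4 GiB sooner, but it is a hypothesis, not something this thread confirms.

```python
import re

MIB = 1 << 20

# The 64-bit prefetchable bridge windows from the dmesg excerpts above.
WINDOWS = {
    "280x": "[    0.210413] pci 0000:05:01.0:   bridge window "
            "[mem 0xc0000000-0xcfffffff 64bit pref]",
    "380x": "[    0.189696] pci 0000:02:03.0:   bridge window "
            "[mem 0x40000000-0x501fffff 64bit pref]",
}
WINDOW_RE = re.compile(r"bridge window \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def window_bounds(line):
    """(start, inclusive end) of a 'bridge window [mem ...]' dmesg line."""
    m = WINDOW_RE.search(line)
    if m is None:
        raise ValueError("no memory bridge window in line")
    return int(m.group(1), 16), int(m.group(2), 16)


def aligned_footprint(size, align):
    """Space a window uses if the next window must start on an `align` boundary."""
    return -(-size // align) * align  # round size up to a multiple of align


for card, line in WINDOWS.items():
    start, end = window_bounds(line)
    size = end - start + 1
    print(card, f"{size // MIB} MiB,", "below 4 GiB:", end < 1 << 32,
          f"footprint at 256 MiB alignment: {aligned_footprint(size, 256 * MIB) // MIB} MiB")
# 280x 256 MiB, below 4 GiB: True footprint at 256 MiB alignment: 256 MiB
# 380x 258 MiB, below 4 GiB: True footprint at 256 MiB alignment: 512 MiB
```

Both windows landing below 4 GiB also fits the earlier suggestion to enable above-4GB decoding, which would let the firmware move the 64-bit prefetchable windows out of the crowded low region.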

                                                                  • Re: Remove 8 GPU limit on OpenCL driver
                                                                    nou

                                                                    Check whether the file /etc/ati/amdpcbsdefault is installed. I got crashes with clinfo when it was not present. If it is not, extract it from a normally installed driver.

                                                    • Re: Remove 8 GPU limit on OpenCL driver
                                                      gstoner

                                                      If you want to build a system with up to 32 GPUs, I would only recommend the new ROCm stack, which we test on this class of system. The latest system we built with Catalyst and FirePro S9150 had 16 GPUs. Current BIOSes, even from major server vendors, cap out at 12 GPUs due to PCIe resources; we have to work closely with the motherboard vendor and system integrator to even enable 16 GPUs. Cirascale is who we worked with to enable 16 S9150 GPUs. Here is the topology for it. If you want to have a call about this, send me a private email and we can talk about your application.

                                                       

                                                       

                                                       

                                                       

                                                       

                                                      Here are the typical AMD GPU PCIe BAR ranges. Note that we need to make sure the system BIOS supports 32 cards; where BIOSes fail is the MMIO BAR and expansion ROM, where the system runs out of PCIe resources.

                                                       

                                                       

                                                      11:00.0 Display controller: Advanced Micro Devices, Inc. Fiji (rev c1)
                                                              Subsystem: Advanced Micro Devices, Inc. Device 0b35
                                                              Flags: bus master, fast devsel, latency 0, IRQ 119
                                                              Memory at bf40000000 (64-bit, prefetchable)
                                                              Memory at bf50000000 (64-bit, prefetchable)
                                                              I/O ports at 3000
                                                              Memory at c7400000 (32-bit, non-prefetchable)
                                                              Expansion ROM at c7440000

                                                       

                                                      Legend: