jstefanop
Adept I

Remove 8 GPU limit on OpenCL driver

Will AMD ever remove the 8 GPU limit imposed by their Catalyst Linux drivers? We want to build a compute platform and have the hardware to implement up to 32 GPUs per system, but AMD's drivers have a hard limit of 8, and anything more is not recognized. Our only option now is to go with Nvidia, since their drivers don't have such a limit. This is discouraging, as AMD's compute units are far better than Nvidia's.

25 Replies
boxerab
Challenger

What sort of motherboard would support more than 8 GPUs ?

0 Likes

Some server motherboards have more than 7 PCIe slots. This one from Supermicro has 11 PCIe slots: Supermicro | Products | SuperServers | 2U | 2048U-RTR4

0 Likes
gstoner
Staff

With the Catalyst 15.201 driver and a FirePro S9150 GPU running Ubuntu 14.04 Linux, I have a test system with up to 16 GPUs, built on an ASUS Z10PE-D16 WS workstation/server motherboard. This work was done in conjunction with Cirrascale. We recently started testing a Supermicro SYS-4028 with 16 ASICs; right now the issue is that the motherboard system BIOS does not support more than 13 GPUs. I am working with Supermicro to address this. We are also testing systems with the new ROCm driver (Boltzmann Initiative) supporting a larger number of ASICs with a One Stop Systems 3U PCIe breakout box, but we are running into motherboard system BIOS issues with the server we are working with, which we are addressing with the vendor.

The system you're showing uses Xeon E7 v3 class CPUs (Xeon E7-4870 2.40 GHz processors are about $3200 each) in a four-way configuration. I have not tested this system to see whether it has BIOS resources for more than 13 GPUs yet. The server has 160 PCIe lanes combined, which can only support 10 x16 links for directly attached GPUs. Note you will want to reserve some of these lanes for NVMe drive support when you get to this class of system.

Best Regards

0 Likes

Looks like that mobo is dual socket. Were you running two instances of Ubuntu, one on each CPU, with each instance controlling 8 GPUs? If that's the case, then that still does not solve the 8 GPU limit per OS instance in the Catalyst drivers. If you had 16 running under a single instance, then that's a different story.

0 Likes

You need to run the driver in headless mode; it's when you try to run X11/OpenGL that you run into the 8 GPU limitation. Also, the S9150 is a passive server GPU card that is optimized for headless operation.

It does not matter whether you have all the cards on a single CPU with 16 GPUs (you will need multilevel PLX switches) or two CPUs with 8 GPUs each. On a 2P system, the critical issue is that the system BIOS needs enough PCIe resources for the Doorbell BAR, IO BAR, MMIO BAR, and Expansion ROM. One thing: beyond 16 GPUs you are heading into a new world for both AMD and NVIDIA. We are just getting out of the era of 8 GPUs in a single server.

Also, cleaning this up will allow you to support peer-to-peer. One thing I will caution you on is not to go two deep on the number of PLX switches; you add 120 to 140 ns of latency per PCIe switch layer.
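As a rough sanity check, the cost of stacking switch layers can be sketched from the per-layer figure above (the midpoint value is an assumption, not a measurement):

```shell
# Estimate latency added by stacked PCIe (PLX) switch layers, using the
# 120-140 ns per-layer figure quoted above (midpoint assumed here).
layers=2            # the "two deep" topology warned against above
per_layer_ns=130    # assumed midpoint of 120-140 ns
echo "$((layers * per_layer_ns)) ns added per traversal"
```

Going two deep therefore adds on the order of a quarter microsecond each way, which is why flat topologies are preferred for peer-to-peer traffic.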

The numbers below are for a single GPU. What we find is that system vendors never optimized their system BIOS for more than 13 GPUs.

Here’s the typical AMD GPU PCIe BAR ranges:

11:00.0 Display controller: Advanced Micro Devices, Inc. Fiji (rev c1)

Subsystem: Advanced Micro Devices, Inc. Device 0b35

Flags: bus master, fast devsel, latency 0, IRQ 119

Memory at bf40000000 (64-bit, prefetchable)

Memory at bf50000000 (64-bit, prefetchable)

I/O ports at 3000

Memory at c7400000 (32-bit, non-prefetchable)

Expansion ROM at c7440000

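A listing like the one above comes from `sudo lspci -v -s <bus:dev.fn>` on a live system; the sketch below just filters the BAR-related lines, using the quoted output as canned input so it runs anywhere:

```shell
# Filter the BAR-related lines from lspci -v output. On a live system,
# pipe `sudo lspci -v -s 11:00.0` through grep instead of this sample.
grep -E "Memory at|I/O ports|Expansion ROM" <<'EOF'
11:00.0 Display controller: Advanced Micro Devices, Inc. Fiji (rev c1)
	Subsystem: Advanced Micro Devices, Inc. Device 0b35
	Flags: bus master, fast devsel, latency 0, IRQ 119
	Memory at bf40000000 (64-bit, prefetchable)
	Memory at bf50000000 (64-bit, prefetchable)
	I/O ports at 3000
	Memory at c7400000 (32-bit, non-prefetchable)
	Expansion ROM at c7440000
EOF
```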
0 Likes

How would you run the driver in headless mode? I think the issue we ran into is that aticonfig requires X11 to be running in order to work (we can get around this by setting frequencies etc. directly in the card BIOS instead of using aticonfig... we would just lose temperature monitoring), but if we can get up to 12 GPUs per system that would be great.

0 Likes

You need this document: AMD Catalyst™ Graphics Driver Installer Notes for Linux® Operating Systems http://www2.ati.com/relnotes/amd-catalyst-graphics-driver-installer-notes-for-linux-operating-systems.pdf

Go to section 5.

With graphics enabled, i.e. OpenGL or Windows DX10/11, you hit the 8 GPU limit.

0 Likes

Looks like that would only apply to specific server-class GPU DIDs. The Catalyst drivers would probably not recognize consumer-grade GPUs (specifically the R9 280X) and run them in headless mode... unless there is a way to force the Catalyst drivers to recognize consumer DIDs and run them headless as well?

0 Likes

Direct GUI Installation via "NoAMDXorg" Parameter (AMD Supported Distros)

In cases where your GPU does not appear in the AMD Server GPU or headless GPU list above but you wish to run your system for computational uses only, you can pass the following command line parameter to the installer as explained below. Doing so tells the installer not to copy the AMD OpenGL™ libraries (the X server runs on non-AMD OpenGL™) and to use the system only for computational purposes.

- Download the AMD Catalyst™ build to your system.

- Unzip the AMD Catalyst™ build:
  $ sudo unzip *.zip

- Make the binary executable:
  $ chmod +x amd-driver-installer-x86.x86_64.run

- Launch the installer:
  $ sudo sh ./amd-driver-installer-x86.x86_64.run --NoAMDXorg

- Follow the steps described in Section 2.2 (Automatic Driver Installation via GUI) in this document.

Note: This command line parameter can also be used on headed/generic AMD Radeon™ or AMD FirePro™ GPUs. In this case, your system (X server) will be running on the non-AMD OpenGL™ libraries.

0 Likes

Hmm, missed that part... awesome, we'll give this a shot and see what happens. I wonder if we can still use an X11 GUI via Intel's iGPU if we set up the Catalyst drivers this way?

Also, I'm assuming aticonfig won't even be installed with this setup?

0 Likes

Now, on the system BIOS configuration: you need to turn on above-4GB address decoding in your system BIOS. Once you do this, if you want to use DirectGMA, you need to make sure you have a system BIOS that allows you to control where you place MMIOH Base = 256G or 512G, and MMIO High Size = 128G or 256G. The BAR address needs to be located between 32-bit and 40-bit (32bit < BAR < 40bit). Note that if you're using an i5, i7, or Xeon E3 you can only address 39 bits of virtual address space; the issue is that you can turn on above-4GB addressing on these systems but have limited control over where the memory is placed. On Xeon E5 v3 systems the BIOS has more control over where to place your BAR range. There is a second reason we need this below the 40-bit VA range: so we can see our DMA engine on the GPU when using DirectGMA.
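The 32-bit-to-40-bit constraint above can be checked mechanically; the sketch below tests an example base address (taken from the S9150 lspci listing earlier in this thread) against that window:

```shell
# Check that a GPU BAR base sits in the required window:
# 2^32 <= BAR < 2^40 (i.e. 32bit < BAR < 40bit, as stated above).
bar=0xbf40000000   # example base from the lspci listing earlier; substitute your own
if [ "$((bar))" -ge "$((1 << 32))" ] && [ "$((bar))" -lt "$((1 << 40))" ]; then
  echo "BAR within the 32-40 bit window"
else
  echo "BAR outside the 32-40 bit window"
fi
```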

0 Likes

We are also working on a purpose-built solution for a GPU compute platform supporting the headless server market, which we call the ROCm Platform (aka Boltzmann Initiative). We have this running in-house on 16 GPUs today. We have a PCIe breakout box in-house to test 32 GPUs, but I ran into server system BIOS issues as I mentioned above. For anything above 8-12 GPUs you really need to work closely with a system integrator or the motherboard vendor to make sure you have a system BIOS that works.

ROCm is a highly optimized driver and runtime built to support HPC and ultrascale computing needs. ROCm needs a Haswell or newer CPU (Broadwell, Skylake; so Core i7, Xeon E3 v5, Xeon E5 v3, Xeon E7 v3; it has to be a CPU that supports PCIe Gen3 with platform atomics) and one of our Fiji-based GPU products: R9 Fury Nano, R9 Fury X, R9 Fury. We are currently testing on Ubuntu 14.04 and Fedora 23, since we need Linux kernel version 4.1 or newer. Once we hit version one we will move to other Linux distros.

You can find more information at www.gpuopen.com and on GitHub at the following links:

https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/dev

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/dev

ROCm currently supports the following capabilities. Before you ask: we are looking at running OpenCL on this stack as well.

0 Likes

This looks awesome, but it would need to support R9 300 series GPUs, since that is what we are currently using for our compute clusters.

0 Likes

Unfortunately the NoAMDXorg option does not work; fglrx still takes over Xorg and writes its own .conf file, and also breaks the non-AMD GPU. Pretty sure the option is still installing the OpenGL drivers, or it's simply not working (at least not with 15.10... maybe I'll try with 14.04 if I get around to it).

I did try producing my own packages with the -NoXServer option, and that DID work in terms of not breaking the current Intel GPU X server, but both the clinfo command and our OpenCL compute program crashed... so I'm assuming the OpenCL drivers were not properly installed by building the custom package.

If you know how to install the headless drivers WITH the OpenCL drivers properly, please let me know.

0 Likes

Try running clinfo as sudo.

Greg

0 Likes

Looks like it was an issue with 15.10... did a fresh install of 14.04 Server and got it running OpenCL headless with the core driver. The issue I am running into now is that 7 GPUs are listed fine by clinfo, but as soon as I add the 8th one clinfo crashes. Pretty sure it's not a BIOS MMIO issue, since lspci returns all 8 GPUs correctly (I'm assuming that if the 8th GPU were above the address space, then it would not be listed by lspci?).

Either way, here is the kernel output of the error:

[ 170.005741] <3>[fglrx:firegl_pplib_update_display_for_ocl] *ERROR* PPlib was - Pastebin.com

0 Likes

If you're running with a head, there is an 8 GPU limit with OpenGL. The PCIe resource issue in the BIOS happens around 12 GPUs on today's systems.

One thing to try: turn on above-4GB addressing and see what happens.

Greg

0 Likes

Yeah, the BIOS recognizes all 8 fine; it's most likely an issue with the driver/clinfo enumerating all 8 devices properly. It's running completely headless with zero X server. Could you pass that kernel output to the AMD engineers and see whether it's something on the driver or hardware end?

0 Likes

I need a little more info.

What processor are you using, what motherboard, which Catalyst driver version, and what GPU hardware? I see you're using Ubuntu 14.04; it would help to know which point release you're on (i.e., did you update or upgrade it?).

Greg

0 Likes

Ubuntu 14.04.4 Server, fresh install

15.12 Catalyst fglrx-core non-X support version from Desktop

[    3.770497] <6>[fglrx] module loaded - fglrx 15.30.3 [Dec 17 2015] with 7 minors

03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Amethyst XT [Radeon R9 M295X Mac Edition] [1002:6938] (rev f1)

  Subsystem: XFX Pine Group Inc. Device [1682:9385]

  Kernel driver in use: fglrx_pci

Intel Pentium G3258

ASRock H97 Anniversary

XFX R9 380x

0 Likes

What motherboard are you using, and are you using a breakout box?

greg

0 Likes

ASRock H97 Anniversary with 6 PCIe slots; we are using a PLX PCIe 4-lane bridge to connect all 8 GPUs.

0 Likes

So I finally got the system to run properly, headless with no X server, on Ubuntu 14.04, but have now run into another problem. The system with 280X GPUs runs fine with 8 GPUs, but we are currently using 380X GPUs, and the system will not POST as soon as the 7th GPU is connected (it works fine with up to 6 GPUs).

This is interesting, as the two cards are not that different from each other, and I don't see why the 380X's would be running out of address space sooner than the 280X's (if that is what is going on).

The exact same behavior happens across multiple motherboards/chipsets (tested H81, H87, and Z97).

Below are snapshots of the kernel addressing the 280X vs. the 380X, and they seem similar. So I'm really curious about the root cause of this issue and what is different between the BIOS loading on the two cards.

280x

[    0.210403] pci 0000:05:01.0: PCI bridge to [bus 06]

[    0.210405] pci 0000:05:01.0:   bridge window [io  0xb000-0xbfff]

[    0.210410] pci 0000:05:01.0:   bridge window [mem 0xf7b00000-0xf7bfffff]

[    0.210413] pci 0000:05:01.0:   bridge window [mem 0xc0000000-0xcfffffff 64bit pref]

380x

[    0.189688] pci 0000:02:03.0: PCI bridge to [bus 05]

[    0.189689] pci 0000:02:03.0:   bridge window [io  0x9000-0x9fff]

[    0.189693] pci 0000:02:03.0:   bridge window [mem 0xf7800000-0xf78fffff]

[    0.189696] pci 0000:02:03.0:   bridge window [mem 0x40000000-0x501fffff 64bit pref]
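One small difference is visible in the snippets: the sizes of the 64-bit prefetchable windows. A quick arithmetic sketch (addresses copied from the dmesg lines above):

```shell
# Size of each 64-bit prefetchable bridge window, in MiB:
# (end - start + 1) / 2^20, addresses taken from the dmesg snippets above.
size_mib() { echo "$((($2 - $1 + 1) / 1048576))"; }
echo "280x window: $(size_mib 0xc0000000 0xcfffffff) MiB"
echo "380x window: $(size_mib 0x40000000 0x501fffff) MiB"
```

So the 380x bridge requests a slightly larger prefetchable window (258 MiB vs 256 MiB); the difference is small, but larger per-card windows do exhaust the available address space with fewer cards.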

0 Likes

Check whether the file /etc/ati/amdpcbsdefault is installed. I got crashes with clinfo when it was not present. If it is missing, extract it from a normally installed driver.
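A minimal presence check for that file (path as given above; the messages are just illustrative):

```shell
# clinfo reportedly crashes when /etc/ati/amdpcbsdefault is missing;
# check for it before running.
f=/etc/ati/amdpcbsdefault
if [ -f "$f" ]; then
  echo "$f present"
else
  echo "$f missing: extract it from a normally installed driver"
fi
```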

0 Likes
gstoner
Staff

If you want to build a system with up to 32 GPUs, I will only recommend the new ROCm stack, which we test at this class of system. The largest system we built with Catalyst and the FirePro S9150 was 16 GPUs. Current BIOSes we see, even from major server vendors, cap out at 12 GPUs due to PCIe resources; we had to work closely with the motherboard vendor and system integrator to even enable 16 GPUs. Cirrascale is who we worked with to enable 16 S9150 GPUs. Here is the topology for it. If you want to have a call about it, send a private email and we can talk about your application.

Here’s the typical AMD GPU PCIe BAR ranges note we need to make sure the System BIOS has support for 32 card where they fail is MMIO BAR and Expansion ROM the system run out PCIe Resource

11:00.0 Display controller: Advanced Micro Devices, Inc. Fiji (rev c1)

Subsystem: Advanced Micro Devices, Inc. Device 0b35

Flags: bus master, fast devsel, latency 0, IRQ 119

Memory at bf40000000 (64-bit, prefetchable)

Memory at bf50000000 (64-bit, prefetchable)

I/O ports at 3000

Memory at c7400000 (32-bit, non-prefetchable)

Expansion ROM at c7440000

0 Likes