

Adept I

Threadripper 1950X Max number of GPUs

I purchased a Threadripper hoping to use its massive computing power to run Docker-driven AI applications, with each container having its own GPU.

My plan was to run 10 or more GPUs at x1 using PCIe splitters. Unfortunately, none of the X399 motherboards I tried runs more than 4 in a stable manner.

I tested with the following motherboards. All had the same issue: when a 5th GPU is added, they do not POST.




Just to clarify, all hardware (PSUs, GPUs, splitters, RAM) has been tested outside of this build, and I am able to use all 10 GPUs on another machine.

Is there a limit on the number of GPUs the X399 platform supports? Can it be reconfigured?

34 Replies
Big Boss

proper, please take a look at this post on the ASRock forum of a user trying to get 8 GPUs to work but only getting 7.  My conclusion is that a BIOS/UEFI update is required to allow a larger memory space.  Unfortunately including a link seems to significantly lengthen the time to get my response posted while it is "moderated".  Hope this helps.  ASRock X370 Pro - Can't POST with 8 GPUs - ASRock Forums - Page 1

Enjoy, John.  


So far I have found that there is an address space limit for PCIe management imposed by 32-bit addressing: it is limited to 4 GB, and exceeding that causes problems.

There should be a setting called Above 4G Decoding that lifts the 4 GB limit. I see this setting on other boards, but I am not sure the ASUS X399 has it. Once the system is done for the day I will check and post an update.

It would be great to get someone from AMD to clarify whether 10+ GPUs is possible with their architecture, so I am not beating my head against the wall.


proper,  I cannot answer your AMD question, but I strongly suspect that if you can get the needed UEFI update, the X399 will support many more than 10 GPUs.  The real advantage with X399 is the 64 PCIe lanes.  I suggest you spend some time looking at the forum named bitcoin, you may find a mining rig based on X399.  Also, please look at Newegg, there is a surprising number of MBs specifically for mining (mostly Intel, sadly).  I found one MB via Google (Intel, mining only, ASUS H370) that supports 20 GPUs.  Mining MB names tend to start with H.  I suggest you open a Support Ticket with AMD and all your MB vendors asking your questions.  Be sure to ask AMD what AGESA version (latest I have seen is AGESA!V9.ThreadRipperPI-SP3r2-) you need to support a >4 GB PCIe memory map.  Ask all the MB vendors if or when they will support a large PCIe memory map.  Please let us hear what you learn.  Thanks and enjoy, John.


This is most likely a BIOS issue, and hopefully an update can resolve it.

As I have mentioned many times before, I am not mining. My use case is completely different and has to do with AI inference with complex models.

Mining boards do not support 100 GB of RAM and 32-thread CPUs, and that's what I need. On paper, Threadripper on X399 is the perfect platform for what I want, but as you can see it does not deliver, not yet at least.

I have a ticket open with AMD, and someone from AMD reached out to me on Twitter and offered to help, something I really appreciate.

I will post an update once I have more info


Thanks, proper.  I knew you were not mining.  I just knew that miners were in need of many GPUs.  I also suspect that they do not need as much CPU power as you do, so maybe not much interest in TR.  The first link I posted was an X370 running 7 GPUs.  The user was limited by the UEFI (4 GB) and waiting for an update.  I would again suggest you ask your MB vendors about the UEFI update.  It would surprise me at this date if AMD did not support it, but the vendors must include the AMD update (AGESA).  It is also possible that AMD does not get involved and only the MB vendors need to release the update.  I would suspect your system should support at least 7 GPUs.  Please describe your system in more detail and maybe we can get 7 to work.  Be sure to tell me how your whole system is powered and the size of the PSs.  What POST code do you see when it fails?  Is 4 GPUs stable?  Thanks and enjoy, John.

EDIT: UEFI setting from the first link I posted:

Above 4GB.jpg

I do not have this setting in my TR UEFI and this user's (X370) does not seem to work, but we can at least see what it should look like.


I have a ticket open with Asus and it is slowly getting escalated.

To get to 7 GPUs I plug in three GTX 690s (dual-GPU cards) and one 980 Ti. There is roughly a 25% chance it does not boot, but it will generally boot with 7 GPUs. It also takes forever to POST.

Requests to one GPU will usually throw an error in the OS (Ubuntu Server 16.04): "PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=400b"
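To find out which device keeps raising that error, the message can be tallied per device. The following is a minimal sketch, assuming the `id` field is the 16-bit value the Linux AER driver prints (bus number in the high byte, device and function packed in the low byte). The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel AER lines shaped like the one quoted above.
AER_PATTERN = re.compile(
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), "
    r"type=(?P<type>[^,]+), id=(?P<id>[0-9a-fA-F]{4})"
)


def decode_requester_id(hex_id):
    """Split a 16-bit ID into (bus, device, function).

    Assumption: id = (bus << 8) | (device << 3) | function.
    """
    value = int(hex_id, 16)
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7


def summarize_aer(log_text):
    """Count AER messages per (bus:device.function, severity, type)."""
    counts = Counter()
    for match in AER_PATTERN.finditer(log_text):
        bus, dev, fn = decode_requester_id(match.group("id"))
        address = f"{bus:02x}:{dev:02x}.{fn}"
        counts[(address, match.group("severity"), match.group("type"))] += 1
    return counts
```

Feeding it saved `dmesg` output gives a count per address. Under the stated assumption, `id=400b` decodes to bus 0x40, device 1, function 3, which can then be compared against the addresses `lspci` lists.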

This runs off a 1600W Platinum-rated PSU. Idle power is 280W and the maximum I have seen is 450W. Again, we are talking about getting it to boot, not even running it. When I run 7 GPUs on other systems, power consumption is below 800W under load with my application.

RAM is 32GB (4x8GB, 2400MHz), new and tested in another system. All components (GPUs, PCIe risers, RAM) have been tested in the system that is currently running.


Thanks, proper.  I assume you are aware that TR is "officially supported" only under W10 versions higher than 1703.  Has AMD talked about using Linux?  Is there any way you can experiment under W10?  I assume you are using a 64 bit OS?  What POST code do you get when POST fails?  Hard to believe your power consumption is so low.  How are you measuring it?  Do any of your MBs have the "Above 4GB MMIO" settings?  Thanks and enjoy, John.


I am using 64-bit Ubuntu Server 16.04, updated to the latest packages. The main reason is that the stable CUDA drivers I need are not yet available for 17 or 18.

My main issue is getting this thing to POST with the GPUs. Once I get the GPUs to actually work on the board, I will look at performance and consider what options I have to get it to run better.

When this board fails it tends to cycle: it runs through the codes, reboots, and starts over again. Sometimes it does stop, and the codes I have seen are 27, 94, and 95.

I currently have the Asus X399 and it does not have the setting, but as you pointed out, people have posted that it does not help increase the number of GPUs a board can run. There are threads about similar issues on Ryzen chips.

I am not running continuous compute calculations. GPUs get the datasets they need to process via a specific model, and once they are done they wait for the next dataset. I have a system that distributes incoming requests across the entire array of available units, but individual GPUs are not sitting at 100% load. Power consumption is measured at the outlet where the PSU is connected.


Thanks, proper.  All your codes seem to be associated with the PCIe.  Here is a DL link to the Aptio 5.x Status Codes PDF:

I know little about Linux, but would really like to hear how you make out, so please update the thread.  Thanks and enjoy, John.


Thanks for looking into this with me, Misterj. I am sure I will make it out one way or another, I just hope the TR platform makes it out with me =)

I am still genuinely excited about this CPU and what it can do, and hope that both motherboard manufacturers and AMD realize features like this are important to consumers that buy their expensive chips and boards.

So far everyone is telling me to drop AMD and get an Intel workstation board with dual CPUs, which will do the job, but I am going to give it some time. I am waiting on responses from AMD and ASUS.

ASUS has been very poor at handling this. I exchanged around 10 emails and had 3 calls with them over the course of a week. They said nothing useful, repeated the same things, and asked me to fill out the same form twice. They promised escalation to the technical team multiple times and then sent more emails asking for the same information or suggesting I rotate the graphics cards. The support staff is in the Philippines and they seem to be doing everything they can to stop this issue from reaching people in the US who could give a solid response.


You are very welcome, proper.  I am very interested in this but can think of no excuse to make the investment in the HW to play.  I think it is good that you will follow this out with AMD.  Do you still have the other boards or just the ASUS?  Do you know how much MMIO memory is required by your video cards?  I will keep an eye out.  Thanks and enjoy, John.


I did find some interesting info about PCIe and how boards use it. It seems that the main issue with supporting a large number of GPUs is how motherboards assign addresses to devices. There is limited address space available, and most motherboards seem to let onboard devices, like the sound controller, network controller, and USB controller, among others, get their addresses assigned before PCIe devices get their turn. This means that PCIe devices get addresses in whatever space is left in the "buffer", and it is clearly not enough for more than 7 devices.

To test this, I disabled the network controller and sound controller and was able to POST with 9 GPUs. I think this confirms that it's an address allocation problem.

Quality boards manage address allocation for onboard devices differently to make more address space available for user-connected devices.


Thanks, proper.  Here is the screenshot from the 8 GPU thread I pointed to above.  W10 enumerates the exact addresses used so you can see all the junk given space.  Over half the space is not GPUs.  I suspect Linux has similar information.  The bottom line is all 4GB is used in the system with only 7 GPUs.  This is why the "Above 4GB" UEFI setting needs to exist, be enabled and WORK!  This, of course, requires a UEFI change.  Have you heard from AMD on the "Above 4GB" setting?  I could be wrong, but I suspect the MMIO area allocation is an OS function.  Perhaps you know a Linux programmer that can get rid of more of the junk for you.  Thanks for posting, please continue so I can keep abreast - love learning.  Enjoy, John.
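On Linux, the comparable view of the physical address map is `/proc/iomem` (real addresses are typically shown only to root). As a rough sketch, and not something anyone in this thread ran, the following Python reads text in that format and adds up how much of the top-level "PCI Bus" windows sits below and above the 4 GiB line. The function names are illustrative, and the two-spaces-per-nesting-level indentation is an assumption about the file layout.

```python
FOUR_GIB = 1 << 32


def parse_iomem(text):
    """Yield (depth, start, end, name) for each /proc/iomem-style line."""
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, name = line.split(" : ", 1)
        # Assumption: nesting is shown with two spaces per level.
        depth = (len(span) - len(span.lstrip(" "))) // 2
        start_hex, end_hex = span.strip().split("-")
        yield depth, int(start_hex, 16), int(end_hex, 16), name.strip()


def pci_window_bytes(text):
    """Return (bytes below 4 GiB, bytes at or above 4 GiB) of top-level PCI bus windows."""
    below = above = 0
    for depth, start, end, name in parse_iomem(text):
        if depth != 0 or not name.startswith("PCI Bus"):
            continue
        size = end - start + 1
        # Portion of this window that lies under the 4 GiB boundary.
        low_part = max(0, min(end + 1, FOUR_GIB) - start)
        below += low_part
        above += size - low_part
    return below, above
```

Calling `pci_window_bytes(open("/proc/iomem").read())` as root and seeing zero bytes above 4 GiB would be consistent with the firmware placing every device below the 32-bit boundary.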



Keep in mind that this error happens before the BIOS even gets to preparing to load the OS. You may as well unplug the hard drive (with the OS); it won't make a difference, because the system won't POST. It is not functional.

I do not believe this is related to "above 4G decoding".

Above 4G decoding, the way I understand it, concerns how VGA devices are mapped in RAM.

The OS needs to allocate memory for where graphics card data will be stored, and it allocates a maximum of 4GB under the 32-bit limit.

Enabling above 4G decoding allows the OS to use a 64-bit address map to allocate more RAM for PCIe devices.

Problems caused by a 32-bit address map would present themselves as PCIe devices not being detected in the OS or disappearing during use, because the system would run out of the 4GB of RAM that was allocated for VGA. This means the system would actually work.

However, the issue I have seems not to be related to RAM allocation limits, and it presents itself before the BIOS finishes loading.

When the motherboard starts, it begins creating a map of the devices it has onboard so it can communicate with them. These addresses are stored in a special address "buffer" which is very small and is designed to hold only device address data. Many onboard devices get addresses allocated in the "buffer", with the remaining space left for peripherals, i.e. PCIe devices. That space is limited, and in the default configuration there is enough left for only 7 addressable devices. It really does not matter what those devices are: they could be M.2 drives, network cards, or GPUs. The BIOS needs to put each device's address in the "buffer" so it can communicate with that device, and it runs out of space.

When I disable the network adapter and sound card on the motherboard, they are no longer initialized, the address space they used to occupy becomes available, and I can plug in two more GPUs.

This is my layman's explanation of what I believe the issue is. Considering the results of my tests, I think I am close to the truth. I must say a big thanks to the kind people at amfeltec for shining some light on this.

The main problem is still that Threadripper can support a maximum of 7 PCIe devices (although there are issues at 7). It has nothing to do with GPUs: you can connect 7 network adapters and the 8th won't work.

Motherboards that support more PCIe devices get around this issue by using a larger address space or by moving onboard devices to a different address space, so peripherals have more room.

This can only be solved by a BIOS update that adds that capability. I hope motherboard manufacturers will solve this.

ASUS support has been complete garbage. I have spoken with a lot of people working on my case; they keep telling me the same thing and have not escalated this to the department that can actually respond. I am going to have to go after their marketing people on social media and send a letter to the corporate office to get this moving.


Thanks, proper.  You are absolutely correct about not posting versus not seeing the GPUs.  I forgot that aspect.  Now I suspect two different things are going on.  One concerning only the UEFI and one the OS and UEFI.  Perhaps the bitcoin forum can help.  I have a sign-on and will post a question there and see if we can get some answers.  Perhaps this explains why the "Above 4GB" does not appear to work.  It would allow more MMIO memory but not permit the UEFI to process all the cards, so the OS does not get loaded.  I'll let you hear.  Enjoy, John.

Big Boss

proper, I stumbled on to this:


It is a utility from National Instruments probably used to see if a customer's system will support their MXI-Express product.  This is what my systems reports.  I did not see a Linux version, but you may want to take a look anyway.  If you Google MXIeBusDetect you can find a page with an explanation and DL.  I suspect these two areas are numa node 0 and 1, one for each processor chip.  Enjoy, John.


I have made some progress with this and got 12 GPUs working. The system also stopped crashing: if more are added, it simply does not detect them at all, which probably means the VGA buffer is out of space and I need "above 4G decoding" to keep going.

I got there by disabling everything I could, including the USB controller. I also found that the last PCIe slot at the bottom of the board could not be used and always results in errors. I think the board has issues creating a group for devices when that slot is in use.

There was also another setting on the board that I found clever, called "Enumerate all IOMMU in IVRS". My understanding is that this setting allows both chips on the CPU to participate in PCI address allocation; without it, only one of the two manages PCIe.

This may be the cause of the issue, but I am too tired to do another reset and test that setting alone. I will run this test later.

In conclusion, I think that if "above 4G decoding" becomes available I should be able to map more GPUs and get to 16 or more, but 12 is already a good number, so I will keep the system.

Support from both ASUS and AMD has not been very supportive, which is a bit disappointing. Two weeks went by and they could not suggest anything but buying a mining board.

Thanks, Proper.  The only useful response on Bitcoin:

ASUS B250 Mining Expert m/b will do 13 GPU's and another 6 P series cards for a total of 19.  They can be had for around $100 new and $50-$70 used on Ebay.

I do not know what P series cards are.  I will ask some question of the responder.

You are making good progress.  Enjoy, John.

EDIT: Here is the BIOS update video for the responder's Mining board - Interesting.


For some reason, I was not able to post in this thread for a few days. It said the thread was blocked, but the moderators said they never blocked it. Bugs, I guess.

Nvidia has the Quadro lineup for workstations that mostly do CAD and need specialized GPUs. Those are called Quadro P; you can look up the Quadro P4000 on Amazon.

At this point, I am convinced that motherboard address allocation is not an issue. What happened was that all USB ports get an address in the buffer, and if you count the expansion options for USB headers, there are over 20 addresses. Once those are loaded there is not much address space left. So you disable the USB controller and gain over 20 addresses, but then you can't connect a keyboard - bummer. So you need to use a USB expansion card, or I will try to get it to work with just the USB 3.1 controller enabled, which is separate and probably has 6 ports in total.

The bottom line is that I can get it to POST and boot with 16 GPUs, but the OS will not see past 12, and I hope that is a 4G decoding issue. Asus has not implemented 'above 4G decoding' on this board yet, but Gigabyte and other manufacturers have. I opened another ticket with Asus to get a status on that, but their support is complete garbage. When I speak with the reps they employ in the Philippines, it's clear they have layman-level experience with an actual motherboard and simply rely on guides written for them to go through the steps. I have had a ticket open with them for 3 weeks and nothing was done. I requested a call from a manager 3 times and never heard from them. For the last week they have just been ignoring my requests. To sum up, their support is useless, and at this point I strongly suggest you stay away from Asus if your system is in any way out of the norm; they just won't help you.


Thanks, Proper.  You are making great progress!  I saw your post on the ASRock forum.  Sorry about your not being able to post - wonder if others are having the same problem.  Does your board have a PS-2 KB/Mouse port - mine does?  You could also consider an RDC (Remote Desktop Connection).  I have seen several complaining about ASUS support.  I had some support trouble with one company and wrote the CEO.  It was fixed really quickly!  The post on ASRock's forum about AGESAs above supporting Above 4GB has no meaning because the various groups of processors have different number schemes and, of course, just having a supporting AGESA means nothing unless the MB vendor supports it.  I appreciate your keeping us up to date and I will check out the P series.  Thanks and enjoy, John.

EDIT:  Please take a look at the Bitcoin thread I pointed to above.  It has a couple more entries.


Proper, please take a look at my Bitcoin post.  There are some new posts.  Enjoy, John.


Which post?


Sorry, here.  Enjoy, John.


proper, I just tried BIOS/UEFI 3.10 (AGESA for my ASRock Fatal1ty X399 Pro Gaming with Threadripper 1950X.  I had to remove it when I had boot difficulties - ASRock had already removed it from the DL site for an undisclosed reason.  On the UEFI Boot page is the option "4G Decoding" which defaults to Disabled.  Enjoy, John.


I think 12-13 GPUs is a Windows limit. Which ASUS board are you using?


manylines, I think that limit is caused by the memory allocated (32 bit, I think) to MMIO (Memory Mapped IO).  Newer boards now have a UEFI setting (something like 'Above 4GB') allowing MMIO to go as far as 64 bits.  I think some block chain miners have rigs with more than 20 GPUs.  The Bitcoin forum discusses these machines.  Enjoy, John.


As I posted, I am using Ubuntu, not Windows.

I was able to get 14 and even 15 GPUs working on the Threadripper platform, but it is very unstable. Rebooting the machine can cause problems, and you end up spending an hour playing with it to get it to work again, which is not acceptable. The best results I had were with 12 GPUs, and even then it was a bit shaky: sometimes it would reboot multiple times before it POSTed and booted up. The issues come from the BIOS, and none of the motherboard manufacturers cared about my issues when I reached out to find solutions.

Threadripper is a workstation CPU, and there are no workstation-grade motherboards for this platform. All we have are gaming boards with a bunch of lights and a lack of any support. As enthusiastic as I was, I decided to skip the Threadripper platform, and the current CPU and board will be used for hosting some VMs. I will post an update on how that works out.

I went with quality Intel workstation boards and comparable 12-core, 24-thread CPUs. They were more expensive and handle the same 12 GPUs each, but they are just rock solid: you turn them on and they just work, every time. No fiddling with settings, no disabling USB controllers, no rolling the dice.

Journeyman III

Hello, I have the same problem. I have an Asrock Taichi x399 motherboard and I am trying to run it with six GPUs, but the system does not start and displays error 39. Technical support cannot help with anything. I would like to ask which ASUS motherboard you have; maybe I can at least run 6 GPUs on it.



Greetings from England!

I'm just about to go down the same rabbit hole as you have with the Taichi so would really really really appreciate it if you could keep me/us updated with anything you discover, learn, conclude, etc.

I don't have the luxury of being able to get anything wrong with no 2nd chances so can not thank you enough in advance if you can help.

Warmest regards.


Hey bucher and mawvius, I did post the solution above, but I will go over it again. I hope this helps you both.

The reason Threadripper has problems running a large number of GPUs has to do with the motherboard's device address allocation space.

Basically, the BIOS detects all addressable devices on the board (NIC controller, sound controller, USB controller and ports, etc.) and stores their addresses in a special memory space, which as it turns out is very small. Any user-installed devices are always at the end of the list, so when that memory space runs out they either "fall off" and are not visible, or they do not "fit" and the system runs into an issue, producing a variety of errors and halting the boot sequence.

I discovered this after many hours of testing, and neither AMD nor any board manufacturer was of any help.

To get around the issue you have to disable onboard devices so they do not take up address space, making it available for PCIe peripherals.

Most important - disable all USB ports, as each one eats space. You can leave 1 or 2 on to control the machine.

Disable the sound card.

Disable USB-C controller and anything else you can.

During my testing I tried a variety of boards but got the best stability with the "ASUS Prime x399-A".

I suggest avoiding MSI, as I could not get more than 3 GPUs detected no matter what I did, and the board died after a few hours of tests. I have no idea why, but I never tried MSI again.

The maximum number of GPUs I was able to get detected in Linux was 15. I had to disable all devices and USB ports and use a PCIe network card instead of the onboard one. However, there were stability issues: sometimes GPUs would "fall off" after a reboot, and there were issues with addressing them as well, so sometimes they would become unresponsive.

12 GPUs had much better stability but would still sometimes begin to throw errors under load. You would see address-related errors right in the CLI. It was hit or miss. Again, you have to disable everything.

10 GPUs was stable, but with all USB ports disabled and only Ethernet enabled.

I settled on 8 GPUs per system because it was incredibly stable, and there were 4U rack cases and power solutions that could accommodate my workloads and wrap the system into a neat, compact package. I was able to use the onboard NIC and one USB port.

As a side note, I have since transitioned to an AMD EPYC CPU (the server version of Threadripper) running on an Asrock motherboard.

It is a lot more expensive, but being a server platform with remote access capability it is a lot more stable and convenient.

I will add that Asrock support has been very helpful and even sent me a BIOS chip for free, no questions asked, when the system froze during a BIOS flash. A small gesture, but much appreciated.

I hope you guys get this working. I will check back in a few days.

Journeyman III

Hello everyone, thanks for the answers. The problem with the Asrock Taichi x399 has not been resolved. (Asrock support is very strange: they said that the PSU does not provide enough power, although the power supply is rated at 2000W.) Disabling the devices did not help either, so the motherboard has been shelved until better times. In its place I bought an ASUS STRIX X399-E GAMING and was able to run 8 GPUs on it. The problem is that, unlike the Asrock Taichi x399, ASUS motherboards do not offer x8x8 bifurcation, only x4x4x4x4, which does not suit me because the PCI-E speed is too low. So now I'm considering buying a ROMED8-2T with an AMD EPYC CPU, although it is very hard to find.

Journeyman III

Hey @proper - I know you've pretty much resolved your issue now, but just wanted to try to clear up something about '4g decoding' (some of the images misterj posted above are actually ones i posted on the asrock thread he mentioned.)  It sounds to me like you are actually describing the exact problem that 'above 4g decoding' aka 'above 4g addressing/mmio' is solving.  The EFI/BIOS is mapping the registers and maybe some portion of onboard ram from each device to system memory, to allow communication between them.  As you can see in my Windows device manager shot above, that resulted in over 256MB per GPU in my case, which w/ 8 GPUs is already 2GB.  As a normal system could easily map 3+GB with all the onboard devices, you can see how one w/ multiple GPUs can definitely go over 4GB.  The problem is that a normal mapping table is only 32 bits, which means it's limited to addressing 4gb.  So to address higher memory locations, the table needs to be extended to somewhere between 32 and 64 bits (the limit on modern consumer hardware/OS').  Again, in my UEFI screenshot, you can see that the setting is 40bit, allowing for addressing up to 1TB.
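The arithmetic in the post above can be written out as a back-of-the-envelope check. This is only a sketch: it uses the figures quoted in the post (about 256 MB mapped per GPU, roughly 3 GB for everything else) and optimistically treats the whole 32-bit range as available for mapping, which real firmware does not do. The function names are invented for the example.

```python
MIB = 1 << 20
GIB = 1 << 30


def addressable_bytes(address_bits):
    """Size of the address range reachable with the given number of bits."""
    return 1 << address_bits


def mmio_fits(per_gpu_mib, gpu_count, other_devices_mib, address_bits=32):
    """Return (fits, total_bytes, limit_bytes) for a simple mapping budget.

    Simplification: assumes the entire address range is free for device
    mappings, so the real limit is tighter than this.
    """
    total = (per_gpu_mib * gpu_count + other_devices_mib) * MIB
    limit = addressable_bytes(address_bits)
    return total <= limit, total, limit


# Figures from the post: 8 GPUs at ~256 MB each, ~3 GB for onboard devices.
fits_32, total, limit_32 = mmio_fits(256, 8, 3 * 1024)
fits_40, _, limit_40 = mmio_fits(256, 8, 3 * 1024, address_bits=40)
print(f"needed: {total / GIB:.1f} GiB")
print(f"32-bit limit: {limit_32 / GIB:.0f} GiB, fits: {fits_32}")
print(f"40-bit limit: {limit_40 / GIB:.0f} GiB, fits: {fits_40}")
```

With those inputs the total comes to 5 GiB, which overflows a 4 GiB (32-bit) table but is a tiny fraction of the 1 TiB a 40-bit table can address.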

Your fix (and mine in the asrock thread) was basically to free up some of the standard 32bit mapping space by disabling other devices (network, usb, audio) which would be using it otherwise.  But enabling this 'above 4g' option where available to increase the mapping space / addressable mem is definitely an (arguably better) alternative.  On my Asrock board, even though they had exposed the option, it wasn't working, or needed a complementary setting which wasn't made available.  In the end, it wasn't necessary in my case, as I only had 8 GPUs, and was able to disable enough unnecessary onboard devices to free up the required mapping space (like you).

Unfortunately, I'm running into this problem again on a new ASUS x570 board, where the option is not available, and sadly, even w/ everything disabled, I still can't boot properly w/ more than 6 GPUs.


You have to remember that 4G decoding was created to solve a specific legacy compatibility problem, present only on computers using 32-bit operating systems that also have a GPU with more than 4 gigabytes of memory.

One of the big relevant limitations of 32-bit architecture is the limit on memory address space. Because of this limitation, a 32-bit OS can only address 4 GB of memory space.

Most know this means that you cannot have more than 4 GB of RAM, but the limitation extends to working with memory address spaces in general.

If you had a 32-bit OS and installed a GPU with 6 GB of RAM, or installed multiple GPUs whose combined memory space was larger than 4 GB, the OS would not be able to map that memory space. Depending on the implementation, the GPU may not be detected at all, or not all GPU memory will be detected.

To solve this issue, 4G Decoding was implemented to allow 32 bit architecture to address memory spaces in GPUs that had more than 4gb.

The only time you should use 4G Decoding is if you run 32bit OS and have GPU with more than 4GB of RAM.

It is very uncommon to see 32bit OS in 2020, this is "legacy feature".

The issue with detecting a larger number of GPUs is not at the OS level; it is a low-level issue.

When the motherboard starts, it detects all connected devices and assigns each an address. The list of these addresses is stored in a small chunk of allocated memory. The addresses are very small, but the space holding them is also very small, and it is possible to use it up completely. Because GPUs are not single devices, and have large numbers of subcomponents that may each get a system address, it is possible to saturate this address space. When that happens the system does not POST and reports "E3" or another error that indicates an address allocation issue.

I have seen companies releasing GPUs without any ports at all, I assume they do this at least in part as they strip down components to reduce power usage and address space usage.

That is why disabling devices works.

Threadripper also has an architectural limitation: only one of the chiplets handles PCIe lanes directly.

The server-grade alternative, "AMD EPYC", does not suffer from this problem, and there are settings to specifically establish sharing of PCIe lanes between both chiplets.

Some things you can try with the X570:

Not all slots on the motherboard are wired the same. You have to find those that have bandwidth priority and do not share data links with neighboring slots. (Sharing is common on consumer motherboards.)

You can use splitters and connect only to those 2-3 slots that have a direct link.

Make sure the link configuration for those slots is set to x8x8 for 2 GPUs or x4x4x4x4 if you use 3 or more. (This is critical.)

Disable all USB peripherals, and possibly the network adapter as well, and use a PCIe one.

Chips that manage USB use up a massive number of addresses even when ports are disabled.


Thanks for the suggestions.  Unfortunately, I specifically chose a board which by spec allows all interfaces to operate in parallel, and already run all devices in x1 mode, so the number of PCIE lanes is not my issue.  And I've already completely disabled USB, sound, serial, etc (even tried enabling one USB for OS, and disabling all SATA.)  It's simply a memory problem (not physical - I have 32GB installed), which is actually confirmed by the fact that with enough devices disabled, I can actually boot w/ 7 GPUs, but then grub gives an 'out of memory' error when attempting to load the linux kernel.  Only thing i've never disabled is the onboard NIC, as i don't have another splitter, or pci NIC.

To solve this issue, 4G Decoding was implemented to allow 32 bit architecture to address memory spaces in GPUs that had more than 4gb.


The only time you should use 4G Decoding is if you run 32bit OS and have GPU with more than 4GB of RAM.

It is very uncommon to see 32bit OS in 2020, this is "legacy feature".

These are common misconceptions, which is what my previous point was trying to dispel (w/ the assistance of my screenshots posted above.)  Many people seem to think that if they have 8+GB GPUs (with or without a 32bit OS), then they need 'above 4g addressing (decoding)'.  But it's uncommon for a modern GPU's full onboard memory to be mapped - you can see in my screenshot that it's more on the order of 256MB for my 8GB RX580s (a little more since some additional hw on the card was mapped.)  Some GPUs like the P100 w/ huge amounts of onboard RAM likely have a larger amount mapped, which is why it was recommended/required to use the 'above 4g' feature for that card.  Again, w/ all the existing onboard devices enabled, it could be that < 1GB of mapped add-on device memory will overflow the 32bit/4GB limit.

Beyond that, we are generally saying the same thing, but in different words.  I wasn't suggesting that the difficulty with multiple GPUs was an OS problem - quite the opposite in fact.  Above 4G addressing ('decoding') does not solve an OS problem per se (that is tangential.)  The feature extends the number of address bits from the usual 32 to something greater, which allows an extended address table, and therefore enables using high memory > 4GB for mmio.  This is completely independent of the OS, and you should be able to witness the problem just by launching a UEFI shell to inspect the mapping before ever loading an OS.  Btw, a 64bit platform (or OS) does not automatically solve this - my setups are all 64bit (platform + OS), yet nothing is mapped above 32bits on these systems by default - that is the true legacy 'feature' (annoyance).
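Once an OS does boot, the same question (is anything mapped above 32 bits?) can be checked from Linux by looking at the "Memory at ..." lines that `lspci -v` prints for each device. A small sketch follows, assuming the usual `Memory at <hex> (32-bit|64-bit, [non-]prefetchable) [size=N]` line shape. The helper names are made up for this example.

```python
import re

# Matches BAR lines as printed by `lspci -v`, e.g.
#   Memory at e0000000 (64-bit, prefetchable) [size=256M]
BAR_PATTERN = re.compile(
    r"Memory at (?P<addr>[0-9a-f]+) "
    r"\((?P<width>32|64)-bit, (?:non-)?prefetchable\)"
    r"(?: \[[a-z]+\])* \[size=(?P<num>\d+)(?P<unit>[KMGT]?)\]"
)
UNIT = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
FOUR_GIB = 1 << 32


def parse_bars(lspci_text):
    """Return a list of dicts describing each memory BAR found in the text."""
    bars = []
    for m in BAR_PATTERN.finditer(lspci_text):
        addr = int(m.group("addr"), 16)
        bars.append({
            "addr": addr,
            "size": int(m.group("num")) * UNIT[m.group("unit")],
            "width": int(m.group("width")),
            "above_4g": addr >= FOUR_GIB,
        })
    return bars


def mapped_totals(bars):
    """Return (bytes mapped below 4 GiB, bytes mapped at or above 4 GiB)."""
    below = sum(b["size"] for b in bars if not b["above_4g"])
    above = sum(b["size"] for b in bars if b["above_4g"])
    return below, above
```

If every BAR reports a 64-bit width yet `mapped_totals` shows nothing above 4 GiB, that matches the behaviour described above: the hardware could be placed higher, but the firmware kept everything inside the 32-bit range.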