
Graphics Cards

ryzen1988
Adept II

Triple pro duo setup not booting

Hey guys, I am stuck with a strange problem that is probably not even the fault of the cards themselves, but I hope someone here has an idea how to troubleshoot next.

I wanted to have 3 Pro Duo (Polaris) cards in a server so it would have 6 physical GPUs.

All the cards work individually, they all work in pairs, and they work in all the slots.

But as soon as I install all three it won't boot.

The motherboard, an ASRock X399 Taichi with a 2950X, just keeps cycling boot codes in a continuous loop.

I already contacted their support but no response as of yet.

Can't get a screen or the BIOS, so it's stuck in the pre-boot check.

Power is not the issue, I swapped the PSU for a brand new 1300 W one.

Same problem.

If I install 2 Pro Duos and a WX 4100 it works, but if I add a third Pro Duo it goes haywire.

Hope someone has an idea what to try out next.

0 Likes
22 Replies
ryzen1988
Adept II

I have fixed the problem, just a quick recap for anyone running into the issue.

According to what I have read, it has to do with the PCIe address space, which by default was limited to 32 bits on this motherboard.

Each Pro Duo card has a PCIe switch plus the two GPUs themselves, so they consume quite some address space.

Also the board was crammed full of NVMe drives and other stuff, so the third card could not be addressed.

You have to tweak the following settings in the BIOS if you run into this:

Disable CSM

IOMMU on

Above 4G Decoding on <------ most important one to turn on.


After that it worked like a charm
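
For anyone who wants to see how much PCIe memory space their devices claim below and above the 4 GiB line, here is a minimal Python sketch. It assumes a Linux host and reads the per-device `resource` files under `/sys/bus/pci/devices`, where each line holds a start address, end address and flags in hex. The function names are made up for this example, and this is not how the BIOS itself does the allocation, just a way to look at the result once the machine boots.

```python
from pathlib import Path

FOUR_GIB = 1 << 32          # first address that needs more than 32 bits
IORESOURCE_MEM = 0x200      # Linux resource flag bit for memory-mapped regions


def parse_mem_regions(resource_text):
    """Return (start, end) pairs for the memory regions in one sysfs 'resource' file."""
    regions = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        if end == 0 or not flags & IORESOURCE_MEM:
            continue  # unused slot or an I/O-port region
        regions.append((start, end))
    return regions


def split_at_4g(regions):
    """Sum region sizes in bytes, split into below and above the 4 GiB line."""
    below = sum(end - start + 1 for start, end in regions if end < FOUR_GIB)
    above = sum(end - start + 1 for start, end in regions if end >= FOUR_GIB)
    return below, above


def scan_sysfs(root="/sys/bus/pci/devices"):
    """Print per-device totals; only meaningful on a Linux host."""
    for dev in sorted(Path(root).glob("*")):
        resource = dev / "resource"
        if not resource.is_file():
            continue
        below, above = split_at_4g(parse_mem_regions(resource.read_text()))
        print(f"{dev.name}: {below >> 20} MiB below 4G, {above >> 20} MiB above 4G")
```

Call `scan_sysfs()` on the machine to get one line per device. Bridges and switches also list the windows that cover the devices behind them, so adding up every device would double count; read the output per device instead.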

How cool is it to see the following line in the software:

running the test data set on up to 32 CPU cores and on up to 6 GPUs

0 Likes

I am surprised that there is even a switch. Should be enabled by default so that the full span of a 64-bit system can be properly leveraged.

0 Likes

When I was reading into the matter I thought it was strange as well, especially on a HEDT platform like X399.

Only downside I could find was less compatibility with old, old hardware.

So stupid choice on X399 to have it off by default on the Taichi.

Also have not heard anything from ASRock support, so with future troubleshooting it's finding out myself I guess.

At least the boards reflect it in price compared to some of the other brands.

I wanted to make a correction to this comment of mine.

ASRock's response took a couple of days, but they went so far as to test the mobo in their lab with 3 Tesla K80 dual-GPU cards and confirmed my fix.

So that's actually really cool of them, and my previous statement was too quick in judgement.

0 Likes

My RX 480 has 8GB of VRAM and I am liking 16GB VRAM cards more given my testing of recent 64-bit games.

0 Likes

To be honest, you will run into the limit of the GPU much earlier than the VRAM.

More than 8GB of VRAM is only really required in some heavy titles at 4K, and that is more than the RX 480 GPU can handle.

Besides, these pro cards are not really gaming compatible, and on top of that it's a dual GPU, which most modern games don't properly support any more.

Although I do believe they added a function that lets you switch between the normal and pro drivers, but I have not checked that out.

The VRAM is nice because it allows me to work on large video projects entirely in VRAM and not have them spill over to main system memory and jam up the whole system.

Also better OpenCL support.

0 Likes

I figure that 16GB cards will surface next year for 4K gaming. GDDR6 is less costly per GB

0 Likes

Yeah, probably so, but the GPU cores need to be a hell of a lot quicker than they are now.

Otherwise it's just useless storage.

0 Likes

GPU core speeds have not kept up with CPU speeds mostly due to the problems of designing large chunks of silicon.

My RX 480 has 36 of 40 CU working. The Vega 64 is a good base for chiplets.

nVidia is working on the idea of 256 CU using chiplets

0 Likes

Hmmm, I hope you're right, but I think GPUs are a lot harder to chop up into chiplets than CPUs.

Maybe tearing off the controllers and some cache to a separate die could be an option, but separating shader cores onto different chiplets would result in a CrossFire-like solution between dies that nobody would be happy with.

GPU die bandwidth is so much higher than a CPU's, and shader cores are much more intertwined in processing than CPU cores because of the inherent type of processing and sensitivity to latency.

It would only work if the OS sees the GPU chiplets as one coherent GPU, and I don't think interconnect technology is far enough along for this to be done without great inefficiency and loss of performance in areas like gaming.

In compute maybe.

(probably why there are lots of whispers that blue team's GPU is performing way below expected efficiency)

Would love to see an 8-core Zen CPU chiplet + IO die, Navi GPU, and an HBM stack as L4 cache on the same chip

0 Likes

Unlike Apple which has so many leaks on new products feeding the media moguls, AMD and nVidia are more speculative.

Some time ago a research paper was published, back when Pascal was still a new technology, and it focused on the problems of developing logic for supporting 256 CU. This is the basis for speculating that nVidia may be doing a chiplet design, and the same for AMD.

Zen 3 is close at hand for the Ryzen 4000 processors. AMD may also introduce new chipsets for motherboard vendors but the X570 I have now is doing its job.

AMD has some experience with chiplet designs so this will allow some incremental improvements.

0 Likes

Yes, I know exactly which white paper you mean, I have read it fully.

To be fair, I have also heard that Nvidia has built a custom silicon AI accelerator that uses this many-chiplet architecture.

That is also why I think this will not be a 'normal' GPU, but a custom AI accelerator aimed at datacenters and hyperscalers only.

I hope I am too pessimistic about the current technical feasibility of a functioning GPU chiplet architecture and that you are correct in your assumption.

But the years of SLI and CrossFire and even dual-GPU cards have made me a bit skeptical about the technical challenges to overcome in combining multiple GPUs or GPU parts into a whole.

Datacenter and scientific loads I believe could be done, but for the big gaming market I just don't see it being possible without having to deal with all the downsides of multi-GPU setups.

But of course the never-ending pace of innovation will get us there, it's just a matter of time.

And the chiplet design, especially in CPUs, is a great and awesome innovation from AMD.

Also, have you noticed that blue is always talking about their great interconnects, first EMIB and now Foveros, with their silicon die interconnects? (They're much less perfect than they tell you: because of the small silicon bridges, the dies on top often end up a bit uneven from the bump of the bridge, resulting in yield losses.)

I don't recall AMD bragging about the physical Zen interconnect, just about the Infinity Fabric, which is truly a next-level fabric, and I wish them all the luck in steaming into the datacenter to higher shares than in the Opteron days.

0 Likes

My last report on TSMC looked at yields

TSMC 5NM 50% YIELDS – HARDCORE GAMES™ 

nVidia has been working with Samsung; Apple, AMD, etc. are using TSMC
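
To put some rough numbers on why yield hits large dies harder than small ones, here is a small Python sketch using the simple Poisson yield model, Y = exp(-A * D0), where A is the die area and D0 is the defect density. The defect density and die areas below are hypothetical values picked for illustration, not foundry data, and real yield models are more involved.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies expected to have zero defects under the Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Hypothetical values for illustration only, not real foundry data.
DEFECT_DENSITY = 0.1      # defects per cm^2
BIG_DIE_AREA = 6.0        # cm^2, one large monolithic die
CHIPLET_AREA = 1.5        # cm^2, a quarter of the big die

big_yield = poisson_yield(BIG_DIE_AREA, DEFECT_DENSITY)
chiplet_yield = poisson_yield(CHIPLET_AREA, DEFECT_DENSITY)

print(f"monolithic die yield: {big_yield:.1%}")
print(f"single chiplet yield: {chiplet_yield:.1%}")
```

With these made-up inputs the big die comes out at about 55% good dies and the small one at about 86%, which is the basic argument for splitting a design into chiplets.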

0 Likes

Yes, I am aware of the foundry distribution.

Mobile will of course get first dibs on the new lithography and also takes the first big hits on yield.

It's a shame AMD is still under a very heavy contract with GloFo to buy a minimum number of wafers every year.

Maybe power and IO do not benefit as much from a smaller lithography as cores do, but the central IO die is massive in sheer surface area.

But unfortunately they have to buy those from GloFo, otherwise they have to pay a bucketload of money.

The Nvidia situation is a bit strange: they have already produced some low-end cards with Samsung before, and the Samsung foundries are right behind TSMC in implementing EUV.

But in the latest earnings call they kept awfully quiet about getting to the next node, or an architecture roadmap for that matter.

They very quietly refreshed the Tesla V100 as the V100S, but the V100 is in principle about 2 or 2.5 years old if I am not mistaken.

There should have been a follow-up by now, so I think something is not going well within Nvidia and its next-gen development.

Their attention and focus is so divided now across self-driving cars, AI, gaming and a whole bucketload of other stuff.

And one architecture for everything never works out.

It's crazy, because they are getting a little bit cocky, but gaming is still by far their biggest revenue source.

0 Likes

All I know is that silicon woes are as much of a problem as anything. The demand for better purity is insatiable.

My R5 2400G is a monolithic device but the 3000 series are chiplet. My GTX 1060 is 16nm and my RX 480 is 14nm. At present there are no high-end video cards made at 7nm above the Radeon VII.

Thermal management of chiplets may make it more difficult to ratchet up speeds much. CPU clock speeds have stalled as much as GPU clock speeds.

0 Likes

In principle it would be easier to make a GPU than a CPU on a new process: the die size of an average GPU might be bigger, but the transistor density of a CPU is much higher. A GPU often has a lot fewer transistors per mm2, and because of the large die the heat is also less dense.

This is the reason a GPU does fine with paste and gains little from liquid metal: it already has a large surface.

CPUs have a much denser construction and more hot spots, which is why solder is by far the best way to go with CPUs.

And if you can't go faster, go wider.

Also, the nodes from different foundries are not directly comparable; those sizes are just marketing terms.

To be fair, I had a Radeon VII and sold it within 6 months. Awesome compute card but very hot, loud and unstable for 24/7 usage.

I don't hold Raja in high regard; probably a nice guy, but I think he's not the guy you want in the leadership role.

I don't think team blue's GPU will be anything good.

With all the Xe GPU talk from him, I always think back to... Poor Volta.

Lisa Su made the right call investing the little money there was in CPU/Ryzen development, and she keeps everything in good order.

Now that the sky is the limit, investment and knowledge will return to GPUs as well.

With the upcoming exascale computers there is also massive funding for the Radeon Instinct software ecosystem, which will eventually trickle down to consumer stuff as well.

0 Likes

I considered a Radeon VII but they were in very short supply. So I bought a bunch of motherboards and hard disks etc.

I noticed Radeon VII also was a poor gaming card. That is the dagger in the card's heart for me.

I use MX-4 to repair thermal problems, so far it has been good at its job

Flaky drivers have made it impossible to use my RX 480

0 Likes
ryzen1988
Adept II

I started to use liquid metal as TIM in laptops; it is insanely efficient and runs 20 degrees cooler in an instant.

Nowadays I do everything with it: laptops, CPU, GPU. Just as easy as paste but always more efficient

pastedImage_1.jpg

0 Likes

liquid metal is set in stone, but for some purposes it does have its place

i repaired a gigabyte gtx 750 2gb gddr5 with mx-4 and it now no longer thermal throttles

so far none of the laptop machines I have overheat, then again I clean my machines frequently and I have lots of air purifiers too

0 Likes

It's not so much overheating that I'm experiencing.

For example, my Dell XPS 15 in max config does not overheat, but it does throttle quite a bit when it's placed under heavy load on CPU and GPU, like all fancy thin and light laptops with serious hardware specs.

With liquid metal and a little bit of undervolting it can sustain its max turbo freq all the time without throttling, while staying cooler than the standard config with less fan noise. This also saves a lot of battery power since the fans need to spin less often and slower.

On my Nintendo Switch it gave me around 20% longer playing time on battery because of the much more efficient cooling.

0 Likes

I have not seen any Dell XPS rigs as I have built my own box for decades

MX-4 also repaired an EVGA GTX 260 SC which was overheating. The card now works fine but it is obsolete.

0 Likes

Sorry for not stating it clearly: the Dell XPS is a laptop, that's what I meant with thin and light.

A PC is a different thing, those I build myself too; laptops I do upgrade.

0 Likes

I had to change the BIOS on my old Lenovo laptops to be able to replace the non-compliant Intel WiFi cards with low cost 802.11ac from Realtek

Only the X230 has the BIOS locked down with a certificate, so I was not able to remove the whitelist

Whitelists are illegal and violate law in many nations including the US and Canada

0 Likes