0 Replies Latest reply on Mar 22, 2018 4:58 PM by crossfire64

    Vega 64 mGPU (heat) Crash issues

    crossfire64

      Hello AMD Community!

       

      I recently got myself a second Vega 64 for some Crossfire / mGPU action. But as soon as I installed the second card on my mainboard, I ran into more and more problems. First things first: guys, if you want to build a Crossfire system, read my story. I think I made every mistake you can make. If you just want to help me with my issue, please scroll down to my summary.

       

      Initial Hardware Build:

      Ryzen 1700x

      B350 Prime Plus Mainboard

      32 GB RAM

      650w PSU

      1x Vega 64

       

      1) I got a chance to get a second, used Vega 64 for cheap, so I started researching whether it was worth the money and whether my computer could run it. It turns out that the B350 does support Crossfire / does not support Crossfire, depending on the source you are reading. After I saw videos on YouTube showing Vega Crossfire systems on the B350 chipset, I made my decision: "Get the second Vega, it's running!"

       

      And yes, it runs... as long as you are not going to start any games with it. The cards simply performed even worse than my 10-year-old office laptop.

      Please, AMD Support, correct me if I am wrong here:

      It turns out that the B350 is not even theoretically built to run Crossfire; the second PCIe slot just doesn't have enough PCIe lanes for the throughput a Crossfire system needs. So what I think happened was: the two cards were running in some kind of compatibility mode at the lowest speed both could achieve in sync with the low number of PCIe lanes, causing the graphics performance to break down to Pentium 4 levels.

       

      Lessons Learned #1: Don't go for Crossfire with one PCIe slot running at PCIe 3.0 x4 (B350 Prime Plus mainboard).
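
      To put rough numbers on the lane problem: the sketch below estimates one-direction PCIe 3.0 bandwidth for different link widths. It assumes the standard PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and takes the x4 slot speed from my own reading above. The function name is made up for this example, and real game throughput will differ from these theoretical values.

```python
# Rough theoretical PCIe 3.0 bandwidth per direction.
# Assumptions: 8 GT/s per lane and 128b/130b encoding (PCIe 3.0 figures).
PCIE3_GT_PER_S = 8.0                # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130     # payload bits per transferred bit


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Return the approximate one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    gbit_per_s = PCIE3_GT_PER_S * ENCODING_EFFICIENCY * lanes
    return gbit_per_s / 8  # 8 bits per byte


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s")
```

      So on paper an x4 slot offers only a quarter of what a full x16 slot does.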

       

      2) Well, now I had two powerful Vega 64s and couldn't use a single one of them. I basically unplugged the second one from the PSU and left it deactivated on the mainboard. This way at least one card was able to perform as expected. A few days later I upgraded to a new mainboard. After some research, the only board I found that definitely supports Crossfire was one with the x370 chipset: so ASUS x370-PRO it is.

       

      2nd HW Build:

      Ryzen 1700x

      ASUS x370-PRO Mainboard

      32 GB RAM

      650w PSU

      2x Vega 64

       

      After I rebuilt the entire case with the new mainboard, I went for a test. Come on Witcher, I know you can perform great with Crossfire! BANG!

       

      Lessons Learned #2: One Vega consumes about 300w. A second one will consume another 300w. The rest of the PC clearly needs more than the 50w my PSU has left.

       

      3) I had simply spent too much effort on getting this system running to give up at this point. I looked up the HW specs and found out that you should allow about 150w for the Ryzen 1700x + mainboard, and I already knew the Vegas run at up to 300w apiece. Then there are a few hard drives running, but they don't need that much power. Quick math: 150+300+300+50 = 800. I found a cheap and nice 850w PSU and got it a few days later.
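
      The quick math above as a small sketch, using only the wattage figures from this post. The dictionary keys and the function name are made up for the example. It only adds up sustained draw and does not model short load spikes or PSU efficiency, so treat the headroom as optimistic.

```python
# Sustained power estimates taken from the post (watts).
LOADS_W = {
    "cpu_and_mainboard": 150,
    "vega64_a": 300,
    "vega64_b": 300,
    "drives_and_misc": 50,
}


def psu_headroom(psu_watts: int, loads: dict) -> tuple:
    """Return (total estimated draw, remaining headroom) in watts."""
    total = sum(loads.values())
    return total, psu_watts - total


if __name__ == "__main__":
    for psu in (650, 850):
        total, headroom = psu_headroom(psu, LOADS_W)
        print(f"{psu}w PSU: draw {total}w, headroom {headroom}w")
```

      A negative headroom means the PSU is undersized even before any spikes are considered.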

       

       

      3rd HW Build:

      Ryzen 1700x

      ASUS x370-PRO Mainboard

      32 GB RAM

      850w PSU

      2x Vega 64

       

      I built it in; once again I basically had to rip the build apart to get the new cables in. But finally I had all I needed, finally I could run the damn Witcher in 4k@60 or even more. Opened it up, went for it, played 30 seconds... BANG!

       

      Lessons Learned #3: Do not underestimate the thirst of two Vega 64s running in Crossfire.

       

      4) To be honest, I was confused. I couldn't find the problem for quite a while. I did the math, but it simply was not enough power. But how much would it take to get it running stable? Thanks to reddit there were some guys with answers: https://www.reddit.com/r/Amd/comments/7ev3wn/crossfire_v64_on_a_1000w_psu/

      So, according to the internet, you should get at least an 1100w PSU. It was far too late to give up now. I found a 1200w PSU and got it another day.

       

      4th HW Build:

      Ryzen 1700x

      ASUS x370-PRO Mainboard

      32 GB RAM

      Corsair HX1200i PSU (1200w)

      2x Vega 64

       

       

      At this point I never want to be reminded of the money it took to get this far. But on the other hand, it's too big to fail now. I went for another test, a little scared. Come on Witcher, you cannot run from me! It works, finally! It ran so super smooth I had tears of joy in my eyes. So super smooth... for about 60s. At least an improvement by a factor of 2!

       

      Lessons Learned #4: Damn, Vega creates a lot of heat!

       

      5) My case is already equipped with 5 fans; every spot where you can place a fan has one. But this doesn't bother the two Vegas. With WattMan open on a second screen, I was able to see incredible jumps in temperature: from 30 degrees to 85 in under a minute. I was not even mad, I was a little bit impressed.

      But thankfully WattMan provides custom fan settings; you can even create separate profiles with different settings for multiple games. And thankfully, Vega fans are so loud that AMD ships them limited to less than 50% of the speed they can run. They can run at up to 5000rpm and were throttled at 2400rpm. So I created a new profile for the Witcher, and I am still using it.

      AMD Support, please give some feedback here:

      Witcher Profile: Min: 2000rpm Max: 4000rpm / Temperature Target: 70 Max: 85

      Can this setting damage the cards? I just made the fans run faster and set the target temperature 5 degrees lower. I will not dare touch the max temperature.

       

      Good hunt, Witcher! Get them now! I start the game, and it actually runs as smooth as expected. I have already had the game running with these settings for 1-2 hours. But BANG! It still crashes sometimes.

       

      AMD Support, please correct me here if I am wrong:

      Lessons Learned #5: I ran a bunch of experiments disabling / enabling Crossfire because I suspected some serious hardware problems at this point. The first thing I noticed was: "You were already playing that game with one card without any heating problems, but now even a single card will run up to 85 degrees and turn off immediately". What can cause this to happen? I think it's something to do with the power states the GPU can have. I THINK (not know) that Crossfire / mGPU affects the power state in a way that the GPU can't throttle itself any more and runs permanently in the highest power state at maximum load.

      This is actually still a problem for me. Please, if you have any hints for getting this fixed, tell me. I am desperate for every single bit of information.

       

      Lessons Learned #6: Not only the heat can cause serious trouble in Crossfire. Not every game is optimized for mGPU; this is no secret. In that case the second card just idles. But even when it's supported, I ran into a bunch of problems which I still don't fully understand. When the resolution in a game changes, for example for a cutscene, it is possible that one card in Crossfire gets out of sync. You will then see and learn some curious effects of rendering frames on different GPUs. For example: both cards each produce an output of 30fps, together it's 60. One frame will come from card A, the second one from card B, the next from A again and so on. It's a stream of ABABABABABAB....

      If one card gets out of sync, it will not send its frames anymore, and you will then see AAAAAAAAA or BBBBBBBB. But each card is only meant to create every second frame, so you see an incredible lag at a not that comfortable 30fps. This will cause the computer to crash at some point.

      AMD Support: if you have any hints or information about this, please tell us!
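
      The ABAB pattern from Lesson 6 as a toy sketch. This only models my description above (each card delivers 30fps and the frames alternate); it is not how the driver actually schedules frames, and both function names are made up for the example.

```python
from itertools import cycle, islice


def afr_sequence(active_cards: str, frames: int) -> str:
    """Return which card delivers each displayed frame when frames alternate."""
    return "".join(islice(cycle(active_cards), frames))


def displayed_fps(active_cards: str, per_card_fps: int = 30) -> int:
    """Each card still sending frames contributes its own frame rate."""
    return per_card_fps * len(active_cards)


if __name__ == "__main__":
    # Both cards in sync, then card B dropping out.
    for cards in ("AB", "A"):
        print(afr_sequence(cards, 12), f"{displayed_fps(cards)}fps")
```

      With both cards the stream alternates at 60fps; with only one card still sending, the same loop gives a single-letter stream at 30fps.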

       

      Lessons Learned #7: Different Crossfire modes can cause the computer / game to crash. I am not that familiar with the different Crossfire modes; I read some wiki articles about them, but basically what I get is "when you run a game in 1x1 mode, it will maybe run, or maybe crash the PC entirely".

      AMD Support: if you have any hints or information about this, please tell us!

       

       

      Summary

      I went through a long ordeal, and so far it has not been worth the trouble. I still have problems with random crashes, even with Crossfire deactivated and running only FullHD.

       

       

      My open Questions:

       

      1) Is there a problem with Crossfire manipulating the GPUs' power states? Is it preventing the GPU from throttling its power and cooling down? I set the fan limit to max 4000rpm and it still crashes the computer sometimes by reaching 85 degrees. How can I prevent the card from running into its own power-heat-death?

       

      2) Thanks to "Adrenalin" (V18.3.1), my drivers are always up to date. I noticed that the available beta driver disables Crossfire, so I moved back to the newest main release. Is it possible that some Crossfire modes can crash the whole PC? If yes, can you please give me some help here?

       

      3) How can I get the game profiles working? Most of the time it's 50/50 whether a profile is loaded at the start of a game or not. Also, the global WattMan settings do not survive a computer restart. How can I get this working? I would love to globally allow the fans to run at up to 4000rpm when needed. I would even like to keep the cards permanently in power save mode for testing. That's not working after a restart either. I don't really know if it even works directly after setting it.

       

      Guys, tell me about your mGPU / Crossfire / Vega 64 problems which could have something in common with my suffering. I am really thankful for every bit of information. Also, please tell me how you fixed them.

       

      Thanks,

      Eric