6 Replies Latest reply on Aug 7, 2016 9:21 AM by hedaurabesh

    RX480 crash in games, no crash in Burn-In Benchmarks (Furmark)

    hedaurabesh

      Hi everyone.

       

      I have a very curious problem with my Radeon RX480. Actually, the same problem with two separate cards (Sapphire and Asus, both the reference model).

       

      First, the specs of my system:

      • ASUS Radeon RX 480 8GB - Reference model (Same problem exists with Sapphire version)
      • Phenom II X6 1090T Black Edition @ stock speeds, undervolted to 1.3V
      • ASRock K10N780SLIX3-WiFi
      • 8GB DDR2-800 5-5-5-18 RAM (Corsair)
      • Tagan TG600-BZ Piperock 600W Power Supply
      • Monitor connected via DisplayPort cable
      • Windows 10 Pro

       

      Now on to the weird stuff. The system is usually rock stable. With my previous card -- a Sapphire Toxic R9 270x -- the system runs crash-free, no matter what I throw at it.

       

      Unfortunately, when I install the RX480, the system seems absolutely stable at first -- but once I start a game (Overwatch, DotA 2, Hearthstone, Civilization V), it crashes within 10 to 60 minutes. First the screen goes black while the sound continues, and 5-10 seconds later the entire PC hard-crashes.

       

      The same thing happens when I watch hardware-accelerated HTML5 web video (YouTube, Twitch, etc.). First the screen turns black, 5-10 seconds later the sound stops and the OS crashes, and 10 seconds after that the PC reboots.

       

      But when I run Furmark, or stress the CPU & GPU with OpenCL tasks via BOINC (mostly Seti@Home), the system is absolutely stable. Even after 4 hours of Furmark burn or OpenCL torture, the system shows no crashes.

      The system is equally stable when I just leave it idling.

       

      I also ran Memtest86+ to verify that there are no RAM issues. Even after 12 hours of RAM checks, no errors are reported.

       

      I have cleaned the driver install and reinstalled them from scratch.

       

      What could it be? I doubt it is the PSU, since the R9 runs stable (and uses 2 6-pin connectors over the same physical cable). For the same reason, I also doubt that it's the rest of the hardware. Additionally, 2 RX480 cards show the same behaviour on that system.

       

      As for the temperatures, the GPU fan speed tops out at about 50-60% and keeps the card at 80-90°C -- the R9 ran at 60-70°C in the same case. Even playing with an open chassis does not prevent the crash.

       

      If you need sensor data from the card, I have attached a dump taken with GPU-Z, which was running while I watched a DotA 2 match via SourceTV inside the game. The second-to-last line is the last "in-game" measurement. The last one is from when the crash was already underway (i.e. the screen was already black).
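      For anyone who wants to compare the final two readings of such a sensor log programmatically, here is a minimal sketch. It assumes the log is comma-separated text with a header row, a trailing comma on each line and padding spaces around values. The column names and all sample values below are invented for illustration and are not taken from the attached dump; `read_log` and `last_two` are hypothetical helper names.

```python
import csv
import io

# Hypothetical excerpt of a sensor log. Column names and values are made up
# for illustration only; they are NOT the readings from the attached dump.
SAMPLE_LOG = """\
Date , GPU Temperature [C] , Fan Speed [%] , Power [W] ,
2016-08-01 20:00:01 , 84.0 , 55 , 61.2 ,
2016-08-01 20:00:02 , 85.0 , 55 , 60.8 ,
2016-08-01 20:00:03 , 86.0 , 56 , 59.9 ,
"""


def read_log(text):
    """Parse comma-separated log text into a list of dicts (one per row)."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for raw in reader:
        values = [value.strip() for value in raw]
        if not any(values):
            continue  # skip blank lines
        # The trailing comma yields an empty column name; drop it.
        rows.append({k: v for k, v in zip(header, values) if k})
    return rows


def last_two(rows):
    """Return (second-to-last, last) rows: the last in-game and the crash sample."""
    if len(rows) < 2:
        raise ValueError("need at least two log rows")
    return rows[-2], rows[-1]


if __name__ == "__main__":
    in_game, crash = last_two(read_log(SAMPLE_LOG))
    for column in in_game:
        print(f"{column}: {in_game[column]} -> {crash[column]}")
```

      Printing the two rows side by side makes it easier to spot which sensor changes between the last in-game sample and the sample taken during the black screen.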

       

       

      Thanks a bunch!

        • Re: RX480 crash in games, no crash in Burn-In Benchmarks (Furmark)
          black_zion

          God what a nasty power supply, 4 12v rails at 20A each? I'd replace it and use that piece of garbage for target practice. The drivers detect when power viruses are running, like Furmark, and throttle it back.

            • Re: RX480 crash in games, no crash in Burn-In Benchmarks (Furmark)
              hedaurabesh

              I suspected my power supply first, too, but this is why I used GPU-Z.

               

              In Furmark (note that I also tried TessMark, GiMark and Pixmark), the card consistently draws over 110W, whereas in most games it draws only about 60W -- as you can see from the attached log of a crash.

               

              Additionally, not trusting Furmark, I also tortured it with CPU and OpenCL computations (via BOINC) that peg the CPU and GPU at 100% utilization (albeit at minimal shader and texture unit usage) and lead to a power draw of 80-90W on the GPU and about 150W on the CPU.

               

              Also, do not forget that the R9 270x ran stably both before and after, with a power draw of 110W in games and 145W under Furmark load.

              On top of that, the PC is connected to a plug-in wattmeter at the wall socket, which shows less than 260W overall draw even under full CPU+GPU load.

               

               

              All this leads me to conclude that the PSU, despite its age, is not the culprit, given that it has seen much worse, with no crashes and no significant voltage jitter or drift.

               

               

              P.S.:

              Something to remember: 12V @ 20A = 240W.

              So even though the PSU uses a single, dedicated 12V rail for the GPU, that rail is still easily 2x overspecced for this card. It seems physically impossible for the RX480 to trip the overcurrent protection. In other words, the cable would catch fire before you could trip the OCP here, multi-rail or single-rail.
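              The headroom arithmetic can be sketched in a few lines. This is only a rough illustration using the figures from this post (12V at 20A per rail, about 110W in Furmark and about 60W in games as reported by GPU-Z). The function names are made up for this example, and since the Furmark reading was "over 110W", the resulting factor is approximate.

```python
def rail_capacity_watts(volts, amps):
    """Maximum continuous power a rail can deliver: P = V * I."""
    return volts * amps


def headroom_factor(capacity_w, draw_w):
    """How many times the rail capacity exceeds the measured draw."""
    if draw_w <= 0:
        raise ValueError("draw must be positive")
    return capacity_w / draw_w


# Figures quoted in this thread.
RAIL_VOLTS = 12.0
RAIL_AMPS = 20.0
FURMARK_DRAW_W = 110.0  # approximate GPU-Z reading under Furmark
GAME_DRAW_W = 60.0      # approximate GPU-Z reading in games

capacity = rail_capacity_watts(RAIL_VOLTS, RAIL_AMPS)

if __name__ == "__main__":
    print(f"Rail capacity: {capacity:.0f} W")
    print(f"Headroom under Furmark: {headroom_factor(capacity, FURMARK_DRAW_W):.2f}x")
    print(f"Headroom in games: {headroom_factor(capacity, GAME_DRAW_W):.2f}x")
```

              With these inputs the rail has a bit more than twice the capacity the card draws in Furmark, and four times what it draws in games, which is where the crashes actually happen.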

              Good read:

            • Re: RX480 crash in games, no crash in Burn-In Benchmarks (Furmark)
              brunosp

              Reset the motherboard BIOS: remove the battery.

                • Re: RX480 crash in games, no crash in Burn-In Benchmarks (Furmark)
                  hedaurabesh

                  I have already reset the BIOS (as that board does not have EFI).

                   

                  I do admit I forgot to write it in my original posting, given that it is the go-to "check-the-wires-first" solution here, next to the PSU.

                   

                  Anyhow, here's what I have already tried (none of it led to a solution):

                  • Checked all cables -- they're all good. Verified with other graphics card and multimeter resistance check.
                  • Checked the seating of the card. All pins in contact with board.
                  • Switched PCI-E ports (the board has 3 to choose from)
                  • No nearby sources of overheating or interference. Distance to CPU cooler > 5cm
                  • BIOS fully updated to latest version (well, what counts as "latest" for that 8-year-old motherboard).
                  • BIOS reset to factory defaults -- via the BIOS itself, via power reset, and via the "Clear CMOS" toggle.
                  • Cleared and reinstalled Windows drivers.
                  • Switched to a different 12V rail on the PSU.
                  • Switched the PSU from 4x20A Multi-Rail to 1x48A Single-Rail mode
                  • Removed all HDDs except the primary SSD.
                  • Stopped undervolting the CPU.
                  • Exchanged cards (from Sapphire version to Asus version)

                   

                  The problem persists. Everything works peachy with the R9 270x, but random crashes occur in games (and HW accelerated YouTube videos) with the RX480.

                   

                  By now, I deeply suspect a driver issue. I will try running on Linux with Steam + DotA2, to see if I can replicate the crashes on a wholly different OS.

                • Re: RX480 crash in games, no crash in Burn-In Benchmarks (Furmark)
                  hedaurabesh

                  Okay, problem is solved.

                   

                  Short Summary

                  • The stock cooler sucks. The voltage regulators are cooled by being attached with thermal sticky gum to the metal cooler frame.
                    • This is obviously not good enough! Even with an open case!
                  • After installing the Arctic Accelero Mono Plus on the GPU and the included aluminium heat sinks to the voltage regulators, the card is absolutely stable now.

                  • Do note though, that I can't recommend this cooler. Installation is very finicky and you need to get super-creative to properly cool 2 out of the 8 VRAM chips, since the cooler itself is too big and blocks access to half of their surface.

                   

                   

                  Full Explanation

                   

                  Here's what I tried since my last post that did not help, in order:

                   

                  1. I installed the latest WHQL signed drivers (16.7.3) -- they seem identical to 16.7.2 except for the signing, so they did not help.
                  2. I booted Ubuntu Linux from a USB stick, installed the drivers, Steam and DotA2 and let it run. It was a lot more stable than on Windows, but still occasionally crashed.
                  3. I entered WattMan and increased the fan speed, to keep the temperatures of the GPU at around 75°C (instead of 90°C). This did not help, or if it did, not by much.
                    • Note: This is all done with an open chassis, so fresh air and good airflow are a non-issue.
                  4. I entered WattMan and reduced the maximum VCore voltages in States 6 & 7 down to 1050 and 1060 mV respectively.
                    • This reduced the crashes by a lot, but did not solve them.
                    • Note: I tried this with both "Energy Efficiency" on and off, same result.

                   

                  The last results then got me thinking: Even with open chassis, the card gets pretty hot -- but not equally. There are a number of hotspots and they do not coincide with where the GPU sits on the card.

                   

                  So if reducing the peak voltage helps, but the PSU wattage is not the limiting factor -- maybe it is temperature-induced voltage instability. A lower peak voltage means less heating of the voltage regulators.
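                  To get a feel for how much a voltage cap can matter, here is a rough sketch. It assumes the common first-order model that dynamic power scales with the square of the core voltage at a fixed clock; that model is my assumption, not something I measured. The 1060 mV value is the State 7 cap mentioned above, while the 1150 mV "stock" value is purely hypothetical and only there to make the example runnable.

```python
def relative_dynamic_power(new_mv, old_mv):
    """Ratio of dynamic power after a voltage change, assuming P ~ V^2 at fixed clock."""
    if new_mv <= 0 or old_mv <= 0:
        raise ValueError("voltages must be positive")
    return (new_mv / old_mv) ** 2


CAPPED_MV = 1060.0         # State 7 cap set in WattMan (from this post)
ASSUMED_STOCK_MV = 1150.0  # HYPOTHETICAL stock voltage, for illustration only

ratio = relative_dynamic_power(CAPPED_MV, ASSUMED_STOCK_MV)

if __name__ == "__main__":
    print(f"Relative dynamic power after the cap: {ratio:.2f}")
```

                  Under that assumption, even a modest cut in millivolts removes a noticeably larger share of the power that has to pass through the regulators, which would fit the observation that capping the voltage reduced the crashes.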

                   

                  Okay, so time to pry off the AMD stock cooling to see why this is. Since I do not mind violating the warranty, I removed the fan and heatsinks*.

                   

                  Shock and horror -- the memory chips and voltage regulators are cooled only through contact with the metal frame of the cooler housing.

                  But there's no direct contact between the regulators and the metal. Instead, there are sticky pieces of thermal gum attached to the chips/regulators.

                   

                  That can't be good thermal conductivity.

                  I went ahead and bought an Arctic Accelero Mono Plus, which comes with a lot of little aluminium cooling fins and proper thermal glue.

                  This drastically increased the cooling of the regulators. The cooling fins get very hot during gaming (too hot to touch), but this is bound to be a lot better than the thermal gum.

                   

                  After re-assembly, I booted and started Furmark. Temperatures stayed below 75°C and everything was stable.

                   

                  Then I watched 3 hours of DotA2 The International 2016 broadcasts in-game, played Overwatch and a bit of Tomb Raider -- all without crashes.

                  Since then, another 3 days of gaming have passed without crashes.

                   

                  So in summary: If your Stock Radeon RX480 crashes your entire computer -- it might very well be due to insufficient voltage regulator cooling.

                  Lower the peak voltage in WattMan, or install a better cooler, or replace the entire card with one of the custom-cooled RX480s you can find online nowadays.

                   

                   

                  [*] - Sidenote: It is incredible what a freely spendable budget for gaming gear that is not yet used up does to you.

                   

                   

                  Thanks for all the tips, anyway, even if the true solution eventually came out of left-field.