Navi 5700 issues (despite every piece of hardware and software, including the card, being replaced)

Discussion created by anyoldname3 on Aug 10, 2019
Latest reply on Oct 9, 2019 by lenunu

A short while ago, we bought a Radeon RX 5700 (Sapphire's reference card) but quickly discovered some issues, which the person using the card describes as:

1) screen goes black for between ~1 and ~10 seconds, then works (causes War Thunder to reduce texture quality with the message "Video driver hung and was restarted. Texture quality was reduced.")
2) half of main screen changes colour as if it stops outputting any green light and/or goes darker; sometimes stops after a few seconds, sometimes requires me to move windows around and change focus
3) War Thunder crashes; sometimes it manages to crash and allows me to send a crash log to Gaijin, and once the whole computer froze and required me to use the restart button on the case
4) Bluescreened while loading World of Warships with an error message to do with graphics
5) models of things don't render correctly in War Thunder
6) KSP crashed once while attempting to load, although I've got it to load to the main menu since without issue
7) sky and ground (when far away) went black in War Thunder (fixed by turning shadows to minimum)
8) sun looks weird (purple, pixelly and sparkly) in War Thunder
9) steam overlay framerate counter goes weird

 

The previous card was a heavily factory-overclocked 4 GB GTX 770 (before that, he had an R9 285). The Nvidia driver was removed using an up-to-date version of Guru 3D's DDU tool.

 

The rest of the computer was as follows:

  • i5 4690K at stock frequencies.
  • 12 GB of DDR3 memory.
  • Gigabyte Z97P-D3 motherboard with somewhat outdated BIOS.
  • A 500 GB boot SSD and two 1 TB HDDs.
  • A Corsair CX 430M PSU.
  • Windows 10 1809 (because Windows didn't want to automatically install 1903 until we started making major hardware changes).

 

Initially, we suspected that it was just the Navi drivers being immature, so we updated from 19.7.4 to 19.7.5. This didn't help.

 

We googled the issues and couldn't find anything relevant, so began to suspect that it was unique to this machine, and potentially because the card was defective (as it's very unlikely that no one has played any of these games on a Navi card before).

 

After exchanging the card (the store did their own testing, but with completely different software, and eventually managed to produce one potentially related issue), the issues persisted, so the lacklustre power supply was the next suspect. It had coped fine with two previous graphics cards, one of which had a much higher TDP than the 5700 and one of which was the same or about the same, but we thought maybe they'd just had extra headroom in their specification, so weren't drawing their full TDP.

 

Despite trying two different, significantly better power supplies, nothing improved. We also tried a 50% power limit in Wattman just in case, and that didn't help either.

 

At some point, we updated the motherboard BIOS, which helped temporarily before the issues resumed. We also ran the same tests that we'd seen the store run without issue, and none of them produced any symptoms.

 

Just as a sanity-check, we tried my Vega 56 in the computer, and that worked fine.

 

Now that there were two fairly thoroughly dismantled machines in the same room, we tried other tests, including using the original computer's boot drive and the Navi card on my computer's motherboard (an MSI Z87-G45 GAMING with a 4670K), which temporarily avoided the issues before they came back.

 

We then remembered that Windows Update sometimes likes leaving computers on 1809 despite 1903 having been available for several months, so (with the original SSD and 5700 now connected to the MSI motherboard and EVGA PSU), we made it install Windows 10 1903. As with every other major change, the issues were gone until a reboot occurred.

 

Next came a reinstall of the drivers (using DDU to remove them and installing them via the 19.7.5 installer instead of Windows Update). Due to Steam deciding War Thunder wasn't installed and so reinstalling it, we couldn't test things before rebooting a couple of times, but afterwards, it worked at least twice, so we left it for the night, hoping that the issue had been chased away and the card and its intended computer could be reunited.

 

The next day, I had the person who should actually be using the computer test the issues many times. (To recap, this was with the MSI motherboard, reliable high-end PSU, post-RMA RX 5700, original boot drive updated to Windows 10 1903, and freshly reinstalled 19.7.5 drivers and War Thunder.) He said he couldn't reproduce any of the issues any more, no matter how many reboots he tried or how long he used the system for, except for issue 2, which happened rarely, but was no longer restricted to the main screen (although the DisplayPort cables might have ended up in different slots, so it might still be restricted to one of the card's outputs). He decided this was minor enough to live with and blame on immature drivers, so did the same testing on his Gigabyte motherboard and got the same results. It looked like victory was at hand.

 

As I didn't want to permanently give him my power supply, which I need as my system draws more power, and he didn't want to spend more money than necessary, we did a burn test using the 430 W Corsair PSU, with HeavyLoad for the CPU and memory and Furmark for the GPU. It was still running after an hour, so we were feeling confident, but then I saw the screen flick off and back on like it had been doing in War Thunder, and Task Manager logged this as a brief drop to 0% GPU usage. This was potentially issue 1 happening in another application, which would mean it wasn't just a War Thunder incompatibility with Navi. However, I'm no longer confident that it wasn't just that I accidentally clicked into another window and caused the focus to change. We ended the test, but decided to test War Thunder again while the system was hot, and had issue 1 again.

 

The next day, we tried the same test again but with the best power supply in case it was just a power draw issue. The flicker didn't happen during the first hour, but did happen when I clicked out of Furmark, which is why I'm not confident this isn't what happened before. However, when starting War Thunder with the computer still hot, the screens flickered off for a short while, and when they came back, War Thunder's loading screen had major graphical artefacts for a few seconds (which I didn't manage to screenshot, and I didn't see whether the reduced texture quality message that shows up with issue 1 appeared). Task Manager on the other screen turned black, although every part of it that gets redrawn went back to normal upon being redrawn. I thought to run the Problem Report Wizard after closing War Thunder, so I have that XML file if it's needed.

 

The next thing left to test was the 5700 in my computer with my usual OS, so that's what I did. After reassembling the machine, installing drivers and War Thunder, and rebooting things, I started War Thunder. Once the game left the initial loading screen and reached the menu, I encountered issues 1 and 5, just as had been happening on the other computer. I've attached a screenshot and a problem report XML (without the stylesheet as I assume that's unnecessary). I didn't encounter a crash while loading Kerbal Space Program over several restarts. When I enabled the Steam FPS counter, I did see artefacts on it in War Thunder (and then issue 1 happened so badly that the screens didn't come back until I restarted the system with the reset button). I didn't see issue 2 over a couple of hours, but it was happening less frequently than that on the other system, so I'd count that as inconclusive. I didn't test World of Warships, as I'd rather not bluescreen my own computer. That does mean, though, that I only encountered issues on my machine with War Thunder.

 

The last test I could think of was to try a completely fresh Windows install. I found a spare hard drive, wiped it, and put it in a computer with the Navi card and no other drives. I installed Windows 10 1903, and any Windows updates that were available, and then installed the graphics drivers, rebooted, and installed Steam and War Thunder. The first time I ran War Thunder, I encountered issue 1 and then issue 3. The second time, there was a blue screen of death almost immediately.
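
To sanity-check the elimination so far, here is a minimal Python sketch that tabulates the main tests and reports which category never changed across the failures. The category names, the shorthand labels and the choice of which tests to include are my own simplification for illustration. The PSU is left out because it isn't stated for every test above, and the board used for the fresh install is marked as unspecified.

```python
# Simplified summary of the main tests described in this post.
# Labels are shorthand chosen for this sketch, not a full hardware inventory.
TESTS = [
    {"gpu_model": "RX 5700", "board": "Gigabyte Z97P-D3",
     "os_install": "original SSD", "failed": True},
    {"gpu_model": "Vega 56", "board": "Gigabyte Z97P-D3",
     "os_install": "original SSD", "failed": False},
    {"gpu_model": "RX 5700", "board": "MSI Z87-G45 GAMING",
     "os_install": "original SSD", "failed": True},
    {"gpu_model": "RX 5700", "board": "MSI Z87-G45 GAMING",
     "os_install": "my usual OS", "failed": True},
    {"gpu_model": "RX 5700", "board": "unspecified",
     "os_install": "fresh 1903 install", "failed": True},
]

CATEGORIES = ("gpu_model", "board", "os_install")


def constant_in_failures(tests):
    """Return the categories that had the same value in every failing test."""
    failing = [t for t in tests if t["failed"]]
    return [c for c in CATEGORIES if len({t[c] for t in failing}) == 1]


print(constant_in_failures(TESTS))  # ['gpu_model']
```

Only the card model is common to every failure, and with it the Navi driver, which can't be varied independently of the card. So this narrows things down but can't separate a hardware fault from a driver bug. War Thunder is left out as a category too: it was involved in every failure, but other programs also showed problems on the original machine.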

 

I have a few theories as to what's going on:

  • The replacement card just happens to have the same defect as the original, so things didn't get better when it was swapped.
  • War Thunder is doing something bad like relying on undefined behaviour that does one thing on Tonga, Kepler and Vega, and something else on Navi, and the driver doesn't handle erroneous usage very gracefully. The non-War Thunder issues were temporary and have gone now. This doesn't explain why we can reproduce the issues on two different machines, but can't find anyone else reporting things when googling the issue.
  • Navi's drivers are doing something incorrectly that causes issues, but the more mature drivers for the other cards we've tried don't have the same problem. This doesn't explain why we can reproduce the issues on two different machines, but can't find anyone else reporting things when googling the issue.

 


Whatever the cause, something needs to be done. If the issues can't be reproduced on another, similar machine with the same model of card, then that implies the second card is broken, too, and we'll need another replacement. If they can be reproduced, then the software needs fixing (as even if it is because War Thunder is doing something incorrectly, the graphics driver shouldn't turn it into a BSOD).
