RumKasato
Journeyman III

5800x WHEA_UNCORRECTABLE_ERROR (124)

Hello all,

I recently swapped my 2700x + B450 Tomahawk Max for a 5800x and B550 Tomahawk Max (I realize I didn't need a board upgrade, but I was really keen on the "new" Tomahawk), and shortly afterwards my entire PC started crashing with the aforementioned WHEA error 124 in certain situations.

My system specs:

Ryzen 7 5800x

B550 Tomahawk Max

2x 8GB 3200MHz CL16 Fury Renegade (KF432C16RBK2/16)

Kingston SNV2S/500GB (boot drive)

Sapphire Pulse RX 6700 XT

Corsair RM750x

After I was done moving everything, the PC looked like it was working fine, and I managed to play quite a bit of War Thunder and Battlefield 1 without a hitch.

That changed after I turned to GTA Online with my friends. My PC survived about an hour and a half before the image froze and the audio started buzzing endlessly. The PC was completely unresponsive, and I had to force it off with the power button. The first crash didn't produce any BSOD that I could see and didn't generate a dump file; I only found three errors logged in Event Viewer, mentioning cache hierarchy, bus/interconnect, and memory (Processor APIC ID was always 0).

This was very odd to me, since there were no other signs of instability, such as weird stuttering or corrected WHEA errors (no overclocking or undervolting was in use apart from an automatic undervolt on the GPU, which I disabled after the first crash).

After that, I managed to make the PC crash a few more times playing GTA and Tomb Raider (2013), generally after somewhere between one and two hours of gameplay. Nothing else seems to cause crashes so far (I tried CS2, War Thunder, Battlefield 1, Battlefield 5, and TF2; other software, such as Chrome and Discord, also seemed stable).

The crash behavior changed after the first one. After one more event 18 (bus/interconnect), all subsequent errors have been tied to event 46 (memory). Every crash now makes the PC freeze and buzz, blacks out part of the screen (the whole bottom section, for example), and finally restarts the whole computer. I am also getting memory dumps now; two of them survived, because at one point I foolishly thought I had fixed the problem and deleted the rest.

Failed troubleshooting attempts:

- Updated the BIOS (AGESA 1.2.0.B --> 1.2.0.C)

- Checked temperatures (everything seems fine, including, I guess, the 90°C on the CPU)

- Reduced CPU temps by making the fan curves more aggressive

- Changed XMP profiles (both profiles appear to be identical, even after comparing timings and voltages)

- Used 65W ECO Mode (doesn't change single-core behavior in most games, so...)

- Downgraded the GPU driver to the previous version (at the time, 24.4.1)

- Reinstalled the chipset driver

- Updated multiple drivers (now using the latest 24.6.1 GPU driver and the latest chipset driver at the time of writing)

- Removed the automatic GPU undervolt in Adrenalin

- Stress tested a few things using OCCT and Memtest86+ (which, of course, didn't trigger a crash)

- Tried only light use (writing all of this on the troubled machine) to check whether that would trigger a crash, which it didn't

I'm planning to reinstall Windows 11 (since I hadn't done that after changing the parts) and, if all else fails, probably RMA the CPU and perhaps the motherboard.

I tried analyzing the dump files myself but didn't see anything useful, probably because that's how these things usually go, or (just as likely) because I'm trash at reading them...

I really want to avoid an RMA process of any kind since that would probably leave me without a pc for some time. Greatly looking forward to any kind of help with this.

16 Replies
johnnyenglish
Big Boss

Hi, I would run without XMP/DOCP for troubleshooting.

Good luck

The Englishman

I don't know why I never bothered trying that before, but I'll definitely give it a go and report back.

Just had a 3-hour session playing Tomb Raider after disabling XMP, and the system didn't crash.

Before the change, the game crashed in each of the last 2 sessions, one after about 1.5h and the other after about an hour, so it's not a bad start.

That said, there were several times when GTA refused to crash no matter how long I stayed in game (4h at the most), so it's possible that I just got lucky this time, and I won't be making any assumptions yet.

Okay, scratch that: the PC crashed again. This time it was GTA, after 2.5h with XMP turned off. I got a proper BSOD now, though, almost as if the PC were a tiny bit more stable.

ThreeDee
Paragon

What cooler are you using on your 5800x?

90°C when under load or idle, lol?

Are you using any cable extenders INSIDE your case?

Check for firmware updates for your SSD(s).

Have you tried a different power supply?

Which slots is your RAM installed in? It should be A2/B2, the 2nd and 4th slots away from the CPU socket.

Try reseating your CPU and check for bent pins.

It's generally good practice to run separate power cables from your power supply to each power input on your GPU.

Install the latest BIOS (I think it's a "beta") and see if it helps.

I was getting driver timeouts and a few WHEA errors on my wife's B550M Phantom Gaming 4 setup with a 3700x, RX 550 2GB, 2 x 16GB 3200 and a Thermaltake Smart Series RGB 500W 80 Plus PSU. I swapped the PSU for a Segotep 600W 80+ Gold and the issues went away, so it's something to test if you can. The Corsair PSU you have is a pretty decent one, but stuff goes bad, so who knows.

ThreeDee PC specs

Quite a lengthy reply (I don't mind!), so I'll respond point by point.

1. I'm using a Noctua NH-U12S redux, brand new.

2. 90 degrees under load, of course. Sometimes the temps spike a bit higher, but I made sure it wasn't the temps causing the crashes by changing the fan curves and even using eco mode when necessary. Now the temps don't really go above 80 degrees, which is well under the maximum. The crashes continued despite this.

3. No, none.

4. I already updated the firmware on my boot drive; I guess I failed to mention this specifically. No change, it seems.

5. I don't have another power supply at hand, unfortunately.

6. I installed the RAM kit exactly as the manual (and motherboard silkscreen) stated, which is A2/B2.

7. Haven't tried this, but the CPU dropped flush into the socket when I installed it, with no force applied at all. I really don't think there will be any physical damage on the CPU if it does end up being the culprit.

8. I've heard of other people having mysterious issues traced back to daisy-chained PCIe cables, so I'll connect a second cable and see if that fixes the problem.

9. It would be pretty concerning if I ended up needing a really recent beta BIOS from MSI to fix this, but it's a simple job, so I'll definitely try it.

Thanks for the help with this. I'll write back as soon as I've gone through some troubleshooting, though that could take a while, since the crashing isn't 100% consistent and might take hours of gameplay to show up... or not show up... hopefully.

FunkZ
Grandmaster

What cooler?

90°C, while technically within spec, is still hot. Look into Curve Optimizer.

Ryzen R7 5700X | B550 Gaming X | 2x16GB G.Skill 3600 | Radeon RX 7900XT
Ryzen R7 5700G | B550 Gaming X | 2x8GB G.Skill 4000 | Radeon Vega 8 IGP
Ryzen R5 5600 | B550 Gaming Edge | 4x8GB G.Skill 3600 | Radeon RX 6800XT

Noctua NH-U12S redux, as stated above.

I've heard of people fixing WHEA errors with Curve Optimizer; I'll have to learn how to use it first, though.

If I end up needing to fiddle around with the voltages just to get the CPU stable at essentially stock settings, I don't think I'll keep it much longer before I RMA it.

It's crazy that these errors still pop up for new Zen 3 users. I thought AMD had these things well under control by now; it's been 4 years, after all...

Since adjusting the CPU settings to keep temps below 80°C, are the crashes still only happening with certain games after extended periods of play? How high does the hotspot on that Sapphire 6700XT get?

RumKasato
Journeyman III

Yes, it seems like it didn't make any difference.

The GPU's hotspot has never exceeded 102°C (a 2°C rise since I added a wireless card below it), which is safe according to AMD. I only saw temps like that in far more demanding games; GTA and Tomb Raider are not especially taxing for a 6700 XT, although I can't remember what GPU temps they were running at.

This is the same GPU I used before with the 2700x, and it was completely stable (as stable as Radeon drivers get), although I never played GTA or Tomb Raider on that setup, so I can't be certain it isn't the GPU causing all this.

I understand only the CPU and motherboard were changed. My thought was that now that you've got a faster CPU and PCIe 4.0 instead of 3.0, the system is no longer a bottleneck, and the GPU may be working to its full potential (i.e. harder)?

I've seen complaints from some other partner model cards where the temperature delta between GPU and Hotspot can be 30°C or more which seems excessive. Have you tried adjusting the GPU fan curves at all?

Fair comment about the CPU upgrade, although I had already played a few games with the 2700x that could max out the 6700 XT's power budget.

A good example would be Metro Exodus, where the GPU sat at ~200W power draw and 102°C.

Also, I don't think PCIe 4.0 has much to offer in this case. The 6700 XT has an x16 link (3.0 or 4.0), which should provide more than enough throughput whichever setting you choose. Perhaps there is some kind of bug with it instead? I'll try running things with a 3.0 link.


@FunkZ wrote:

I've seen complaints from some other partner model cards where the temperature delta between GPU and Hotspot can be 30°C or more which seems excessive. Have you tried adjusting the GPU fan curves at all?


There are some conflicting reports about hotspot deltas, probably because they can vary a lot between games. Sometimes you'll get a 10°C delta, and sometimes it's a 30°C delta.

I never found this unusual for RDNA 2 cards, seeing as the edge of the die seems to be mostly Infinity Cache, which shouldn't really approach the temperatures of the CUs nearer the center of the die, but what do I know about that.

johnnyenglish
Big Boss

Since XMP was a bust and @FunkZ raised a good point about the GPU, I would try setting a power limit of 80% or 90% on the 6700XT.

Even a low hard FPS cap would be a good way to see how it goes.

I know this is not the best way to game, but again, it's just troubleshooting.

I'm not sure if this counts, since the GPU may still spike in power, but all of my games are currently running with a 120 FPS limit that I set up using 'Frame rate target control' in Adrenalin (GTA was still crashing without any limits imposed).

This is enough for software to report a much lower maximum power consumption (below 80%), unless I'm running more demanding games, which never seem to crash.

redmadmax
Journeyman III

Windows Security -> Device Security -> Core Isolation Details -> DISABLE / ENABLE Memory Integrity

RumKasato
Journeyman III

Hello all, I apologize for the 2 months or so of radio silence. I had some other things to attend to, so I didn't have the time or patience to continue troubleshooting this until I got back to it a few days ago.

I decided to simply apply Occam's razor and replace the CPU, since it was the most likely culprit. Lo and behold, my PC seems to be stable now.

So far I've only tested GTA Online, but I got through several 3+ hour sessions without issue, which wasn't really possible before. I'm fairly certain the issue is solved. I'll try to report back if anything interesting happens later on, but I doubt it will.

There's just one final thing to note. For some reason, my previous 5800X was showing 1.45V in the BIOS at all-stock settings, and the reading persisted through a BIOS flash to a newer version. I thought this was normal at the time. The behavior only disappeared in eco mode (which didn't fix the stability issue), where the reading was ~1.35V, which sounds like a sane value and agrees with other people's readings. Ever since I replaced the CPU, the BIOS has consistently shown a correct value of around 1.33-1.35V.

Anyway, thanks everyone for your replies and advice, hopefully this is now the end of it.
