ghueller
Journeyman III

Linux on 3700x: spontaneous reboots caused by MCE

Hi,

I am running Linux (Fedora 31) on my build from last July, consisting of:
- Crucial DDR4 3000 Sticks
- Radeon RX 570 (MSI)
- Asrock Phantom Gaming 4 (latest BIOS)
- Ryzen 3700x

The system is fast and, at least under Windows 10, runs fine.
Temps are OK, the PSU is of high quality, and the memory survives hours of memtest86 without errors.

Yet, when running Linux, I get a short freeze followed by a reboot about once a week.
At the next boot, the following machine check exception is logged:

[    0.707393] mce: [Hardware Error]: Machine check events logged
[    0.707395] mce: [Hardware Error]: CPU 10: Machine Check: 0 Bank 5: bea0000000000108
[    0.707464] mce: [Hardware Error]: TSC 0 ADDR 1ffffbb03343c MISC d012000100000000 SYND 4d000000 IPID 500b000000000
[    0.707540] mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1583508288 SOCKET 0 APIC 5 microcode 8701013
[    0.709397] mce: [Hardware Error]: Machine check events logged
[    0.709398] mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 5: bea0000000000108
[    0.709468] mce: [Hardware Error]: TSC 0 ADDR 1ffffbba3a05a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
[    0.709543] mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1583508288 SOCKET 0 APIC 9 microcode 8701013
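For anyone trying to read the status word in these lines, here is a minimal sketch that decodes the architectural x86 MCi_STATUS flag bits (63 down to 57) and the low 16-bit error code from bea0000000000108. The function name and the dictionary layout are my own for illustration. It deliberately ignores the vendor-specific bits in between, so treat it as a rough reading aid rather than a full decoder.

```python
# Architectural x86 MCi_STATUS flag bits (bit position -> name).
# Vendor-specific bits below bit 57 are intentionally not decoded here.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Return the set flag names and the low 16-bit MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


decoded = decode_mci_status(0xBEA0000000000108)
print(decoded["flags"], hex(decoded["error_code"]))
```

With UC and PCC both set, the event is an uncorrected error with corrupted processor context, which would be consistent with the machine resetting instead of recovering.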


AMD support more or less drops the conversation as soon as they see the word "Linux".
Any idea how to diagnose this issue any further?

Thank you in advance, Gerhard

2 Replies
ghueller
Journeyman III

Re: Linux on 3700x: spontaneous reboots caused by MCE

Could someone from AMD please have a look at this issue?

I just had another one five minutes ago:

Mär 19 08:22:35 localhost.localdomain kernel: mce: [Hardware Error]: Machine check events logged
Mär 19 08:22:35 localhost.localdomain kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108
Mär 19 08:22:35 localhost.localdomain kernel: mce: [Hardware Error]: TSC 0 ADDR 7fd8b0e13c9e MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Mär 19 08:22:35 localhost.localdomain kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1584602553 SOCKET 0 APIC 6 microcode 8701013

Peter3
Journeyman III

Re: Linux on 3700x: spontaneous reboots caused by MCE

The MCE with status bea0000000000108 on microcode 8701013 may be solvable by booting with the kernel parameter amdgpu.ppfeaturemask=0xffffbffd. See https://bugzilla.kernel.org/show_bug.cgi?id=206903#c135
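
To see what that mask value actually changes, here is a small sketch that lists which bit positions 0xffffbffd clears compared with an all-ones 32-bit mask. Comparing against 0xffffffff is an assumption made for illustration (the driver's real default differs between kernel versions), and the helper name is hypothetical. What each bit means is defined by the amdgpu driver source and is not decoded here.

```python
ALL_FEATURES = 0xFFFFFFFF    # assumed reference: every feature bit enabled
SUGGESTED_MASK = 0xFFFFBFFD  # value suggested in the linked bug report


def cleared_bits(mask: int, reference: int = ALL_FEATURES) -> list:
    """Return the bit positions set in `reference` but cleared in `mask`."""
    diff = reference & ~mask
    return [bit for bit in range(32) if (diff >> bit) & 1]


print(cleared_bits(SUGGESTED_MASK))
```

So the suggested value turns off exactly two feature bits relative to all-ones and leaves the rest enabled.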