
Ryzen 2700x Hardware errors Linux

Question asked by xblack on Jul 30, 2018
Latest reply on Sep 17, 2019 by bambovc

Hi,

 

I have a Ryzen 2700X on a Gigabyte X470 AORUS Ultra Gaming with CMK32GX4M2B3000C15 RAM in banks 1 and 2.

 

I currently get this error every now and then, along with sporadic full system freezes that require a manual reset.

Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: Machine check events logged
Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: 9820000000000150
Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 2a000503 IPID 300b000000000
Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1532676424 SOCKET 0 APIC 3 microcode 8008206

The errors show up mostly during cross-compilation or when running the ryzen-test utility (https://github.com/suaefar/ryzen-test):

[  941.702984] mce: [Hardware Error]: Machine check events logged
[  941.702988] [Hardware Error]: Corrected error, no action required.
[  941.702994] [Hardware Error]: CPU:3 (17:8:2) MC3_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x9820000000000150
[  941.702999] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
[  941.703002] [Hardware Error]: Decode Unit Extended Error Code: 0
[  941.703004] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
[  941.703006] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
[ 1255.717410] mce: [Hardware Error]: Machine check events logged
[ 1255.717413] [Hardware Error]: Corrected error, no action required.
[ 1255.717418] [Hardware Error]: CPU:5 (17:8:2) MC3_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x9820000000000150
[ 1255.717422] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
[ 1255.717424] [Hardware Error]: Decode Unit Extended Error Code: 0
[ 1255.717425] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
[ 1255.717427] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
[ 1255.717430] mce: [Hardware Error]: Machine check events logged
[ 1255.717430] [Hardware Error]: Corrected error, no action required.
[ 1255.717432] [Hardware Error]: CPU:13 (17:8:2) MC3_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000000150
[ 1255.717434] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
[ 1255.717436] [Hardware Error]: Decode Unit Extended Error Code: 0
[ 1255.717437] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
[ 1255.717438] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
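For anyone who wants to check the status flags by hand, here is a minimal Python sketch that decodes the two MC3_STATUS values from the logs above. The bit positions are an assumption based on my reading of the kernel's MCi_STATUS definitions, so please verify them against your kernel's mce.h; `decode_status` and `is_corrected` are just illustrative helper names, not part of any tool.

```python
# Bit positions of the MCi_STATUS flags.
# ASSUMPTION: taken from my reading of the Linux kernel's mce.h
# (SyndV is the AMD-specific bit); verify before relying on them.
STATUS_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten (overflow)
    "UC": 61,     # uncorrected error
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MISC register is valid
    "AddrV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register is valid
}


def decode_status(status):
    """Return the names of the flags set in a 64-bit MCi_STATUS value."""
    return [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]


def is_corrected(status):
    """True if the value is valid and the UC (uncorrected) bit is clear."""
    flags = decode_status(status)
    return "Val" in flags and "UC" not in flags


if __name__ == "__main__":
    # The two status values that appear in the logs above.
    for raw in ("9820000000000150", "d820000000000150"):
        value = int(raw, 16)
        kind = "corrected" if is_corrected(value) else "uncorrected"
        print(raw, decode_status(value), kind)
```

Both values decode with UC clear, which agrees with the "Corrected error" lines the kernel printed, and the second one has Over set, matching the "Over" flag in the CPU:13 entry.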

Most of the time this causes no particular problems and the error is corrected, as you can see above, but sometimes one of the processes gets tainted or the PC freezes completely.

The current system is Arch Linux with kernel 4.17.9-1 and CPU microcode patch_level=0x08008206.

 

Is anyone else having this issue?

Did replacing the CPU solve the issue?

 

I asked AMD, but in response to my first email, which contained the information above, I only got an RMA approval with no other information.

 

Thanks

 

Marco
