EPYC Discussions

walee

Corrected Hardware Errors popping up on multiple machines

Hi,

We are seeing the following corrected hardware errors on several of our machines:


Oct 20 09:34:08 kernel: mce: [Hardware Error]: Machine check events logged
Oct 20 09:34:08 kernel: [Hardware Error]: Corrected error, no action required.
Oct 20 09:34:08 kernel: [Hardware Error]: CPU:0 (19:1:1) MC27_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd82000000002080b
Oct 20 09:34:08 kernel: [Hardware Error]: PPIN: 0x02b67d5410e580b0
Oct 20 09:34:08 kernel: [Hardware Error]: IPID: 0x0001002e00001e01, Syndrome: 0x000000005a000001
Oct 20 09:34:08 kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
Oct 20 09:34:09 kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
Oct 20 09:39:20 kernel: mce: [Hardware Error]: Machine check events logged
Oct 20 09:39:20 kernel: [Hardware Error]: Corrected error, no action required.
Oct 20 09:39:20 kernel: [Hardware Error]: CPU:0 (19:1:1) MC27_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd82000000002080b
Oct 20 09:39:20 kernel: [Hardware Error]: PPIN: 0x02b67d5410e580b0
Oct 20 09:39:20 kernel: [Hardware Error]: IPID: 0x0001002e00001e01, Syndrome: 0x000000005a000001
Oct 20 09:39:20 kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
Oct 20 09:39:20 kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)

product: AMD EPYC 7763 64-Core Processor
slot: CPU1
size: 3243MHz
capacity: 3529MHz
width: 64 bits
clock: 100MHz

Can someone help me narrow down the cause of this issue? Thank you.
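
In case it helps anyone reading the log, here is a minimal Python sketch of how the MC27_STATUS and IPID values above break down into fields. The bit positions follow my understanding of the Linux kernel's MCI_STATUS_* and MCI_IPID_* definitions and the x86 bus-error code layout (0000 1PPT RRRR IILL). The function names and label tables are my own illustration, not an official tool, so please check them against the AMD PPR for this processor before relying on them.

```python
# Assumed bit positions, following the Linux kernel's MCI_STATUS_* masks.
STATUS_FLAGS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59, "AddrV": 58,
    "PCC": 57, "TCC": 55, "SyndV": 53, "Deferred": 44, "Poison": 43,
}

# Labels as printed by the kernel's AMD MCE decoder (assumed tables).
LL_LABELS = ["RESV", "L1", "L2", "L3/GEN"]
II_LABELS = ["MEM", "RESV", "IO", "GEN"]
PP_LABELS = ["SRC", "RES", "OBS", "GEN"]


def decode_status(status):
    """Split an MCA_STATUS value into set flags and error codes."""
    flags = [name for name, bit in STATUS_FLAGS.items()
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,            # bits 15:0
    }


def decode_bus_error(code):
    """Decode a bus-error code of the form 0000 1PPT RRRR IILL."""
    if (code & 0xF800) != 0x0800:
        raise ValueError("not a bus-error code")
    return {
        "cache_level": LL_LABELS[code & 0x3],
        "mem_io": II_LABELS[(code >> 2) & 0x3],
        "mem_tx": (code >> 4) & 0xF,              # 0 is printed as GEN
        "timeout": bool((code >> 8) & 0x1),
        "part_proc": PP_LABELS[(code >> 9) & 0x3],
    }


def decode_ipid(ipid):
    """Extract the hardware ID and MCA type from an MCA_IPID value."""
    return {
        "hwid": (ipid >> 32) & 0xFFF,       # bits 43:32
        "mca_type": (ipid >> 48) & 0xFFFF,  # bits 63:48
    }


if __name__ == "__main__":
    status = decode_status(0xD82000000002080B)
    print(status)
    print(decode_bus_error(status["error_code"]))
    print(decode_ipid(0x0001002E00001E01))
```

Run against the values in the log, this gives extended error code 2, a bus-error code of 0x080b (L3/GEN, IO, SRC, no timeout), and an IPID with hardware ID 0x2e and MCA type 1. That is consistent with the "Power, Interrupts, etc." and "Link Error" lines the kernel printed, but it does not by itself say which link or component is at fault.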