Hello everyone,
I am facing an issue with my server after installing Linux (Ubuntu or CentOS). The hardware configuration is as follows:
- CPU: EPYC 7R32
- Motherboard: MZ72-HB0
- Memory: DDR4 32GB ECC 2666
- SSD: Samsung PM9A1 2TB M.2
- Cooling: AMD SP3 custom cooling
- Power Supply: Haiyun 1000W
The server reports intermittent hardware errors. Here are some of the messages that appear in the system logs:
```
Message from syslogd@server3 at May 31 10:49:44 ... kernel:[Hardware Error]: Corrected error, no action required.
Message from syslogd@server3 at May 31 10:49:44 ... kernel:[Hardware Error]: CPU:0 (17:31:0) MC27_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
Message from syslogd@server3 at May 31 10:49:44 ... kernel:[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
Message from syslogd@server3 at May 31 10:49:44 ... kernel:[Hardware Error]: Power, Interrupts, etc. Error: Error on GMI link.
Message from syslogd@server3 at May 31 10:49:44 ... kernel:[Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout) ...
```
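For anyone who wants to cross-check the log, the `MC27_STATUS` value can be unpacked by hand. The following is a minimal Python sketch, assuming the bit layout used by the Linux kernel's AMD MCE decoder (flag bits in the top of the register, extended error code in bits 21:16, error code in bits 15:0). The function `decode_mca_status` and the lookup tables are hypothetical names for illustration, not part of any existing tool.

```python
# Sketch only: bit positions are assumptions based on the Linux kernel's
# MCE definitions for AMD scalable MCA banks.
FLAG_BITS = {
    "valid": 63,
    "overflow": 62,      # "Over" in the kernel output
    "uncorrected": 61,   # clear means corrected error ("CE")
    "enabled": 60,
    "misc_valid": 59,    # "MiscV"
    "addr_valid": 58,
    "pcc": 57,           # processor context corrupt
    "synd_valid": 53,    # "SyndV"
}

# Field names for the bus-error format of the 16-bit error code.
LL = ["L0", "L1", "L2", "L3/GEN"]
II = ["MEM", "RESV", "IO", "GEN"]
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
PP = ["SRC", "RES", "OBS", "GEN"]


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA status value into flags and error-code fields."""
    out = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    out["xec"] = (status >> 16) & 0x3F   # extended error code
    ec = status & 0xFFFF                 # base error code
    out["error_code"] = ec
    # Bus errors have the pattern 0000 1PPT RRRR IILL.
    if (ec & 0xF800) == 0x0800:
        r4 = (ec >> 4) & 0xF
        out["bus"] = {
            "cache_level": LL[ec & 0x3],
            "mem_io": II[(ec >> 2) & 0x3],
            "mem_tx": RRRR[r4] if r4 < len(RRRR) else "unknown",
            "part_proc": PP[(ec >> 9) & 0x3],
            "timeout": bool((ec >> 8) & 1),
        }
    return out


if __name__ == "__main__":
    # Status value copied from the log above.
    for key, value in decode_mca_status(0xD82000000002080B).items():
        print(f"{key}: {value}")
```

Under these assumptions, the decoded fields line up with the kernel's own decoding in the log: overflow set, a corrected (not uncorrected) error, extended error code 2, and a generic I/O bus error from the source with no timeout.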
I am not sure what these errors indicate or what steps are needed to fix them. Any insight into the likely cause and how to resolve it would be greatly appreciated.

Thank you in advance for your help.
Best regards