Error:
[Tue Nov 13 14:35:35 2018] Uhhuh. NMI received for unknown reason 21 on CPU 84.
[Tue Nov 13 14:35:35 2018] Do you have a strange power saving mode enabled?
[Tue Nov 13 14:35:35 2018] Dazed and confused, but trying to continue
Hardware:
CPU: 2x AMD EPYC 7601 on a Supermicro H11DST-B motherboard (version 1.01), Supermicro 2123BT-HNC0R platform
BIOS: version 1.1a (release date 10/04/2018)
RAM: 2 TiB, 2666 MHz
Software:
OS: Linux 4.15.18-7-pve #1 SMP PVE 4.15.18-27 (Wed, 10 Oct 2018 10:50:11 +0200) x86_64 GNU/Linux (Debian GNU/Linux 9.5 (stretch))
How to reproduce:
# apt install linux-tools-4.15
# dpkg -S $(which perf)
linux-base: /usr/bin/perf
# dmesg -Tw
Then, in another console, run:
# perf top
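To see how often the message appears and on which CPUs while perf top is running, the kernel log can be filtered and counted per CPU. This is a minimal sketch assuming a POSIX shell with grep, sed, sort, uniq and awk; the function name count_unknown_nmis is hypothetical, and it only matches the exact message text shown above.

```sh
# Hypothetical helper: reads kernel log lines on stdin and prints, per CPU,
# how many "NMI received for unknown reason" messages were seen.
# Output format: "<cpu> <count>", sorted by CPU number.
count_unknown_nmis() {
    grep 'NMI received for unknown reason' |
        sed -n 's/.* on CPU \([0-9][0-9]*\)\..*/\1/p' |
        sort -n | uniq -c |
        awk '{ print $2, $1 }'
}
```

For example, dmesg -T | count_unknown_nmis prints one "<cpu> <count>" line for each CPU that logged the message.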
I tried to disable nmi_watchdog by blacklisting the watchdog modules:
# cat /etc/modprobe.d/nmi-watchdog-blacklist.conf
blacklist iTCO_wdt
blacklist iTCO_vendor_support
Then I booted with different kernel command lines (adding nmi_watchdog=0, and later idle=nomwait):
# grep 'Command line' /var/log/kern.log
Nov 14 19:13:19 host1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.18-7-pve root=UUID=dfa0b70c-f4e7-4fc3-85ef-e6ddbb288091 ro quiet pcie_aspm=off
Nov 14 19:30:49 host1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.18-7-pve root=UUID=dfa0b70c-f4e7-4fc3-85ef-e6ddbb288091 ro quiet pcie_aspm=off nmi_watchdog=0
Nov 14 19:56:51 host1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.18-7-pve root=UUID=dfa0b70c-f4e7-4fc3-85ef-e6ddbb288091 ro quiet nmi_watchdog=0 pcie_aspm=off
Nov 14 20:38:15 host1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.18-7-pve root=UUID=dfa0b70c-f4e7-4fc3-85ef-e6ddbb288091 ro quiet nmi_watchdog=0 pcie_aspm=off idle=nomwait
Nov 14 21:10:49 host1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.18-7-pve root=UUID=dfa0b70c-f4e7-4fc3-85ef-e6ddbb288091 ro quiet nmi_watchdog=0 pcie_aspm=off idle=nomwait
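iTCO_wdt and iTCO_vendor_support are drivers for the Intel TCO hardware watchdog, so blacklisting them probably has no effect on the NMI watchdog on an AMD system; the nmi_watchdog=0 boot parameter is the relevant setting. Whether it took effect can be checked at runtime through /proc/sys/kernel/nmi_watchdog. A minimal sketch; the helper name nmi_watchdog_state is hypothetical:

```sh
# Hypothetical helper: report whether the NMI watchdog is enabled.
# Reads /proc/sys/kernel/nmi_watchdog by default; another path can be
# passed as the first argument (useful for testing).
nmi_watchdog_state() {
    f=${1:-/proc/sys/kernel/nmi_watchdog}
    if [ ! -r "$f" ]; then
        echo "unknown"
        return 1
    fi
    # The file contains 0 when the watchdog is off, non-zero when it is on.
    case $(cat "$f") in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}
```

After booting with nmi_watchdog=0, nmi_watchdog_state would be expected to print "disabled".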
I tried changing the CPU frequency governor (from ondemand to performance):
# for c in {0..127}; do cpufreq-set -g performance -c $c; done
But the error is still present (on all 4 nodes of the server platform).
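The loop above hardcodes CPUs 0 to 127, which matches 2 sockets x 32 cores x 2 threads. To confirm that every CPU really switched governors, the scaling_governor files in sysfs can be summarised instead of relying on a fixed range. This is a sketch assuming the standard cpufreq sysfs layout; governor_summary is a hypothetical helper name.

```sh
# Hypothetical helper: summarise the cpufreq governor in use on every CPU.
# Prints "<governor> <number of CPUs>" lines. The sysfs root defaults to
# /sys/devices/system/cpu and can be overridden for testing.
governor_summary() {
    root=${1:-/sys/devices/system/cpu}
    # cpu[0-9]* matches cpu0, cpu1, ... but not the "cpufreq" directory.
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] && cat "$f"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

If the loop worked on every CPU, the output would be expected to be a single line such as "performance 128".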
Solution:
I disabled C-states in the BIOS and the error is gone.
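To compare what the kernel sees before and after changing the BIOS setting, the idle states (C-states) exposed through the cpuidle sysfs interface can be listed. This is a sketch assuming the state*/name, latency and disable files of that interface; list_cstates is a hypothetical helper name, and whether the list actually shrinks after the BIOS change on this board is an assumption, not something verified here.

```sh
# Hypothetical helper: list the idle states (C-states) the kernel exposes for
# one CPU, with each state's exit latency and whether it is disabled.
# Arguments: sysfs root (default /sys/devices/system/cpu), CPU number (default 0).
list_cstates() {
    root=${1:-/sys/devices/system/cpu}
    cpu=${2:-0}
    for d in "$root/cpu$cpu"/cpuidle/state[0-9]*; do
        # Skip the unexpanded pattern when the CPU has no cpuidle states.
        [ -d "$d" ] || continue
        printf '%s latency=%sus disabled=%s\n' \
            "$(cat "$d/name")" "$(cat "$d/latency")" "$(cat "$d/disable")"
    done
}
```

For example, list_cstates /sys/devices/system/cpu 84 shows the states for the CPU from the error message above.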
Note:
I think the Linux kernel needs to add support for this new NMI on the EPYC SoC.