We recently bought a Ryzen 7 1800X to build a local server for our office. Since the beginning, we have had random crashes that occur at any time for no apparent reason. Sometimes we are able to get a verbose log, but usually we get garbage in the syslog. I have attached our most recent crash with logs. Unfortunately, we are not able to reproduce the bug at all. We suspect the CPU, since we observed issues on CPU #8 twice. It would be great if you could help us with this issue.
OS: Ubuntu 16.04.2
Kernel: Linux 4.11.6-041106-generic #201706170517 SMP Sat Jun 17 09:18:46 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Here is my `lscpu` output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD Ryzen 7 1800X Eight-Core Processor
Stepping: 1
CPU MHz: 2200.000
CPU max MHz: 3600.0000
CPU min MHz: 2200.0000
BogoMIPS: 7180.28
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
It's important to have the full system specs: RAM, GPU, motherboard, etc. Please post more details.
I've read that the latest Ubuntu version is 17.04. Have you tried it?
Reading the syslog, I noticed that your motherboard BIOS is version 1201. I have the same motherboard, and the latest official BIOS is 1501. Install it and load the default settings. I'm running this BIOS and it works fine.
The syslog often shows this error: "watchdog: BUG: soft lockup - CPU#12 stuck for 23s!" It could be a bug in that specific kernel version. If I were you, I would try Ubuntu 17.04.
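To check whether the lockups cluster on one CPU (the original post mentions CPU #8 twice, while this message names CPU#12), the watchdog lines can be tallied per CPU. A minimal Python sketch, assuming the message text matches the example quoted above; the log path `/var/log/syslog` is Ubuntu's default and the function name is hypothetical. Opening the file with `errors="replace"` keeps the garbage bytes mentioned in the original post from aborting the scan:

```python
import os
import re
from collections import Counter

# Matches kernel watchdog messages such as:
#   "watchdog: BUG: soft lockup - CPU#12 stuck for 23s!"
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")

LOG_PATH = "/var/log/syslog"  # assumption: Ubuntu's default syslog location


def count_soft_lockups(lines):
    """Return a Counter mapping CPU number -> number of soft-lockup reports."""
    counts = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__" and os.access(LOG_PATH, os.R_OK):
    # errors="replace" substitutes undecodable bytes instead of raising.
    with open(LOG_PATH, errors="replace") as log:
        for cpu, n in sorted(count_soft_lockups(log).items()):
            print(f"CPU#{cpu}: {n} soft lockup(s)")
```

If the reports are spread across many CPUs rather than concentrated on one, that weakens the "one bad core" theory.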
Also, for your information, some Ryzen CPUs have problems with Linux; you may want to read this thread: gcc segmentation faults on Ryzen / Linux. But before concluding that your Ryzen CPU has those problems, if I were you, I would:
1) update the BIOS to 1501;
2) try the latest Ubuntu version with updates;
3) check that everything in the BIOS is at its default, for example the RAM settings.
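For step 1, the running firmware version can be read without rebooting into setup: Linux exposes it at `/sys/class/dmi/id/bios_version` (the same value `sudo dmidecode -s bios-version` prints). A minimal Python sketch; the helper names are hypothetical, and the numeric comparison assumes the plain-integer version scheme ("1201", "1501") used in this thread, which other vendors (for example ASRock's "P3.00") do not follow:

```python
from pathlib import Path

# Standard Linux sysfs file holding the BIOS version reported via DMI/SMBIOS.
# It is normally world-readable, so root is not required.
BIOS_VERSION_PATH = Path("/sys/class/dmi/id/bios_version")


def read_bios_version(path=BIOS_VERSION_PATH):
    """Return the BIOS version string (e.g. "1201") with whitespace stripped."""
    return Path(path).read_text().strip()


def needs_update(current, target="1501"):
    """Compare purely numeric version strings such as "1201" and "1501".

    Assumption: the vendor uses plain integer version numbers.
    """
    return int(current) < int(target)


if __name__ == "__main__" and BIOS_VERSION_PATH.exists():
    version = read_bios_version()
    print(f"BIOS version: {version}")
    if version.isdigit():
        print("update recommended" if needs_update(version) else "BIOS is current")
```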
I've been experiencing this same issue.
Debian Testing (kernel 4.12.0-1-amd64)
ASRock X370 Taichi
G.SKILL Ripjaws V Series 32GB (2 x 16GB) 3200 (PC4 25600) F4-3200C14D-32GVK
BIOS revision 3.00 (latest)
I've never overclocked, except for testing the SoC voltage suggestion in the GCC segfault thread. I had been running the RAM with the XMP profile at 3200 but have since reverted to the 2400 stock speed to see if that was the issue. I've also turned off SMT, as suggested in the segfault thread, but that didn't help. I've reset the CMOS. No change.
My system is lucky to make it 24 hours without locking up. When it locks up, the fan cycles from full speed to something lower and back, and the debug code display shows 00.
This seems to be a common problem under Linux, and I've read that there may be an issue with the earliest batches of R7 chips. I opened a ticket with AMD a couple of days ago, though they have not responded yet. I'm thinking of trying to get Newegg to take back the processor, fan, board, and memory and starting over with something that works.
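For reference, "3200" and "PC4 25600" in the module description are the same speed in different units: a DDR4 module has a 64-bit (8-byte) data bus, so its peak bandwidth in MB/s is the data rate in MT/s times 8, and because DDR transfers on both clock edges its I/O clock is half the data rate. The same halving is visible in the dmidecode output below, where 1067 is half of the reported 2134 MT/s. A small Python sketch of the arithmetic (function names are illustrative only):

```python
BYTES_PER_TRANSFER = 8  # a DDR4 DIMM has a 64-bit (8-byte) data bus


def pc4_rating(mt_per_s):
    """Peak module bandwidth in MB/s, i.e. the number in a "PC4-xxxxx" label."""
    return mt_per_s * BYTES_PER_TRANSFER


def io_clock_mhz(mt_per_s):
    """DDR moves data on both clock edges, so the clock is half the data rate."""
    return mt_per_s / 2


for rate in (3200, 2400):
    print(f"DDR4-{rate}: PC4-{pc4_rating(rate)}, I/O clock {io_clock_mhz(rate):.0f} MHz")
```

This prints PC4-25600 / 1600 MHz for the XMP setting and PC4-19200 / 1200 MHz for 2400.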
Have you had any luck?
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD Ryzen 7 1800X Eight-Core Processor
Stepping: 1
CPU MHz: 2200.000
CPU max MHz: 3600.0000
CPU min MHz: 2200.0000
BogoMIPS: 7199.90
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x000ED300.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: P3.00
Release Date: 07/10/2017
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 5.12
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: To Be Filled By O.E.M.
Product Name: To Be Filled By O.E.M.
Version: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
UUID: 03000200-0400-0500-0006-000700080009
Wake-up Type: Power Switch
SKU Number: To Be Filled By O.E.M.
Family: To Be Filled By O.E.M.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASRock
Product Name: X370 Taichi
Version:
Serial Number: M80-A5007300029
Asset Tag:
Features:
Board is a hosting board
Board is replaceable
Location In Chassis:
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
Manufacturer: To Be Filled By O.E.M.
Type: Desktop
Lock: Not Present
Version: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Height: Unspecified
Number Of Power Cords: 1
Contained Elements: 0
SKU Number: To be filled by O.E.M.
Handle 0x0004, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE1
Type: x1 PCI Express
Current Usage: Available
Length: Short
ID: 17
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:0c:00.0
Handle 0x0005, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE2
Type: x16 PCI Express
Current Usage: In Use
Length: Long
ID: 18
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:0e:00.0
Handle 0x0006, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE3
Type: x8 PCI Express
Current Usage: Available
Length: Long
ID: 19
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:ff:00.0
Handle 0x0007, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE4
Type: x1 PCI Express
Current Usage: Available
Length: Short
ID: 20
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:0a:00.0
Handle 0x0008, DMI type 9, 17 bytes
System Slot Information
Designation: PCIE5_M2_2
Type: x4 PCI Express
Current Usage: In Use
Length: Long
ID: 21
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:0d:00.0
Handle 0x0009, DMI type 9, 17 bytes
System Slot Information
Designation: M2_1
Type: x4 PCI Express
Current Usage: Available
Length: Short
ID: 49
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:ff:00.0
Handle 0x000A, DMI type 11, 5 bytes
OEM Strings
String 1: To Be Filled By O.E.M.
Handle 0x000B, DMI type 32, 20 bytes
System Boot Information
Status: No errors detected
Handle 0x000C, DMI type 40, 14 bytes
Additional Information 1
Referenced Handle: 0x00a1
Referenced Offset: 0x01
String: MORDOR
Value: 0x00000000
Handle 0x000D, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x000E, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 64 GB
Error Information Handle: 0x000D
Number Of Devices: 4
Handle 0x000F, DMI type 19, 31 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x007FFFFFFFF
Range Size: 32 GB
Physical Array Handle: 0x000E
Partition Width: 2
Handle 0x0010, DMI type 7, 19 bytes
Cache Information
Socket Designation: L1 - Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 768 kB
Maximum Size: 768 kB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x0011, DMI type 7, 19 bytes
Cache Information
Socket Designation: L2 - Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 4096 kB
Maximum Size: 4096 kB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x0012, DMI type 7, 19 bytes
Cache Information
Socket Designation: L3 - Cache
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 16384 kB
Maximum Size: 16384 kB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 16-way Set-associative
Handle 0x0013, DMI type 4, 48 bytes
Processor Information
Socket Designation: AM4
Type: Central Processor
Family: Zen
Manufacturer: Advanced Micro Devices, Inc.
ID: 11 0F 80 00 FF FB 8B 17
Signature: Family 23, Model 1, Stepping 1
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
HTT (Multi-threading)
Version: AMD Ryzen 7 1800X Eight-Core Processor
Voltage: 1.4 V
External Clock: 100 MHz
Max Speed: 4100 MHz
Current Speed: 3600 MHz
Status: Populated, Enabled
Upgrade: Socket AM4
L1 Cache Handle: 0x0010
L2 Cache Handle: 0x0011
L3 Cache Handle: 0x0012
Serial Number: Unknown
Asset Tag: Unknown
Part Number: Unknown
Core Count: 8
Core Enabled: 8
Thread Count: 16
Characteristics:
64-bit capable
Multi-Core
Hardware Thread
Execute Protection
Enhanced Virtualization
Power/Performance Control
Handle 0x0014, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x0015, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000E
Error Information Handle: 0x0014
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 0
Bank Locator: CHANNEL A
Type: Unknown
Type Detail: Unknown
Speed: Unknown
Manufacturer: Unknown
Serial Number: Unknown
Asset Tag: Not Specified
Part Number: Unknown
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0016, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x0017, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000E
Error Information Handle: 0x0016
Total Width: 64 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM 1
Bank Locator: CHANNEL A
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2134 MT/s
Manufacturer: Unknown
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: F4-3200C14-16GTZSW
Rank: 2
Configured Clock Speed: 1067 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x0018, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x007FFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x0017
Memory Array Mapped Address Handle: 0x000F
Partition Row Position: Unknown
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x0019, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x001A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000E
Error Information Handle: 0x0019
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 0
Bank Locator: CHANNEL B
Type: Unknown
Type Detail: Unknown
Speed: Unknown
Manufacturer: Unknown
Serial Number: Unknown
Asset Tag: Not Specified
Part Number: Unknown
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x001B, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x001C, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000E
Error Information Handle: 0x001B
Total Width: 64 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM 1
Bank Locator: CHANNEL B
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2134 MT/s
Manufacturer: Unknown
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: F4-3200C14-16GTZSW
Rank: 2
Configured Clock Speed: 1067 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x001D, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x007FFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x000F
Partition Row Position: Unknown
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x001E, DMI type 127, 4 bytes
End Of Table
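With output this long, it helps to pull out just the installed modules. A minimal Python sketch that parses the plain-text "Memory Device" (DMI type 17) sections, assuming the "Key: value" layout shown above; feed it the text of `sudo dmidecode -t memory`. The function name is hypothetical:

```python
def populated_dimms(dmidecode_text):
    """Return (locator, bank, size_mb, speed) for every installed memory module.

    Assumes the plain-text layout printed by dmidecode: each structure starts
    with a "Handle ..." line, followed by its title and "Key: value" lines.
    """
    sections = []
    current = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            current = None  # a new DMI structure begins
        elif line == "Memory Device":  # type 17 only, not "... Mapped Address"
            current = {}
            sections.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value

    modules = []
    for fields in sections:
        amount, _, unit = fields.get("Size", "").partition(" ")
        if not amount.isdigit() or unit not in ("MB", "GB"):
            continue  # e.g. "Size: No Module Installed"
        size_mb = int(amount) * (1024 if unit == "GB" else 1)
        modules.append((fields.get("Locator"), fields.get("Bank Locator"),
                        size_mb, fields.get("Speed")))
    return modules
```

Applied to the output above, it should report two 16384 MB modules (DIMM 1 on channels A and B), 32 GB in total, which matches the 32 GB mapped range.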
We updated our BIOS and set every configuration option to its default, but we still have these crashes. The only thing we haven't changed yet is the Ubuntu version; that will probably be our next step.
Hi,
Is there anything new on this topic? I have the same problem: XenServer 7.2 crashes after a while, and the mainboard shows a CPU problem (EZ Debug LED).
Ryzen 7 1700
MSI B350M VDH-PRO (latest BIOS)
32GB RAM (certified Corsair 2x 16GB)
lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD Ryzen 7 1700 Eight-Core Processor
Stepping: 1
CPU MHz: 3000.092
BogoMIPS: 6000.18
Hypervisor vendor: Xen
Virtualization type: none
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 16384K
There is a known bug affecting AMD Ryzen 7 1800X processors under Linux.
You should be able to open an RMA and they will ship you a different chip.
There is also a separate Linux kernel bug, for which updates are available on 4.10 and 4.11 as well.
Thank you for your reply!
My kernel is 4.4.0+10. I don't know what the "+10" means. Is this version affected?
EDIT: I ran a memtest86 v7.4 memory test; after more than 6 hours I got a lot of errors in test #3 on CPU1. I'm a little uncertain:
- Is memtest86 reliable?
- Is it possible that the CPU was crashing and memtest86 therefore reported false RAM errors (because the communication between CPU and RAM broke off)?
OK, XenServer 7.2 is now running idle with my three VMs without crashing. What I did:
- switched off "performance boost" in the BIOS (it boosts the CPU from 3 GHz up to 3.7 GHz)
- switched off "AMD Cool'n'Quiet" in the BIOS (it seems to involve some ACPI power-management routines)
Now the server seems to work. I put it into production on Sunday, and so far it looks good.
(The CPU is a Ryzen 7 1700)
There is an open Linux kernel bug; see: 196683 – Random Soft Lockup on new Ryzen build
And a related thread: Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
AFAICS, there are three CPU issues affecting Linux systems:
The "SEGV" issue appears to only affect early CPUs. AMD will replace affected CPUs.
The "soft lockup" and "system freeze" issues may be related to each other, but they are definitely not related to the "SEGV" issue.
The "soft lockup" issue may be avoided by a combination of kernel configuration and kernel command-line options (see kernel bug 196683). In my experience, this is not sufficient to also avoid the "system freeze" issue.
The discussion under kernel bug 196683 suggests that there is a power-management-related issue with the Ryzen CPUs. Whatever the issue is, it seems that disabling the C6 (deep-sleep power-saving) state helps avoid the "system freeze". Other voodoo seems to work for some people, though the common thread seems to be either avoiding going idle or otherwise preventing low-voltage states (e.g. http://www.silence.host/node/1).
AMD have claimed that a later AGESA release will fix the issue. I cannot confirm that. It has been reported that the new BIOS options can disable C6 for some or all cores.
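The idle states the kernel may use can be listed, and individual states disabled at runtime, through the standard cpuidle sysfs interface (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,disable}`). A minimal Python sketch; the key assumption, not confirmed in the bug thread, is that on Ryzen with the `acpi_idle` driver the deepest listed state (often named "C2") may be the one that lets cores reach CC6. Writing to `disable` needs root, lasts only until reboot, and is not necessarily equivalent to disabling C6 in firmware. Helper names are hypothetical:

```python
from pathlib import Path

# Standard Linux cpuidle sysfs tree: one stateN directory per idle state per CPU.
CPU_ROOT = Path("/sys/devices/system/cpu")


def idle_states(root=CPU_ROOT):
    """Map CPU directory name -> list of (state dir, state name, disabled flag)."""
    result = {}
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        states = []
        for state_dir in sorted((cpu_dir / "cpuidle").glob("state[0-9]*")):
            name = (state_dir / "name").read_text().strip()
            disabled = (state_dir / "disable").read_text().strip() == "1"
            states.append((state_dir.name, name, disabled))
        if states:
            result[cpu_dir.name] = states
    return result


def disable_state(state_name, root=CPU_ROOT):
    """Write "1" to the `disable` file of every state named state_name.

    Requires root on a real system; the change is lost at reboot.
    Returns the number of per-CPU states that were disabled.
    """
    changed = 0
    for disable_file in root.glob("cpu[0-9]*/cpuidle/state[0-9]*/disable"):
        if (disable_file.parent / "name").read_text().strip() == state_name:
            disable_file.write_text("1")
            changed += 1
    return changed
```

Run `idle_states()` first to see which state names your kernel actually exposes before disabling anything with `disable_state(...)`.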
Sadly, as far as I know, AMD have (to date) been unable to identify the root cause of the "soft lockup" or "system freeze" issues.