morenolq
Journeyman III

watchdog: BUG: soft lockup - on Ryzen 9 3950X CPU

Good morning,

We recently bought a machine with the following configuration:

CPU: AMD Ryzen 9 3950X
RAM: 128GB DDR4 3000MHz
Storage: 1TB SSD + 2x 6TB HDD
GPU: NVIDIA GeForce RTX 3090 24GB
OS: Ubuntu 20.04 LTS
PSU: 850W (certified)

We use the machine remotely for AI research. We keep running into an annoying bug whenever the CPU is under load: the machine freezes completely, and the console shows:

Message from syslogd@machinename at Feb 13 09:37:16 ...
kernel:[ 348.578682] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [systemd-journal:660]

After a few of these messages are printed, the machine is no longer reachable over SSH, and the only way to recover is to physically restart it.
We have run GPU experiments for weeks without any problem, but as soon as we put load on the CPU for some tasks, the machine freezes and reports the error above.

Has anyone experienced the same problem? How can we solve it?

Thank you for your time and support.

Lyliya
Journeyman III

I have the same issue. Have you found a fix?

Xinrui
Journeyman III

Hi,

I have the same issue with a Ryzen 9 3900X and an RTX 3090 GPU. I also use Ubuntu 20.04 LTS, and I use the Python numba library for some fast binary-search functions. A dynamic library compiled with Cython also triggers the issue for me. Have you found any solutions?

kheper
Adept II

Look in the systemd journal for earlier errors associated with the watchdog:

sudo journalctl -p err |grep "watchdog: BUG"

Also, check which programs triggered the watchdog errors: the task name and PID appear in square brackets at the end of each message.
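To do that tally without reading every line by hand, here is a minimal sketch (not part of the original reply) that counts soft-lockup reports per task name. It assumes the kernel message format quoted in this thread ("soft lockup - CPU#N stuck for Ns! [name:pid]"); count_lockups is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_lockups (hypothetical helper): reads log lines on stdin and prints
# how many soft-lockup reports each task triggered, most frequent first.
count_lockups() {
  # 1. grep -o keeps only the "soft lockup ... [name:pid]" part of each match.
  # 2. sed extracts "name" from the trailing "[name:pid]" tag.
  # 3. sort | uniq -c counts per name; sort -rn puts the largest count first.
  grep -o 'soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[[^]]*]' \
    | sed -E 's/.*\[(.*):[0-9]+]$/\1/' \
    | sort | uniq -c | sort -rn
}
```

For example: sudo journalctl -p err | count_lockups. Without -b, journalctl searches every boot stored in the journal, which matters here because each freeze ends with a hard reset; older boots are only available if persistent journal storage is enabled.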

My logs contain many watchdog errors, most of them triggered by Chrome and Firefox. For example:

watchdog: BUG: soft lockup - CPU#11 stuck for 1587s! [Chrome_~dThread:1629]

Only Chrome caused a system freeze, so I no longer use it. I suspect a glibc issue.

Ryzen 5 Pro 4650G, ASRock B550 Phantom Gaming 4/ac, 16GB Crucial Ballistix 3600MHz, Samsung 970 EVO NVMe 500GB, Super Flower Leadex III Gold 850W

On my system, I fixed the problem by upgrading the BIOS. You can find the discussion here: https://askubuntu.com/questions/1316623/watchdog-bug-soft-lockup-on-ryzen-9-3950x-cpu
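If you want to try the same fix, it helps to first note which board and BIOS version you are running, so you can compare it with the latest release on the motherboard vendor's support page. A minimal sketch (not part of the original reply), assuming a Linux system that exposes DMI data under /sys/class/dmi/id, as x86 machines normally do; show_bios_info is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# show_bios_info (hypothetical helper): prints board and BIOS identification
# that the kernel exposes through DMI. These particular sysfs files are
# normally world-readable, so no sudo is needed.
show_bios_info() {
  local dir=/sys/class/dmi/id
  local key
  for key in board_vendor board_name bios_vendor bios_version bios_date; do
    if [ -r "$dir/$key" ]; then
      printf '%s: %s\n' "$key" "$(cat "$dir/$key")"
    else
      # File missing or unreadable (e.g. inside some containers).
      printf '%s: unknown\n' "$key"
    fi
  done
}
```

Run it with: show_bios_info. The command sudo dmidecode -s bios-version reports the same BIOS version string.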
