I wish I could help you here :(
The "freeze-while-idle" problem I have seen, for which the "Typical Current Idle" BIOS option appears to be a fix (or at least a workaround), brings the entire CPU to a complete, dead stop. Once frozen, nothing will kick the CPU back to life short of a hardware reset or power cycle.
What you describe appears different. In particular, you can ssh in to the box and restart stuff.
It seems to me that the root cause of the "freeze-while-idle" I have seen is some problem with power management once the CPU has entered a deep sleep, such that it cannot be woken again. That feels like a deep hardware problem.
A problem which causes streams of "watchdog: BUG: soft lockup" events to be logged has also been seen. It is not clear whether that's related to the "freeze-while-idle" or whether that's a kernel-level problem.
There is talk of "idle=nomwait" being useful. I believe that MWAIT is the way to enter the C6 state, so disabling MWAIT appears to be another way of disabling C6, which doesn't seem to advance the art. I also believe that for virtual machines MWAIT is effectively a call from client OS to the host, and that "idle=nomwait" is not a good idea for client OSes.
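For anyone wanting to check what their own box is actually doing, here is a rough sketch (not something from the reports I have read) that prints the idle= boot parameter and the idle states the kernel exposes. The helper names idle_param and list_idle_states are made up for illustration. /proc/cmdline and /sys/devices/system/cpu/cpu0/cpuidle are the usual Linux locations, but the state names depend on the idle driver in use and may not map one-to-one onto the hardware C6 state.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Print the value of the idle= boot parameter, or "(not set)".
# Usage: idle_param [cmdline-file]   (defaults to /proc/cmdline)
idle_param() {
    local cmdline="${1:-/proc/cmdline}"
    local value
    # One parameter per line, then keep the last idle=... value if any.
    value=$(tr ' ' '\n' < "$cmdline" | sed -n 's/^idle=//p' | tail -n 1)
    echo "${value:-(not set)}"
}

# List the idle states exposed for one CPU, with their "disable" flag.
# Usage: list_idle_states [cpuidle-dir]
list_idle_states() {
    local dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    local state
    for state in "$dir"/state*; do
        # Skip silently if the directory has no stateN entries.
        [ -r "$state/name" ] || continue
        echo "$(basename "$state"): $(cat "$state/name") disabled=$(cat "$state/disable")"
    done
}
```

Called with no arguments, idle_param and list_idle_states read the live system. If the second one prints nothing, the kernel is probably not exposing cpuidle states at all (for example inside some virtual machines).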
Anyway... I am not convinced that the problem you are seeing is the "freeze-while-idle" problem I have seen, and hence it doesn't surprise me that the "Typical Current Idle" BIOS option (and the related C6 and MWAIT voodoo) has not fixed things. Sadly, this is probably of little use to you, except, perhaps, to encourage you to look elsewhere for a solution :(
In March of this year AMD advised me that for Ryzen they specify a PSU which can deliver 12V at 0A. Which adds to the feeling that this is a hardware problem. Mind you, having such a PSU did not solve the problem for me. Also, disabling package C6 did not solve the problem. But I have seen no "freeze-while-idle" since setting the "Typical Current Idle" BIOS option.
I have seen conflicting reports as to whether the 2xxx Ryzen suffers in the same way. I have no direct experience of that CPU. (And no urgent desire to buy another AMD, however tempting the 3xxx devices may be.)
I have seen reports of avoiding the problem by tweaking voltages and other overclocking dark arts.
I'm intrigued by the idea that it could be some sort of inrush issue... but in my case, the load when waking up the system would be just enough to wake the screen up and respond to mouse/keyboard, or enough to accept an ssh connection.
For what it is worth, I just installed a new power supply and the latest ASUS Prime X370-Pro BIOS (PRIME X370-PRO BIOS 4207).
If there is a need, I am willing to set the BIOS setting "Power Supply Idle Control" back to default and see what happens. At this moment the server has been running without any issues since my last posting, which was in October (no more system lockups).
The new power supply is a Corsair RM850x.
The server is holding 4 VMs running 24/7.
No overclocking or other tuning as I only need stability.
Yup. It's your motherboard. I had the same problem; I researched for months, then changed the motherboard to an Asus TUF B450 PLUS and boom! Problem gone. It has nothing to do with the Ryzen CPU or Linux. My idle freezes would happen on Windows and even in the BIOS!! A lot of B350/370 ASUS and ASRock mobos have this problem. It's a motherboard fault. You can work around it with some BIOS settings, but ultimately it's your mobo at fault. Save your time and energy and either RMA the board under warranty or get a cheap A320 board and see your problem disappear.
I have not had any issues since I set the BIOS value "Power Supply Idle Control" to "Typical Current Idle". So for me there is no need to exchange the board.
Trust me, that's a temporary workaround. It fixed mine for some months, then the problem re-appeared. It's the board at fault because the VRMs can't switch from low power to high power fast enough, maybe due to a faulty capacitor. It's up to you whether you want to replace it or not.
Thanks for responding Imshalla.
I tried a new BIOS version, the BIOS setting, and a kernel boot parameter: the new BIOS made things worse, setting Typical Current Idle improved things a bit, and idle=nomwait seems to make no difference.
The improvement I have achieved so far is that it no longer freezes while the VM servers idle overnight, as it consistently did before. But now it freezes within 1~2 minutes if I start to operate the UI (moving the mouse etc.); the mouse cursor locks up.
From the post below by SKULL on Xmas eve, I gathered that I may have two different types of problem: one is idle current, the other could be a sudden surge of threads causing the power supply voltage to drop. I suspect this is what happened when it froze while I was operating the server GUI. If what SKULL said applies to me, then adjusting the overclocking voltage up a little may avert some lockups. This is what I will try after Xmas.
For most of my lockup cases, the machine still responds to ssh logins, and gets back to normal after this bash command:
sudo systemctl restart sddm
This restarts SDDM, the display manager used by the KDE desktop. Only occasionally have we had to hit the motherboard reset button.
The only clue found in /var/log/syslog is that a kworker thread took more than 120 seconds to respond, or something like that.
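In case it helps to pin down which kind of message is being logged, here is a small sketch that counts the two usual suspects in a log file. It assumes the 120-second message is the kernel's hung-task warning ("blocked for more than 120 seconds"), which I have not confirmed on this machine. The function name count_lockups is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.

# Count kernel log lines that look like soft-lockup or hung-task reports.
# Usage: count_lockups [logfile]   (defaults to /var/log/syslog)
count_lockups() {
    local logfile="${1:-/var/log/syslog}"
    local soft hung
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    soft=$(grep -c 'watchdog: BUG: soft lockup' "$logfile" || true)
    hung=$(grep -cE 'blocked for more than [0-9]+ seconds' "$logfile" || true)
    echo "soft_lockup=${soft} hung_task=${hung}"
}
```

Run it as count_lockups (you may need sudo to read /var/log/syslog). A non-zero hung_task count with a zero soft_lockup count would suggest tasks stuck waiting rather than a CPU spinning, though that is only a rough guide.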