I ran a 2950X on an X399 Taichi for a long time, and since upgrading to a 2990WX I've been plagued by hard lockups after 8-48 hours of intense compute.
There is some relation to memory: different memory and different timings do affect how long it takes to lock up, but no settings have fixed the problem entirely. I've been using Nemix-brand ECC UDIMMs with Micron chips (they look a lot like rebranded Crucial sticks). 4x 32GB is definitely worse than 4x 16GB (the 16GB sticks are what I ran with my 2950X with no problems). The best I ever achieved was a few weeks of uptime, with memory dialed back to 1833MHz (they're 3600 UDIMMs!) and PBO disabled. Looser timings sometimes help a bit. I've never seen ECC errors, and memtest has never reported any errors.
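For reference, this is roughly how the ECC counters can be checked under Linux. It's a minimal sketch that assumes the kernel's EDAC driver is loaded and exposes memory controllers under /sys/devices/system/edac/mc; the function name read_edac_counts is just something made up for illustration. If that directory is missing or empty, ECC reporting may not be active at all, in which case "no ECC errors" doesn't say much.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count = corrected errors, ue_count = uncorrected errors.
    An empty dict means no EDAC memory controllers were found under root
    (assumption: that usually means the EDAC driver is not loaded).
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for key in ("ce_count", "ue_count"):
            try:
                entry[key] = int((mc / key).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable; record that explicitly.
                entry[key] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found; ECC reporting may be inactive.")
    for name, entry in result.items():
        print(f"{name}: corrected={entry['ce_count']} uncorrected={entry['ue_count']}")
```

Running it before and after a long compute job and comparing the numbers would show whether corrected errors are accumulating quietly.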
I've also heard of instability caused by overheating VRMs. I'm trying an experiment now with more aggressive VRM cooling, but I can't figure out which sensor under Linux reports the VRM temperature so that I can monitor it.
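In case it helps narrow this down, here is a minimal sketch that dumps every temperature the kernel exposes through the hwmon sysfs interface (values there are in millidegrees Celsius). It makes no assumption about which reading is the VRM; that depends on the board's sensor chip and on which driver is loaded, and the VRM may not be exposed at all. The function name read_hwmon_temps is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip_name, label, degrees_c) for each temp input."""
    readings = []
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_input in sorted(chip.glob("temp*_input")):
            base = temp_input.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{base}_label"
            # Labels are optional; fall back to the raw channel name.
            label = label_file.read_text().strip() if label_file.exists() else base
            try:
                millideg = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # Some channels exist but cannot be read.
            readings.append((chip_name, label, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for chip_name, label, deg in read_hwmon_temps():
        print(f"{chip_name:12s} {label:12s} {deg:6.1f} C")
```

Logging this once a minute during a compute run, and looking for an unlabeled channel that climbs under load and drops when a fan is pointed at the VRM heatsink, would be one way to guess which reading is the VRM.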
I've got an ML360 TR4 edition, which is doing a good job of keeping the CPU from thermal throttling even with all-core boost.
Any recommendations around memory, VRMs, or anything else would be appreciated.