I ran a 2950X on an X399 Taichi for a long time, and since upgrading to a 2990WX I've been plagued by hard lockups after 8-48 hours of intense compute.
There is some relation to memory: different memory kits and different timings do affect how long it takes to lock up, but no settings have fixed the problem entirely. I've been using Nemix-brand ECC UDIMMs with Micron chips (they look a lot like rebranded Crucial sticks). 4x 32GB is definitely worse than 4x 16GB (the memory I was using with my 2950X with no problems). The best I ever achieved was a few weeks of uptime, with the memory dialed back to 1833MHz (they're 3600 UDIMMs!) and PBO disabled. Looser timings sometimes help a bit. I've never seen ECC errors, and memtest has never reported any errors.
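Not seeing ECC errors only means something if ECC reporting is actually active. One way to check under Linux is to read the kernel's EDAC counters directly. This is a minimal sketch, assuming the EDAC driver for the platform (amd64_edac on Zen-based CPUs) is loaded and exposes memory controllers under /sys/devices/system/edac/mc. The function name `edac_counts` is hypothetical, written just for this example.

```python
from pathlib import Path

# Where the kernel's EDAC subsystem exposes memory controllers (assumed standard layout).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def edac_counts(root=EDAC_ROOT):
    """Map each memory controller (mc0, mc1, ...) to its (corrected, uncorrected) error counts."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers whose counters can't be read
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = edac_counts()
    if not counts:
        # No controllers means ECC errors would not be reported here at all.
        print("No EDAC memory controllers found; ECC reporting may not be active.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

If no controllers show up, "no ECC errors" may simply mean "no ECC reporting", which is worth ruling out before trusting the memory.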
I've also heard of instability caused by overheating VRMs. I'm now experimenting with more aggressive VRM cooling, but I can't figure out which sensor under Linux reports the VRM temperature, so I can't monitor it.
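To see which temperature sensors the kernel exposes at all, the hwmon sysfs interface can be walked directly (this is the same data the lm-sensors `sensors` tool reads). This is a minimal sketch, assuming the standard /sys/class/hwmon layout. Whether a VRM reading appears depends on a driver for the board's Super I/O or VRM controller being loaded, which is an assumption here. On Zen CPUs, the k10temp driver's Tctl/Tdie values are CPU die readings, not VRM readings. The helpers `read_text` and `list_temps` are hypothetical names for this example.

```python
from pathlib import Path

# Standard location of hwmon devices in sysfs.
HWMON_ROOT = Path("/sys/class/hwmon")


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_temps(root=HWMON_ROOT):
    """Collect (chip name, sensor label, degrees C) for every temp*_input file."""
    readings = []
    for hwmon in sorted(root.glob("hwmon*")):
        # 'name' identifies the driver, e.g. k10temp for the CPU on Zen parts.
        chip = read_text(hwmon / "name") or hwmon.name
        for temp_input in sorted(hwmon.glob("temp*_input")):
            sensor = temp_input.name[: -len("_input")]  # e.g. "temp1"
            # Labels are optional; fall back to the raw sensor name.
            label = read_text(hwmon / f"{sensor}_label") or sensor
            raw = read_text(temp_input)
            if raw is None:
                continue
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                celsius = int(raw) / 1000.0
            except ValueError:
                continue
            readings.append((chip, label, celsius))
    return readings


if __name__ == "__main__":
    for chip, label, celsius in list_temps():
        print(f"{chip:<12} {label:<16} {celsius:6.1f} C")
```

Running it at idle and again under load, and watching which unlabeled sensor tracks the VRM heatsink, may help identify the VRM reading if the board exposes one.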
I've got an ML360 TR4 Edition, which is doing a good job of keeping the CPU from thermal throttling even with all cores boosting.
Any recommendations regarding memory, VRMs, or anything else would be appreciated.
As far as I know, 2nd-generation Threadrippers use a default memory frequency of up to 2933MHz. I'm not sure how your BIOS is laid out, but if you're running the 2990WX at stock speeds, it should just be a matter of setting the memory frequency to 2933MHz. In theory, this should work. I would stick with the 4x16GB configuration, as I'm afraid neither 1st- nor 2nd-generation Threadrippers support 32GB memory modules. The 2990WX has a 250W TDP, so it may also be worth trying a power supply with a higher wattage rating. I don't think it has anything to do with the VRMs; your motherboard is solid. I hope this helps a bit.