Hello,
Over a month ago I decided to build a Threadripper computer and have been having some problems. Randomly, the computer will stop being able to launch new applications. If I try to restart, it hangs on the restart screen. Restarting it by hitting the button fixes it. I do 3D rendering from time to time at home for work, and last night the screen went black and I had to pull the plug to be able to start the computer again. It actually sounds a lot like the problem that the guy on this thread is having.
Here are my specs.
Part: Model
Motherboard: Gigabyte X399 Aorus Pro
CPU: Threadripper 2990WX
RAM: 2 sets of 8GB x 4 G.Skill TridentX RGB F4-3200C16Q-32GTZR Quad Channel
GPU: Geforce 2080TI
PSU: Corsair 1000w HXI 80+P
NVME M.2: Samsung 1tb 970 evo
SSD: 2x Samsung 2tb 960 evo
Liquid cooler: NZXT Kraken X52 Liquid Cooler
Case: Corsair Crystal 570x
Thermal paste: Cooler Master High Performance Thermal Paste
OS: Windows 10 Pro
At first I thought it was a problem with my windows installation but now I’m starting to think that it could be an incompatibility with some of the hardware.
Any help or advice about the hardware or why it’s crashing would be amazing. I’m running memtest now and so far nothing is coming up. The next thing I’m going to try is taking out one of the 32gb sets of ram.
I just noticed this. I found my ram on the mobo’s approved ram list, but it looks like there isn’t a check under 8 DIMMs, and I’m using 2 sets of 4 sticks so I’m using all 8 DIMM slots. Do you think that’s why it’s crashing?
http://download.gigabyte.us/FileList/Memory/mb_memory_x399-aorus-pro_250W.pdf
Thanks
cgorange, on these kinds of problems my first suspect is the memory and the second is the MB. The steps I suggested above on memory are a starting point. The idea is to determine whether it is the memory (and which stick) or the MB. My first Ryzen (1800X) build required an RMA for the MB due to a bad memory slot. My system would hang, usually with the CPU idle. One of my dual channels was bad and the MB RMA corrected it. Enjoy, John.
Yeah, I went down to 4 sticks with no OC. I ran memtest and no errors. I think I’ll test 1 stick and swap the psu.. But I think it’s either the mobo or the cpu.
So I put in the one stick of ram and undid all the power connections and plugged them back in to the mobo and psu. When I did it I noticed that the main 24pin mobo connector on the psu wasn't in all the way, so either I didn't have it in all the way originally or I tugged it without noticing when I was taking off the wires. In any case I've done a lot of testing and it seems to be stable (knock on wood). I'm going to go up to 2 sticks of ram and test it, then back to 4 sticks, so we'll see if it's a number-of-sticks problem or if it was that wire problem.
K putting in another stick makes the computer unstable. I’m reading online that some people who have had that problem manually set their voltages, and one guy set his a little higher to fix it. Has anyone had any experience with that?
cgorange, I think it is more likely that the slot (channel) or stick is bad. Try another stick and then another slot till you figure it out. The simpler solution is always the first approach. Enjoy, John.
I’m running a windows memory diag now on the ram. Is it okay to try a random slot? I thought the sticks were supposed to go in a specific slot or is that only if you want dual channel? I’ll try the 2nd stick that i put in as one stick in the original slot to rule out if its a bad stick..
Do you think it could be the psu? The mobo tech guy said to try a diff psu but I really really dont want to rip it out of my old comp to test it.
I ran into this video.. he has a different problem because his ram has boot issues. But he changes the voltages to make it stable.. I wonder if I have to do that
This is so hard to debug. For instance I haven’t changed anything and I’m currently rendering something which is using all my cpu, and i have 4 games of eve online open which are running at 40fps each and I’m playing a video and its all running fine with 2 sticks.
My guess is that you either have a bad RAM stick (or sticks) or one or more of your MB ram slots is bad. You will need to try different combinations to figure this out by swapping sticks and slots. Start with 1 stick only then 2. Test each ram stick and after that start testing by using more than 2 slots. It's a pain though as the possible combinations are quite high.
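To put some numbers on that, here is a rough Python sketch of the one-stick-at-a-time approach. The stick and slot labels are made up for illustration (nobody in the thread names them), and it assumes 8 sticks and 8 slots as in the original build. It also assumes the board will boot with a single stick in any slot, which you should check against the motherboard manual.

```python
from math import comb

# Hypothetical labels: nobody in the thread names individual sticks or slots.
STICKS = [f"stick{i}" for i in range(1, 9)]  # 2 kits x 4 sticks
SLOTS = [f"slot{i}" for i in range(1, 9)]    # 8 DIMM slots

def isolation_plan(sticks, slots):
    """List single-stick tests as (stick, slot) pairs.

    First every stick goes into the first slot (finds a bad stick), then the
    first stick goes into every remaining slot (finds a bad slot).
    """
    reference_slot, reference_stick = slots[0], sticks[0]
    stick_tests = [(stick, reference_slot) for stick in sticks]
    slot_tests = [(reference_stick, slot) for slot in slots[1:]]
    return stick_tests + slot_tests

plan = isolation_plan(STICKS, SLOTS)
for stick, slot in plan:
    print(f"test {stick} alone in {slot}")

# For comparison: ways to pick which 2 sticks and which 2 slots to populate.
two_stick_choices = comb(len(STICKS), 2) * comb(len(SLOTS), 2)
print(f"{len(plan)} single-stick tests vs {two_stick_choices} two-stick choices")
```

The slot half of the plan only means something if the reference stick passed in the first slot, so swap in a stick that is known to be good before moving on to the slots.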
Ok so, good news and bad news.. good news is my comp seems to be stable now. I haven’t had a crash in a week. The bad news is I’m not exactly sure what the fix was. I noticed a different problem from the instability, which I didn’t post here: I was getting a random glitch on the screen whenever I played videos like mp4s or even YouTube. After a lot of testing I figured out that it was because of the latest Nvidia driver. For some reason it causes glitches on one of the monitors when you have more than 1 monitor and a Turing GPU. The fix is to roll back to an older graphics driver.
So maybe this was also the cause of my instability? I dont know.
Other things I did to fix it:
-I reinstalled windows 10 pro (that didn’t fix it since it happened again after that)
-Went down to 1 stick of ram. I’ve gone back up to 4 since and it seems to be stable. I ran windows memory diag on the sticks and they all came out fine. I’m still running without an oc on the ram, so the next thing I’m going to try is either putting the oc back or putting in the other 4 sticks of ram.
-I didn’t install any gigabyte drivers from their website. The only things I’ve installed is the AMD chipset from the AMD site and the Samsung m.2 driver
Similar situation here. I tried going down to d sticks of ram, but the problem remained, so I swapped in another 4 sticks of ram, which gave me a BSOD. Then I pulled out all of the ram and put it all back in, uninstalled NZXT CAM, and reinstalled the latest G.SKILL Royal RGB Lighting utility. Now, at the time of posting, my pc has been running without an issue for almost a week.
Another thing I did was changing the RGB Lighting effects of the ram sticks to Strobing, I had set it to Flash originally, but I doubt the issue had been caused by this.
So I'm not sure about how I seemed to have fixed the issue, good to hear your good news btw.
Yeah when i reinstalled windows I didn’t install CAM again or ICue which I used for my rgb lighting before.
Yeah, its a hard thing because it never really feels fixed without knowing what exactly it was.. like it feels like it could always happen again randomly. Think it could be CAM? I saw there was a firmware update that came out a week ago but they dont say what it fixes so I haven’t installed it yet.
cgorange, Google finds several 'hangs' blamed on NZXT CAM. Please search. I would suggest you remove all free SW not really needed and see if there are any problems. You can put them on one at a time later if needed. Just removing then plugging the memory back in could have cured a problem. Enjoy, John.
Yeah after I reinstalled windows I didn’t install any free software like CAM. I’ll do some searches though. Now I’m curious.
I suggest redoing your CPU cooler with MX-4 which outperforms the stuff you mentioned you used with the Kraken cooler
use a small dot (half a bb) and put the cooler back and boot up
Hmm. the 2990wx needs a lot more than a small dot. Couldn't hurt though. But, I think the main issue is the size of the kraken x52 plate and the size of the 2990wx.
cgorange, more than a dot is indeed needed. I have applied TIM to a couple of TRs and removed one. The TIM brand makes little difference. I know some have a favorite, but AnandTech tested many, from liquid metal to toothpaste and denture cream, and all worked about the same except for the liquid metal and the last two. The liquid metal was by far the best but dissolved AMD processor cases (please avoid). Toothpaste and denture cream were the poorest but really not all that much worse. The important thing is to get good, even coverage. MSI and lots of others have videos of applying TIM on TRs. I use a rather long but very thin bead down the middle of the HS and two shorter ones parallel on the sides. It would not hurt to remove the cooler and see if the entire HS is covered. I think it is also important for the cold block to fully cover every chip in the processor module - thus my choice of the EnerMax. I do not remember what cooler you have, but you might consider replacing it if your TIM coverage looks good and the cold plate does not fully cover all four chips. Enjoy, John.
Yeah, I saw that AnandTech test which is why I didn't think about changing the paste. There's also a really good youtube video I forget which guy did it.. But he benchmarked a bunch of different ways to apply paste on a 2990wx. Pretty interesting. So I did the method that got the best results which was 5 dots. one in the middle and one over each of the 4 chips. Kraken x52 isn't great in my opinion because its a circle instead of a rectangle so it inherently doesn't have the same surface area.. I think it covers all the chips but I've never really looked at it. I'm going to wait for a different brand to release a full plate cooler. I like the EnerMax but I hate that it says Enermax right on the front of the cooler in RGB. (Yeah, I know most coolers do that). It just looks so bad in my opinion.
cgorange wrote:
Hmm. the 2990wx needs a lot more than a small dot. Couldn't hurt though. But, I think the main issue is the size of the kraken x52 plate and the size of the 2990wx.
lots of people are misled and use 10-20x too much
the defects are at the micron level, not mm level
hardcoregames™, I disagree. I have seen several overheating system users reporting removing the HS to find significant HS areas with no TIM at all. Your half a BB is for AM4 type not TR4 type processors and is probably still too little. I have also seen users with TIM oozing out the sides - not good either. To each their own. Enjoy, John.
I do like how your cooler directly says it supports 500w TDP and that its full plate.. It's deff better for TR4 no doubt as is your motherboard... no fair
check this out:
xxx.youtube.com/watch?v=nWu2tcm4wL8
Ironically he uses the same type cooler I do where the plate is a circle.
He makes a good point around min 22 where he says that there may be a problem where the screw holes are in the plate because that may be too close to a die or chip and that may cause air bubbles or heat pockets.
But hey the cooler is on the amd threadripper supported list so it must be ok right?
cgorange, sorry, the YT link is bad. Would you expect AMD to put only the very best on the supported list? Why are there hundreds of memory sticks on the QVL? I would assume being on the supported list means it is adequate; for more performance, get another one on the supported list and probably pay more. The world is never a very fair place. Enjoy, John.
Replace the xxx with www or just copy the YouTube part of the link
I do think that list should be clearer about which are actually made for the product. But, I get why it needs to be general.
cgorange wrote:
check this out:
www.youtube.com/watch?v=nWu2tcm4wL8
Ironically he uses the same type cooler I do where the plate is a circle.
He makes a good point around min 22 where he says that there may be a problem where the screw holes are in the plate because that may be too close to a die or chip and that may cause air bubbles or heat pockets.
But hey the cooler is on the amd threadripper supported list so it must be ok right?
All 6 of the suggested patterns are still far from the reality of the size of the surface defects to be addressed
a micron is 1/1000 of a millimeter which is below what the human eye can perceive
a 600x microscope would clarify the situation but hardly anyone has one of them handy
The different patterns don’t matter that much, which is what the benchmarks reveal. The important part is that the surface gets covered without gaps in the compound. That is why it’s really questionable whether a single half-a-bb dot will spread enough to cover the whole surface. But test it, upload a video, and show what the coverage is.
thermal pumping with temperature changes combined with the tension will spread it far and wide
power down and let it cool, do the MX-4 as I said, put the cooler back and make sure the clips are on right. Boot up and carry on.
Check the CPU temp with HWmonitor and you will see a little goes a long way
PC hard Freezing using CAM Software – CAM CUSTOMER SUPPORT
CAM really might be the culprit, in the post I see very similar issue to ours. And last time I said my PC ran without an issue for 2 - 3 days, I seemed to have closed the CAM software. Until I restarted my PC and CAM auto started then the issue came back.
Thanks for finding that post. Very helpful. The guy in the post describes it as everything freezing. For me maybe one or two applications open would freeze but some of the applications wouldn’t like the game i was running or the internet browser i was using and I couldn’t open new applications. But this deff does look promising.
You have CAM software, what cooler do you have tylr? And did you end up putting in your 8 sticks again? Are your 8 sticks one set or did you get 2 4stick sets?
I actually got 4 sets of 2×8 ram, and I ended up putting them all in, without having any issue for a week, my cooler is a Kraken X72, I just finished building another rig for a friend, and I recommended him to use a Cooler Master cooler instead of NZXT Kraken, which I think is a much wiser choice.
that's great news. I just put my 2nd set of ram back in which isn't on the QVL list. But I took off my OC to be safe. So far so good. but its only been a half day so far.
Is there any way you could post a Ryzen Master screenshot at idle and one while rendering? If you have any rendering software, or can run something like cinebench 15 or the blender benchmark for a while, just to see what your temp gets up to? I'm curious if you have around the same temps as I do, and you're the only other person with a Kraken here I think. I wonder how much better the 72 is than the 52 that I have too.
Is the cooler you recommended the cooler master master liquid ml360 rgb?
I actually just made a post asking about that cooler which is being moderated now.
your kraken x72 has a bigger radiator so it'll be interesting to see if that's enough to keep the cpu cool or if you really do need a bigger plate. It'll be good to compare our temps
I haven't run any benchmarks lately, but if I remember correctly the liquid temps of my Kraken X72 have gone up to 40 something degrees Celsius, so I guess a bigger radiator could help. However, my graphics card is a Galax 2080ti HOF which is not water cooled, so maybe some of the heat from the graphics card ends up at the radiator.
I don't think the Kraken is good at dissipating heat from the CPU though, I have built another pc with a core i9 9900k also cooled by Kraken X72, and the CPU temps could go up to 70 degrees Celsius running cinebench, which is why I never recommend using Kraken cooler anymore.
What I like about the Kraken is its design, it looks really good in my opinion, but the RGB control is not as versatile as the masterliquid ml360 rgb - yes, the cooler I recommended was this one - which comes with an LED controller. I'll tell u the temps after my friend runs some tests, he has moved his PC back home already and I can't get my hands on it now.
40 while using all the cores at 100% for an extended period of time? That sounds too good to be true. My temps are 33-40 while idle and not doing anything, and they go up to 68 when all the cores are at 100% for an extended time running Cinebench over and over again.
I agree on the design. It’s why I picked it too. Back when I had CAM I made it so the color would change as the temp changed. So when it would go up to 68 it would turn red and while it was 33 it was white... it was pretty cool and functional. Too bad CAM is so bad. I also hate when companies put their logo in RGB in prominent places on the product. At least NZXT's was more random than a coolermaster logo, which is why I think it's better design wise.
I’m not really interested in the temps for the i9 9900k.. I’m really only interested in the 2990wx temps so I can compare it to mine. Do you use Ryzen Master? I have example screenshots at the top of this thread.
When I said 40, I meant the liquid temp.
When using all the cores at 100% for an extended period of time (I use prime95's small FFTs test, which I think is a better heat stress test than running cinebench over and over again), the cpu temp in Ryzen Master would go up to 67.50. And I guess that is normal?
Your temp goes up to 67.5? That’s very interesting because that’s the temp that mine goes up to. Which means that the bigger size of your radiator vs my smaller one doesn’t matter, since we both have the same temps. That temp is not the greatest because the cpu will throttle and slow down at 68 degrees to keep it from overheating. Other people on this forum with different coolers only go up to 50c while using all the cores, for example, so it shows that our coolers are just not as good.
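For anyone who wants the comparison spelled out, here is a small Python sketch of the headroom arithmetic. It only uses figures quoted in this thread, and the 68 degree throttle point is the number given above, not something measured here. The cooler labels are shorthand for illustration.

```python
# All figures are load temperatures in degrees Celsius as quoted in this thread.
THROTTLE_C = 68.0  # throttle point stated above (assumption taken from the post)

readings = {
    "Kraken X52": 68.0,      # Cinebench looped, all cores at 100%
    "Kraken X72": 67.5,      # prime95 small FFTs, all cores at 100%
    "other coolers": 50.0,   # figure mentioned for other forum users
}

def headroom(load_temp_c, limit_c=THROTTLE_C):
    """Degrees left before the stated throttle point is reached."""
    return limit_c - load_temp_c

for cooler, temp in readings.items():
    print(f"{cooler}: {temp} C under load, {headroom(temp)} C of headroom")
```

On these numbers both Krakens sit within half a degree of the limit, while the 50c figure leaves about 18 degrees to spare.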
My situation was actually quite similar to how the guy in the post described, I could see things freezing, but I could still move my mouse, or maybe open the task manager, but I can't open new tabs in chrome nor see anything helpful in the task manager, and I cannot restart or shut down my PC from the start menu, the only way was to force shut down by holding down the power button.
This is great news. It's more and more looking like a problem with CAM
Yeah, there are a lot of posts about CAM and freezing.. take a look at this one
https://www.reddit.com/r/NZXT/comments/8jcx8o/cam_appears_to_be_causing_massive_hard_system/
I just ran into this post.. Haven't tried it yet but this looks really interesting and sounds like the instability I was having.
Why does Windows 10 suddenly slow down to a crawl, partially freezes - Microsoft Community