The problem is as follows: the system boots and works great. I can watch YouTube, edit in Resolve, and use Photoshop. As soon as I launch a game, I can only play for about 15 minutes before the system crashes.
My system:
CPU 5900X
GPU 6800XT
PSU Be Quiet Dark 1000W
RAM Patriot Viper 4133, 4x8GB (currently set to 3533 with fabric clock 1733; no DOCP, just set to auto)
Motherboard Asus TUF X570 WiFi Plus with the newest BIOS (4021) installed
I have it overclocked and everything in the BIOS set to auto. I have enclosed pictures as well.
I have made numerous changes, such as raising SoC and Vcore voltage, disabling C-states, and dropping the RAM to 3200, and it's still crash after crash. I know I'm going to have to get another 5900X, but what can I do in the meantime just to be able to play a game? Is there anything at all that can be done? Or should I just swap my 5800X in and pray it doesn't do the same thing? I do have a spare motherboard as well, but I haven't read about anyone really having motherboard issues. I bought this CPU brand new from another party, and while I have the receipt, I believe they would have to start the RMA process, and I'm sure that's something they wouldn't be interested in doing.
RAM and Infinity Fabric (FCLK) should run 1:1, meaning FCLK should equal the actual memory clock, which is half the RAM's rated speed. E.g., if you use 3600MHz RAM, FCLK should be 1800MHz. Also, try setting 3200 (memory) / 1600 (FCLK).
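The 1:1 rule above is just a halving. Here is a small Python sketch, assuming the BIOS figure is the DDR transfer rate in MT/s (which boards usually label "MHz"); the function names are made up for illustration.

```python
def fclk_for_one_to_one(ddr_rating_mts: float) -> float:
    """FCLK (MHz) that runs the Infinity Fabric 1:1 with the memory clock.

    DDR transfers data twice per clock, so DDR4-3600 has a real memory
    clock (MCLK) of 1800 MHz; 1:1 means FCLK == MCLK.
    """
    return ddr_rating_mts / 2


def is_one_to_one(ddr_rating_mts: float, fclk_mhz: float, tolerance_mhz: float = 1.0) -> bool:
    """True if the configured FCLK is within tolerance of half the DDR rating."""
    return abs(fclk_for_one_to_one(ddr_rating_mts) - fclk_mhz) <= tolerance_mhz


for rating in (3200, 3600):
    print(f"DDR4-{rating}: FCLK {fclk_for_one_to_one(rating):.0f} MHz")

# The opening post's 3533 / 1733 pairing is not 1:1 (half of 3533 is ~1767).
print(is_one_to_one(3533, 1733))  # False
```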
Yeah, tried that as well.
I had a similar issue. First check whether the issue goes away when you disable Core Performance Boost (CPB) in the BIOS. It should be in one of the CPU menus; it may take you a while to find it depending on your motherboard. This should cap the CPU at roughly its 3.7GHz base clock. Leave DOCP on so the RAM runs at full speed, boot into Windows, run a game, and see if it is stable. If it is stable, then look at the BIOS. Did you recently update to version 4021? Was it okay before? You could try reflashing the BIOS in case the last flash missed anything important, or flash back down to version 4005 or 4002, as those might be more stable.
I actually flashed back to the previous version to see if the issue would go away, and it did not. I'm not really excited about turning off CPB, but if that's what keeps it stable until I get a replacement, then so be it.
If disabling CPB works and is stable, you could try this next. Re-enable CPB and install ASUS AI Suite 3, since it comes with easy CPU overclocking in Windows and you're on an Asus motherboard anyway. In AI Suite 3, go to the CPU section, drag the bar to 3.7GHz, leave everything else alone, apply it, and test with benchmarks, games, etc. If it is stable, move it up in 100MHz increments until it's no longer stable; that's your maximum overclock without tweaks that could fry the CPU. I got mine to 4.2GHz, which gave me quite a speed boost while I worked on fixing my situation. It might give you the boost you need while you wait for an RMA or a fix.
So I'm assuming that with this method I disable PBO and don't use Ryzen Master?
That’s correct
So, a little update: I just loaded optimized defaults, basically putting everything on auto, and I couldn't even stay in Windows for a minute or two. I have now disabled CPB, PBO, and C-states, with memory on auto. My motherboard only sees two profiles for my memory, 4133 or 4000, so I had to select auto and set it to 3600 with the fabric at 1800. After doing all this, with Ryzen Master open, my EDC is no longer at 0; it's actually showing a number now, and I don't know what that's all about. I did undervolt it by 0.0500; I'll probably adjust that to 0.0400 so I can get 1.1V. So in order to use ASUS AI Suite, I would need to turn CPB back on. Correct?
Most people don't understand just how out of spec they are running their systems.
First, 3200MHz is the maximum in-spec speed for any memory on any motherboard supporting the Vermeer (Ryzen 5000) CPUs.
Second, the more memory ranks one runs, the tighter that spec becomes. If you shove in 4 sticks of single-rank memory, support drops to 2933MHz. Go to 4 sticks of dual-rank (which I imagine you have), and support drops to 2667MHz.
These are AMD specs. The motherboard and memory pushers can't do anything about them. All one normally sees when a motherboard is advertised is the super (OC) speed and the best in-spec speed of 3200MHz. But they don't tell you that the spec drops off as soon as a third stick goes in. Some reliable manufacturers (like ASRock and Supermicro) supply a table in their motherboard manuals.
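The stick-count and rank rule above fits in a lookup table. This Python sketch encodes only the figures quoted in the post, plus one labeled assumption for two dual-rank sticks; your own board manual's table is the authority, not these numbers.

```python
# Max in-spec DDR4 speed (MT/s) for Vermeer CPUs, keyed by
# (number of DIMMs, ranks per DIMM), as quoted in the post above.
IN_SPEC_MAX_MTS = {
    (2, 1): 3200,
    (2, 2): 3200,  # assumption: the post only says 3200 is the ceiling for any memory
    (4, 1): 2933,
    (4, 2): 2667,
}


def in_spec_max(dimms: int, ranks_per_dimm: int) -> int:
    """Look up the in-spec ceiling for a given stick count and rank count."""
    try:
        return IN_SPEC_MAX_MTS[(dimms, ranks_per_dimm)]
    except KeyError:
        raise ValueError(
            f"no figure quoted for {dimms} DIMMs x {ranks_per_dimm} rank(s)"
        ) from None


def overclock_margin(dimms: int, ranks_per_dimm: int, configured_mts: int) -> int:
    """How far (MT/s) a configured speed sits above the in-spec ceiling."""
    return configured_mts - in_spec_max(dimms, ranks_per_dimm)


print(in_spec_max(4, 1))             # 2933
print(overclock_margin(4, 1, 3600))  # 667
```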
I loathe rebranding by memory stick pushers. Let's see what you fell for. Bring up CPU-Z and open the SPD tab. What is the highest speed rating you see listed for those components? Please post it; I love a laugh. It's the constant typing that bores the heck out of me. I'm not trying to laugh at your expense, but the post would compensate me for the typing.
Hint: ECC memory never hurt anyone.
======== Recommendations =====
0. If you want, save your current profile (but I believe you will eventually be best off with the settings I'm about to give you).
1. Load Optimized Defaults.
2. Set memory speed to 3200MHz. Don't touch XMP or DOCP. And for heaven's sake get rid of ASUS AI Suite; it hasn't been reliable for a decade. Leave all timings on Auto.
Keep CPB and PBO enabled. Disabling CPB would unnecessarily gimp your system. I believe we can aim higher.
3. Give a touch more voltage to VCore and VSoc. The system will run with a range of voltages. Some people have trouble with WHEA errors when the system goes idle and it lowers the CPU voltage. So let it run Normal, but with just a wee bit more.
Set VCore from Auto to Normal
Now a differential field opens up. Set the differential to add +0.006V to VCore.
Set VSoc from Auto to Normal
Now a differential field opens up. Set the differential to add +0.006V to VSoc.
4. Others, like you, have trouble when the system boosts to the very highest speeds.
What we did in step 3 boosts voltage at both the lowest and highest frequencies. In your case we want to discourage the system from chasing the very highest frequencies while still boosting well beyond base. This is done by adjusting power (PPT).
Set PBO to advanced.
Set limits to Manual
Leave TDC at 95 (Amps)
Leave EDC at 140 (Amps)
Set PPT down from 142 to 120 (watts). (You really didn't want to push 142 watts into those tiny transistors, did you?)
(This is what will stop you from boosting to exorbitant speeds and then crashing.)
5. While you are in the vicinity, let's get a handle on temperature. People seem to worry about that. I still hold to the old engineering rule of thumb that for every 10C rise in temperature, chip life is halved.
Set Thermal Throttle Limit to Manual
This should open a new field, also called "Thermal Throttle Limit". Set it to the temperature you like. I like 75 (Celsius).
The Ryzen is great at running within the limits you specify. I could even take the fan off my heatsink and the system still wouldn't go beyond the limits. (It will just lower frequencies, multithreading, core dispatch, and voltages.)
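Steps 4 and 5 boil down to "boost until the first limit is hit." The Python sketch below only checks which configured limit a set of readings has reached; the load readings are hypothetical, the STOCK thermal value of 90C is the 5900X's rated maximum temperature (an assumption for this sketch), and the real boost algorithm is far more involved than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PboLimits:
    ppt_w: float      # package power limit, watts
    tdc_a: float      # sustained current limit, amps
    edc_a: float      # peak current limit, amps
    thermal_c: float  # thermal throttle limit, degrees C


@dataclass(frozen=True)
class Readings:
    package_power_w: float
    sustained_current_a: float
    peak_current_a: float
    temp_c: float


STOCK = PboLimits(ppt_w=142, tdc_a=95, edc_a=140, thermal_c=90)
SUGGESTED = PboLimits(ppt_w=120, tdc_a=95, edc_a=140, thermal_c=75)


def limits_reached(limits: PboLimits, r: Readings) -> list:
    """Names of the limits these readings have reached or exceeded.

    In the post's description, once a limit is reached the CPU lowers
    frequency and voltage instead of boosting further.
    """
    checks = [
        ("PPT", r.package_power_w, limits.ppt_w),
        ("TDC", r.sustained_current_a, limits.tdc_a),
        ("EDC", r.peak_current_a, limits.edc_a),
        ("THERMAL", r.temp_c, limits.thermal_c),
    ]
    return [name for name, value, cap in checks if value >= cap]


# Hypothetical gaming load: 130 W, 80 A sustained, 120 A peak, 70 C.
load = Readings(package_power_w=130, sustained_current_a=80, peak_current_a=120, temp_c=70)
print(limits_reached(STOCK, load))      # []
print(limits_reached(SUGGESTED, load))  # ['PPT']
```

With the suggested settings, the same load runs into the 120W PPT cap first, which is the point of step 4.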
I think you will be pleased with these settings. Of course, lowering PPT and temperature will cost some performance. However, it is a LOT better than turning off CPB and just running at base.
Run with these for a few weeks before experimenting with pushing the limits on that memory.
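The "halve the life for every 10C" rule of thumb from step 5 is an exponential. A quick Python sketch, treating it strictly as the rough heuristic described above rather than a datasheet figure (the function name is made up for illustration):

```python
def relative_life(temp_c: float, reference_c: float) -> float:
    """Chip life at temp_c relative to life at reference_c.

    Rule of thumb: every +10 C halves the expected life, so the factor
    is 2 ** ((reference_c - temp_c) / 10).
    """
    return 2 ** ((reference_c - temp_c) / 10)


# Capping at 75 C instead of running at 85 C: twice the life, by this rule.
print(relative_life(75, 85))  # 2.0
# Running 20 C hotter than a 75 C cap: a quarter of the life.
print(relative_life(95, 75))  # 0.25
```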
Oh yes, here is that table from ASRock. (Remember, these are AMD's specs for Vermeer.)
I'll give all of this a try. I never seem to have temperature issues; when playing it may get to 55-60C, but idle is usually 34C. So if I do this and EDC returns to 0 again in Ryzen Master, is that just a glitch or something else? Even though I have basically set my system to run at stock, I'm still getting WHEA errors, but the system isn't shutting off.
I just noticed that the Event Viewer was warning you of corrected errors.
It wasn't the corrected errors that brought you down, but the uncorrected errors (two or more bits flipped).
The only subsystem I know of that corrects itself is the memory subsystem. Thus ECC.
However, I doubt that your Patriot memory is error-correcting, which means the errors are occurring within the internal memory controller on the Ryzen chip itself. Again, the most important values for this are VSoc voltage, and PPT to hold back frequencies that your internal memory controller cannot handle.
Thank you for the post. A picture IS worth a thousand words.
Let's see... Patriot took components that are rated for 2133MHz and 1.2V,
cherry-picked them (LOL), told the user to push 1.4 volts, and then attempted to jump
seven bins!
4000(O.C.) / 3600(O.C.) / 3333(O.C.) / 3200 / 2933 / 2667 / 2400 / 2133
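Counting "bins" is just counting steps along that list. A small Python sketch using the speeds listed above (the function name is made up for illustration):

```python
# Speed bins (MT/s) as listed above, fastest first.
SPEED_BINS = (4000, 3600, 3333, 3200, 2933, 2667, 2400, 2133)


def bins_between(rated_mts: int, target_mts: int, bins: tuple = SPEED_BINS) -> int:
    """Number of steps along the bin list from the rated speed to the target."""
    return abs(bins.index(rated_mts) - bins.index(target_mts))


print(bins_between(2133, 4000))  # 7: the "seven bins" jump
print(bins_between(2133, 3200))  # 4
```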
I love Kingston ECC.
The components are rated at the same speed as the sticks.
I can usually bump the speed up two bins without changing voltage, so my 3000MHz sticks get 3400MHz and my 3200MHz sticks get 3600MHz. But the best part is that memory errors from the DIMMs themselves show up in Event Viewer. People who try to overclock without error correction are like people who drive race cars without brakes to save 7 pounds of material.
The system crashes almost immediately with the configuration you suggested.
If you are in PBO Advanced and entering the limits manually,
I wouldn't leave any field blank. Specify exactly what you want.
I believe the defaults for the 5900X are TDC 95A, EDC 140A, PPT 142W.
But set your PPT down to 120
Did exactly that, and it crashes.
Is it crashing in BIOS?
What are your measured readings in BIOS?
Did exactly what you said, and it crashes. If you want, email me; this board won't let me reply without waiting, and it's annoying. Email me at justin.d.stevenson1@gmail.com
I see you've been fiddling with the RAM settings per Gwillakers' recommendations. I went back and didn't see how it went with the original approach of leaving everything at default and disabling CPB. Did it boot into Windows and stay stable? I read that you also undervolted the CPU; that's inserting unnecessary variables into the mix. If that undervolt is still in place, it might be causing your memory controller issues. Take it one step at a time: apply defaults, don't undervolt (leave it on auto), disable CPB, stability test, and if all is good, slowly reapply the RAM settings and overclock. You're rushing into it; slow down.
Honestly, I was still getting WHEA errors with everything disabled, and performance was horrible. I've contacted AMD; I think it's RMA time.
Honestly, I thought you wouldn't have trouble with the settings I gave you.
All would have kept you constrained within more conservative boundaries.
Since my settings failed you, I can see why you would have some doubt, but don't despair.
Be patient, and I believe we can tame this beast yet.
I agree with linkz0rz. His approach is sound and warranted at this point.
You must get back to base, turn off all overclocking and move back into the forest slowly.
1. Load Optimized defaults, CPB disabled, PBO disabled, XMP off, DOCP off.
2. Reduce the number of memory sticks to two. Make sure they are in different channels as specified by your Motherboard.
(You could go all the way down to one, but I believe your problem is more with voltage and frequency to the internal memory controller, not the DIMMs.)
3. When you go to these simplified, non-overclocked settings, observe whether your WHEA warnings go away.
The ones I'm talking about are the ones that say corrected. They are not bringing your system down, but they are the first indication that timings are not quite right.
I noticed your WHEA warnings before, but I just now saw that you were getting thousands per hour.
This happened to me a couple of years ago. (I don't think it applies here, but the more one knows, the better.)
I was changing a motherboard in a case (swapping an ATX in and taking an mATX out).
I forgot to move one of the standoffs, and it was left in the wrong position. It squished some pins on the underside of the board so that a couple were touching one another. This caused my system to run super slow, as the memory was correcting the errors without bringing down the system. Straightening the pins removed the errors and all was well.
Like I mentioned before, your system is not using ECC DIMMs, so you won't catch errors on the motherboard, but your processor still uses ECC internally.
Not warranted yet, but you might eventually consider reseating the processor. Pay careful attention to whether any pins on the underside are bent.
Sounds like faulty RAM. Do you have a different set to try? If Gwillakers is right, the RAM could be the issue. If you have a spare pair, you could try those and see if it makes any difference.
Real quick: I have an MSI X570 Edge in my other rig. Would that be a better option than the Asus board?
If we are taking bets:
Bad RAM: 5% (Though I don't like his particular RAM, the event log shows corrected hardware errors, which to me indicates that the data went wrong in the internal memory controller or the L1, L2, or L3 cache, and was then corrected.)
Frequency and voltage settings for the internal memory controller: 60%
Bad 5900X (including bent pins): 18%
Bad motherboard: 17%
Keeping my fingers crossed, but I don't expect the RAM to account for those errors. Non-ECC DIMMs do not store any error-correction or checking data; the processor blindly accepts what the DIMMs give it. Once the data is in the processor, though, a 64-byte cache line undergoes a calculation that produces 8 bytes of check info before the line is stored. When the cache line is retrieved, the 64 bytes undergo the same calculation, and the 8-byte result had better match the check info stored with it. Otherwise, you have an ECC error. If only one bit flipped, it corrects it and logs it in one of the Event Viewer logs. If two or more bits flipped, the system is coming down.
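The "8 check bytes per 64-byte line" scheme described above works out to 8 check bits per 64 data bits, the same ratio as a textbook SECDED (single-error-correct, double-error-detect) extended Hamming (72,64) code. The Python sketch below implements that textbook code for one 64-bit word. It illustrates the idea; it is not necessarily the exact code used inside the Ryzen caches. Note that SECDED only guarantees detection of two flipped bits; three or more can be miscorrected.

```python
from typing import List, Optional, Tuple

N = 72  # index 0 is the overall parity bit, 1..71 are Hamming positions
PARITY_POSITIONS = [1 << p for p in range(7)]  # 1, 2, 4, ..., 64


def _is_data_position(pos: int) -> bool:
    # Hamming data bits live at positions that are not powers of two.
    return (pos & (pos - 1)) != 0


def encode(data: int) -> List[int]:
    """Encode a 64-bit word into a 72-bit SECDED codeword (list of 0/1)."""
    code = [0] * N
    bit = 0
    for pos in range(1, N):
        if _is_data_position(pos):
            code[pos] = (data >> bit) & 1
            bit += 1
    for p in PARITY_POSITIONS:
        # Make the XOR over every position that has bit p set come out to 0.
        code[p] = sum(code[pos] for pos in range(1, N) if pos & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over the whole word
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, N):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1

    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome < N:
        # Odd parity: treat it as one flipped bit at position `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code = code.copy()
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome (two flips), or a syndrome
        # pointing outside the word: detected but not correctable.
        return None, "uncorrectable"

    data, bit = 0, 0
    for pos in range(1, N):
        if _is_data_position(pos):
            data |= code[pos] << bit
            bit += 1
    return data, status


word = 0x0123456789ABCDEF
stored = encode(word)

one_flip = stored.copy()
one_flip[10] ^= 1      # single-bit error in a data position
two_flips = stored.copy()
two_flips[10] ^= 1
two_flips[40] ^= 1     # double-bit error

print(decode(stored)[1])    # ok
print(decode(one_flip)[1])  # corrected
print(decode(two_flips))    # (None, 'uncorrectable')
```

The single flip is logged and fixed (the "corrected" WHEA case); the double flip can only be detected, which is when the system comes down.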
Oh, by the way, you need to wait 600 seconds to post again. However, I believe you can edit an old post immediately.
Well, I can switch out the RAM from my other rig and give it a go. The other motherboard runs flawlessly with the 5800X; the Asus was a headache from the get-go, and I had to undervolt just to get it to boot. As for bent pins, I'll check that as well. I just had high hopes for the Asus board since it was rated a little higher than the MSI.
Asus motherboards are generally really good, so it very likely isn't the motherboard, but you never know. Try the MSI just to rule out the motherboard.
I’d be surprised if it was. I'll keep you all posted; I appreciate all of your help.
Update: no bent pins. I reseated the CPU and swapped the RAM; I now have 2x8GB G.Skill Trident Z. I will boot up and let you know where I'm at after that.
You don't have to wait for a crash. One does not have to pay attention to all Event Viewer warnings.
See if you are still getting the warnings of corrected errors.
While not all Event Viewer warnings merit one's attention, machine check corrected errors do.
One should not live with those warnings. They must be corrected, or the part sent back for RMA.
OK, memory swapped, and the DOCP profile actually applied and seems to be stable. I still have a boatload of WHEA errors, so where shall we start?
WHEA errors that are machine check - corrected are NOT good. They cannot be ignored.
They must be eliminated, it is just a matter of time before you crash again.
If you are running DOCP, turn it off. It is an overclock. First we must get this running inside specifications, without any machine checks. CPB should be disabled, as well as PBO. We are not trying for performance, just stability without those machine check errors.
By the way, you don't have to wait 10 minutes to edit an old post. It's easier to have a running conversation that way.
Take a look at your Event Viewer. You should see the entries separated by time. Take a picture of the entries just before the new set of entries (where it complains you didn't shut down normally).
How long was it before you crashed? Were you in Windows or the BIOS? Was the system stressed or mostly idle?
So it crashed, and I now have everything disabled. I'm at stock 3.7GHz, and I have no WHEA errors at stock.
How long did you run before the crash? Were you in Windows? Do you ever crash in the BIOS? (The BIOS uses different voltages, frequencies, and timings.)
When you crash, is the system somewhat idle or more busy?
Just a nit, and I don't think it will matter, but your memory clock says 1800MHz.
We are having memory-related problems, so this must be set down to 1600MHz or lower so we don't run the DIMMs faster than 3200MHz.
Your voltages, frequencies and temps look good. All seem to be in range for a non-overclocked system.
It crashes 20 seconds after I'm in Windows. It won't crash in the BIOS at all. It's been almost 37 minutes with no WHEA errors at stock. About to swap the 5800X in.
The positive news is that you are getting far fewer Machine-corrected errors.
Which means, slowing down the system helped.
You may think, "But I still came down." However, a memory error is an error, whether it is corrected or not.
Any error could, if uncorrected, bring a system down. Now there are fewer opportunities.
It all comes down to waiting. (I hope the simplicity of this doesn't offend you.)
Three types of waiting:
1. The CPU can poll: Are you done? Are you done? ... Easy to program, but it wastes CPU resources.
2. Wait for interrupts: the CPU hands out work, goes off and does other things until it is interrupted, and saves state so it can multitask.
3. By contract, or mutual agreement. Both the CPU and memory are very busy pieces of machinery, and for the most part they cannot be bothered by each other. The CPU has work for the memory subsystem. The CPU is faster than the memory, so the CPU waits for the memory.
Imagine two people (CPU and Memory). They reside in two different rooms and don't see one another. The rooms share a common hallway. There is a light in the hallway and a guy to flick it on and off. When the CPU wants to put something in storage, he leaves it in the hallway. Memory needs a certain amount of time to put things away. CPU and Memory agree that the guy at the end of the hall will flick the light on 12 times a minute. They also agree that the CPU will give Memory 6 flicks of the light to store something, and after that the CPU can assume it was done.
Now if the memory is an old fat guy like me, I'm gonna need every flick of that light to do my stuff. But if Usain Bolt is doing the storing, he might not need 6 whole flicks. You would not be taking full advantage of Usain's speed. Heck, if you agree to store things in 5 flicks, the CPU wouldn't have to wait as long and would get more accomplished running at the same frequency. You might even tell the guy at the end of the hall to flick that switch faster. The only problem is when you don't give the memory subsystem enough time. Then you get garbage and crashes.
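In the hallway analogy, the flicks of the light are memory clock cycles, the "6 flicks" agreement is a timing such as CAS latency (CL), and flicking faster is raising the memory frequency. This Python sketch converts that agreement into real time. `NEEDED_NS` is a hypothetical stand-in for how long the chips actually need, and real DIMMs have many timings besides CL.

```python
def cas_latency_ns(cl_cycles: int, ddr_rating_mts: float) -> float:
    """Real time the CPU waits for CL cycles of the memory clock.

    The memory clock is half the DDR rating, so one cycle lasts
    2000 / rating nanoseconds (rating in MT/s).
    """
    return cl_cycles * 2000 / ddr_rating_mts


def agreement_is_safe(cl_cycles: int, ddr_rating_mts: float, needed_ns: float) -> bool:
    """True if the agreed wait gives the memory at least the time it needs."""
    return cas_latency_ns(cl_cycles, ddr_rating_mts) >= needed_ns


print(cas_latency_ns(16, 3200))          # 10.0
print(cas_latency_ns(14, 3200))          # 8.75: fewer flicks, same speed
print(f"{cas_latency_ns(16, 3600):.2f}")  # 8.89: same flicks, flicked faster

NEEDED_NS = 9.0  # hypothetical: how long these particular chips really need
print(agreement_is_safe(16, 3200, NEEDED_NS))  # True
print(agreement_is_safe(14, 3200, NEEDED_NS))  # False: "garbage and crashes"
```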
The AMD spec is 3200MHz. That's why I'm such a nudge when people have memory problems. They want to believe the package the memory came in, but ignore the specs of the processor.
But where do we go from here? I haven't received any errors at stock, and if this CPU can't even run at its boost clocks out of the box and I have to handicap everything just to be stable, then I wasted money, lol. I've had the 5600, the 5800X, and now this, and never had issues with those processors, just the 5900X. I'm down for some tweaking, but going back and forth on this message board takes forever. I can be reached on Twitter @steveoeditz and we can DM each other, before I swap the CPU out for the 5800X and just RMA this 5900X.
Don't assume that all these disabled settings will stay disabled.
Once we get a handle on what exactly is bringing you down, it might be an easy, painless correction.
Being stable in BIOS is a good thing. It proves that there are settings that work.
In my original post, I was trying to limit running at the highest frequencies by lowering PPT.
I'm curious: what happens if you edit your Windows power plan and change the maximum processor state to, say, 80%?
See if that keeps you in windows longer.
So I have the Bitsum power plan, which was installed when I got my PC optimized. However, the WHEA errors came long before the optimization, so are you saying I need to change it to Balanced?
Well doesn't minimum processor state of 100% strike you as odd?
I would recommend a minimum of 15%.
And for now set the Maximum to 85% (we will turn that back later)
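The same power-plan change can be scripted with Windows' built-in `powercfg` tool instead of clicking through Power Options. This Python sketch assumes the standard `SUB_PROCESSOR`, `PROCTHROTTLEMIN`, and `PROCTHROTTLEMAX` aliases (list them with `powercfg /aliases` to confirm on your machine), only touches the plugged-in (AC) values of whichever plan is active (the Bitsum plan included), and only runs the commands on Windows.

```python
import subprocess
import sys


def power_state_commands(min_pct: int, max_pct: int) -> list:
    """Build powercfg calls that set the minimum and maximum processor state
    (AC/plugged-in values) on the active plan, then re-apply the plan."""
    if not 0 <= min_pct <= max_pct <= 100:
        raise ValueError("need 0 <= min_pct <= max_pct <= 100")
    base = ["powercfg", "/setacvalueindex", "SCHEME_CURRENT", "SUB_PROCESSOR"]
    return [
        base + ["PROCTHROTTLEMIN", str(min_pct)],
        base + ["PROCTHROTTLEMAX", str(max_pct)],
        ["powercfg", "/setactive", "SCHEME_CURRENT"],  # make the edit take effect
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Values suggested above: 15% minimum, 85% maximum for now.
    for cmd in power_state_commands(15, 85):
        subprocess.run(cmd, check=True)
```

Turning the maximum back up later is the same call with different numbers.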