WHEA-Logger 18/LiveKernelEvent 124
Hi everyone, I've been having periodic crashes for the last year and tolerated them until this past week. I used to crash occasionally in one game (Apex) and would just avoid it, but now I'm also crashing while streaming and while on Zoom calls for classes, to the point that it's happening once or twice a day.
I've tried to follow other threads about the same error messages but haven't found any solutions that work. Crashes occur at random intervals: I can be idling, playing a game, or on a Zoom call, and it will crash without warning. Any help is appreciated.
List of computer specs:
- CPU: AMD Ryzen 7 5800X 3.8 GHz 8-Core Processor
- CPU Cooler: be quiet! Dark Rock Pro 4 50.5 CFM CPU Cooler
- Motherboard: Gigabyte X570 AORUS ELITE WIFI ATX AM4
- Memory: G.Skill Ripjaws V 32 GB (2 x 16 GB) DDR4-3600 CL16
- Storage: Intel 665p 1 TB M.2-2280 NVMe Solid State Drive (the OS is installed on this drive; I also have another SSD and 3 hard drives connected)
- GPU: Asus GeForce RTX 3080 10 GB TUF GAMING
- Power Supply: Corsair RMx (2018) 850 W 80+ Gold Certified Fully Modular ATX
Link to dump files: https://www.dropbox.com/s/z9djtq7yqaqf578/Dump%20Files.zip?dl=0
Things that I've tried to stop the computer from crashing:
- Updated Windows/Updated any outdated drivers in Device Manager.
- Updated AMD Chipset drivers.
- Updated NVIDIA Graphics Drivers.
- Turned off Windows 10 Fast Startup Feature.
- Reseated all hardware and dusted out computer case.
- Updated BIOS to F35.
- Updated x570 Drivers.
- Disabled POS.
- Disabled XMP.
Off the top of my head, that's what I remember trying so far. The biggest issue is that the crash doesn't happen on a consistent basis, so my days basically go: crash happens, I adjust some settings, wait a day, crash happens again, and the cycle continues...
Once again, any help is appreciated and thank you in advance.
Solved! Go to Solution.
In my opinion Ryzen_Type_r gave you a better recommendation than using a static voltage and frequency.
I didn't get to view your dumps, but I will assume you are encountering the same WHEA errors in Event Viewer that others are.
However, first let me address something that I often find mentioned in these posts.
It seems that people are particularly worried that their systems crash even while idle.
Rhetorical question: Do you know when the processor will select the highest voltage and the maximum frequency?
Answer: Right when it is coming out of idle and is lightly loaded. It is precisely when it has only one or two things to do that it will ratchet up the speeds and run the cores and memory controller faster than they were built for.
I'm no electrician, but in life I've noticed that electric cars run faster and lights burn brighter when you up the voltage.
I've also noticed that the more Christmas lights you put on a tree the more current it consumes.
So using the basic equation: A × V = W
We can rewrite it as: A = W / V
So suppose I have a Program with 5 Threads and each thread will consume 10 Amps.
Further suppose that my system can only handle 100 Watts(PPT) before burning up.
So the question becomes : How high can the system raise Voltage, and still supply the necessary current to the cores?
or 50 Amps = 100 Watts / ?V
If V becomes greater than 2, then the right side of the equation will not match 50 Amps needed on the left.
So running 5 threads requiring 50 amps, with a power limit of 100W means you can't raise voltage more than 2 volts.
Now let's take a look at what happens when the load on the system doubles. You now need to run 10 threads, each requiring 10 Amps, for a total workload of 100 Amps. Your system can still only handle a power loading of 100W.
Now how much voltage can your system apply and still supply the 100Amps?
100Amps = 100W/ ?V
Answer: In order to handle the increased workload the system must drop the voltage to no higher than 1 V.
Understand that the voltage will later determine what frequency the cores will run at.
So we have shown, that the higher the load on the system, the more current(A) will be required, and the corresponding Voltage(V) will have to be lowered if the Power(PPT(W)) remains constant.
You can run more things concurrently, but you have to run them at a lower voltage and thus more slowly (lower frequency)
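The arithmetic above can be checked with a few lines of Python. This is only a sketch of the simplified A × V = W model used in this post, with its made-up numbers (a 100 W limit and 10 A per thread). The name `max_voltage` is my own, and a real processor does not draw a fixed current per thread.

```python
def max_voltage(power_limit_w: float, current_a: float) -> float:
    """Highest voltage that still fits under the power limit: V = W / A."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return power_limit_w / current_a


POWER_LIMIT_W = 100.0    # hypothetical power limit from the example above
AMPS_PER_THREAD = 10.0   # hypothetical per-thread current from the example above

for threads in (5, 10):
    current = threads * AMPS_PER_THREAD
    ceiling = max_voltage(POWER_LIMIT_W, current)
    print(f"{threads} threads -> {current:.0f} A -> at most {ceiling:.1f} V")
```

Doubling the thread count doubles the current and halves the voltage ceiling, which is the 2 V versus 1 V result worked out by hand above.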
Warning: There are overclockers out there that would have you just raise PPT watts. However, they are totally ignoring TDP.
And I see the hands going up in the back of the class. "Oh Professor, Intel and AMD say you can run their processors at a higher wattage than TDP." True, but that is only temporary. You have to understand that TDP represents the amount of heat that can thermodynamically pass through the materials of the processor itself (silicon die, TIM/solder, metal heat spreader). The heat is generated in the die(s); it then has to flow through the internal thermal interface material and into the heat spreader. Only after it gets into the heat spreader can it pass through your TIM, enter your heatsink, and get carried away by air or water. There is nothing that you can do external to the processor that can improve on its TDP. You cannot extract heat that has not yet made its way through the internal materials.
The TDP for the 5900X and 5950X is 105 Watts. That is an internal bottleneck, like it or not. Your great cooling solution, doesn't matter here.
There was quite a ruckus a number of years ago over whether Intel was using cheap TIM or solder for its internal materials.
===========
One more thing: run your memory no faster than DDR4-3200. That is the maximum memory speed AMD officially specifies for the internal memory controller.
People tend to ignore memory speed. They feel that just because they plunked down a chunk of change for their sticks, they should be allowed to ignore the specifications of the processor. They fail to understand that the memory subsystem needs time to complete its loads and stores. If not given enough time, memory corruption occurs. The vast majority of these crashes are due to the memory subsystem not having enough time. It is a shame that more people don't use error-correcting (ECC) memory. If they did, they would see the memory errors more quickly, often before the system was brought down. They want to run fast and care less about being reliable. That's a shame, because with ECC memory one can see exactly how far one can push the sticks, without having to guess like the people using non-ECC.
====
Back to the topic:
Running an offset voltage on the VCore and VSoc will allow you to run cooler, as you won't have to pump the high voltage through the chip all the time. (I personally use a VCore and VSoc offset of +0.006 to +0.024V.)
As you have explained, you have a system that stays up for the better part of a day. You are very close to being stable, so you should only need the slightest bump in the right direction.
Hi
I have a temporary fix: I went into the BIOS and set the CPU voltage override to 1.25 V (at 3.8 GHz). I also disabled PBO.
I read in another thread that a manual clock of 4.4 GHz at 1.35 V is safe. I have not tried it.
Computer Specs :
AMD RYZEN 7 5800X 3.8 GHZ
ASROCK B550 PHANTOM GAMING
32GB DDR4 3200
GEFORCE 3080TI 12GB
2TB SSD M.2
850 WATT
Update 1: I have overridden the CPU voltage to 1.25 V and have had stability throughout the whole day while playing and using the apps that caused the blue screens. I'll update in a few days if the stability persists. I have not tried raising it to 1.35 V but will do so if 1.25 V stays stable.
I have also not tried what ryzen_type_r has suggested but will if I get blue screens. Thank you guys for your suggestions!
Update 2: Since changing the voltage settings, I have found no issues while running the applications or games that were causing the crashes. There were at least 10 different moments that would have caused a crash if I had been running at stock voltages, so far all is good. Changes made as suggested by walton00, ryzen_type_r, and Gwillakers: raised the CPU voltage to 1.28 V and the SOC voltage to 1.1 V, and it is still running stable.
Thank you all for your help, your guidance has really helped me through this problem that had been plaguing me for the better half of the year.
Oh where to begin?
"The TDP for the 5900X and 5950X is 105 Watts. That is an internal bottleneck, like it or not. Your great cooling solution, doesn't matter here."
TDP isn't a bottleneck, as it isn't actually even a parameter. Instead, TDP is simply a classification of the default PPT (Package Power Tracking), TDC (Thermal Design Current), and EDC (Electrical Design Current). A "105W" TDP processor has a PPT/TDC/EDC of 142W/95A/140A. That's it. A 65W TDP processor has a PPT/TDC/EDC of 88W/60A/90A.
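To make the "TDP is just a class of limits" point concrete, here is a minimal lookup-table sketch. The numbers are the stock values quoted in this post; the names `PowerLimits` and `STOCK_LIMITS` are made up for illustration and do not come from any AMD tool.

```python
from typing import NamedTuple


class PowerLimits(NamedTuple):
    ppt_w: int  # Package Power Tracking, watts
    tdc_a: int  # Thermal Design Current, amps (sustained)
    edc_a: int  # Electrical Design Current, amps (burst)


# Stock limits as quoted in this post, keyed by TDP class in watts.
STOCK_LIMITS = {
    105: PowerLimits(ppt_w=142, tdc_a=95, edc_a=140),
    65: PowerLimits(ppt_w=88, tdc_a=60, edc_a=90),
}

for tdp_w, limits in STOCK_LIMITS.items():
    ratio = limits.ppt_w / tdp_w
    print(f"{tdp_w} W TDP -> PPT {limits.ppt_w} W ({ratio:.2f}x TDP), "
          f"TDC {limits.tdc_a} A, EDC {limits.edc_a} A")
```

In both classes the stock PPT works out to roughly 1.35 times the TDP figure, so the package is allowed to draw more than its "TDP" even with PBO off.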
PPT is the maximum allowed wattage (work) that the system can do. TDC is the maximum sustained amperage supplied by the VRMs on the motherboard, and EDC is the maximum burst amperage. When the processor boosts, it is limited by a few things. First is Fmax, the maximum boost clock the processor will try to reach. Temperature is another limiter, as the system has a maximum temperature. If the system gets too hot, voltage and amperage will be dropped to keep it within a safe temperature. If the processor is cool enough, it will boost the clock speed and the voltage until it hits either PPT or TDC. Once it hits the maximum work or sustained amperage allowed, it will go no further, even if there is plenty of thermal headroom.
That is where PBO comes in. When you enable PBO with its defaults, PPT/TDC/EDC are set to the limits of your motherboard's VRMs. Usually those numbers are ridiculous and effectively remove them as limiters, since the temperature limit will be reached before them regardless of cooling solution. If you don't want to run outside the processor's stated TDP, turn Precision Boost Overdrive off. The processor will then only boost within the TDP constraints.
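As a rough illustration of the limiters described above, the sketch below reports which limit a given set of readings has reached. It is a simplification and not AMD's boost algorithm: the function name, the order of the checks, and every number in the usage lines (including the Fmax and maximum temperature values) are hypothetical placeholders, apart from the 142 W / 95 A stock limits quoted earlier.

```python
from typing import Optional


def active_limiter(freq_mhz: float, temp_c: float, power_w: float, current_a: float,
                   *, fmax_mhz: float, tmax_c: float, ppt_w: float, tdc_a: float) -> Optional[str]:
    """Return the name of the first limit that has been reached, or None if there is headroom."""
    checks = [
        ("Fmax", freq_mhz >= fmax_mhz),
        ("temperature", temp_c >= tmax_c),
        ("PPT", power_w >= ppt_w),
        ("TDC", current_a >= tdc_a),
    ]
    for name, reached in checks:
        if reached:
            return name
    return None


# Hypothetical limits: Fmax and Tmax are placeholders, PPT/TDC are the 105 W class values.
limits = dict(fmax_mhz=4850, tmax_c=90, ppt_w=142, tdc_a=95)

# Hypothetical all-core load: cool enough, below Fmax, but drawing the full package power.
print(active_limiter(4400, 75, 142, 90, **limits))
# Hypothetical single-thread load: little power and current, but already at Fmax.
print(active_limiter(4850, 60, 60, 40, **limits))
```

With these made-up readings the all-core case stops at PPT and the single-thread case stops at Fmax, matching the description that a cool processor keeps boosting until some limit other than temperature is reached.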
"So we have shown, that the higher the load on the system, the more current(A) will be required, and the corresponding Voltage(V) will have to be lowered if the Power(PPT(W)) remains constant."
Not sure what point you are trying to make here. That is in fact what happens with Ryzen processors. If you run Cinebench single-threaded vs. multithreaded and monitor in Ryzen Master, you will see that multithreaded hits a higher PPT/TDC than single, and higher temps, but lower voltage. The processor will run at higher voltages only when the work needing to be done, and thus the amperage, is low. By regulating the voltage manually, you are also reducing what the processor is capable of in low-load settings, which is just lost performance. As an experiment, you can manually set PPT/TDC/EDC in UEFI under Precision Boost Overdrive to 142/95/140, and also try 88/60/90. You are now effectively running your processor as a 105W TDP part and as a 65W TDP part. Run Cinebench R23 single and multi in both settings and watch Ryzen Master. What you will likely see is that in single-threaded, the settings have no effect on the voltages used. The work being done and the amperage needed to do it are lower than even the 65W TDP limits. When you run multi, you will likely see substantially lower voltages and temps at the 65W settings, as PPT and TDC now become limiting.
But you will also notice that the processor won't just boost to infinity in single-core loads. Even if you aren't hitting Fmax, max temp, PPT, or TDC, the processor stops. Why is that? That is due to FIT. FIT is a silicon fitness monitoring feature built into every Ryzen processor that determines the maximum safe voltage for the processor. The safe voltage varies with the amperage applied, as higher voltages are safe in lower-work scenarios. Manually setting the CPU voltage actually disables FIT as well as all PPT/TDC and EDC limits, so that isn't preferable. You could have a situation where the processor is in low-work mode and could safely boost to 1.45V. With a manual setting of 1.35V, you have now reduced performance for effectively no reason. On the flip side, under a high work load, the processor may have scaled back to 1.3V due to FIT. It will now also run at 1.35V as long as the temp is limiting.
In summary, there is no reason to limit voltages to make things run "cooler". The processor will get hottest in multithreaded workloads, when voltage is already lower. By reducing voltage, you also limit performance in lightly threaded scenarios.
My recommendation is to set PPT/TDC/EDC to the stock values for your processor's TDP class, set RAM to default in UEFI, then run Cinebench Multithread and watch your temps and voltages in Ryzen Master. If you are happy with those, you can leave it as is. Ryzen Master will also show you what is currently bottlenecking your system. You can ignore EDC as a limit, as it is just the maximum boost amperage. It must always be equal to or greater than TDC. I typically keep it within 20A or so of TDC. If you have headroom, you can incrementally raise PPT/TDC until the temps exceed what you are comfortable with. For example, I run my system at 215W/140A/160A with my 5950X. So yes, I am well past TDP here, and my EKWB water block absolutely did help. TDC hits 100% at that setting and is thus my limiter. I am at just over 70C and 1.3V in Cinebench Multi. If I raise TDC more, the processor temp rises rapidly without much performance gain.
The reason for this approach is that some instability may be due to just turning PBO on. Remember, if you just turn on PBO without setting limits yourself, the motherboard limits for PPT/TDC/EDC will be used, meaning the system is always limited by temp and always trying to boost with huge amperages. When the system comes out of idle and is "cool", it will apply huge TDC amperage, quickly hit max temp, and then have to aggressively dial back, leading to instability. So the incremental method is preferable. Once you have a good temp under load (~70C for me), let the system run a couple of days and make sure it is stable without WHEA errors. Now you can start adding RAM overclocks back in and then continue to monitor stability.
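The incremental approach above can be written down as a small loop. This is only a sketch of the procedure, not a tool that talks to the UEFI or to Ryzen Master: `measure_temp_c` is a hypothetical stand-in for "apply these limits by hand, run Cinebench Multi, read the temperature", and the step sizes and the 20 A EDC margin are assumptions based on my own habits described above.

```python
from typing import Callable, Optional, Tuple

Limits = Tuple[int, int, int]  # (PPT in W, TDC in A, EDC in A)


def tune_limits(measure_temp_c: Callable[[int, int, int], float],
                start_ppt_w: int, start_tdc_a: int, *,
                temp_ceiling_c: float, ppt_step_w: int = 10, tdc_step_a: int = 5,
                edc_margin_a: int = 20, max_steps: int = 20) -> Optional[Limits]:
    """Raise PPT/TDC step by step and return the last limits that stayed at or below the ceiling."""
    ppt, tdc = start_ppt_w, start_tdc_a
    best = None
    for _ in range(max_steps):
        edc = tdc + edc_margin_a          # keep EDC >= TDC, within the chosen margin
        if measure_temp_c(ppt, tdc, edc) > temp_ceiling_c:
            break                         # too hot: keep the previous settings
        best = (ppt, tdc, edc)
        ppt += ppt_step_w
        tdc += tdc_step_a
    return best


# Made-up temperature model purely for demonstration; real values come from a load test.
def fake_temp(ppt: int, tdc: int, edc: int) -> float:
    return 40 + 0.15 * ppt


print(tune_limits(fake_temp, 142, 95, temp_ceiling_c=70))
```

In practice each "measurement" is a manual benchmark run followed by a couple of days of checking for WHEA errors, so the loop is a description of the order of steps rather than something to automate blindly.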
My RAM is rated for 3200 CL14 but I run it at 3600 CL16 without issues. I did not use D.O.C.P but instead set all sub timings manually. Memtest passes, so we are good to go.
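For anyone wondering why 3600 CL16 is a reasonable trade for 3200 CL14, the CAS latency in nanoseconds is almost the same. A quick sketch, using the standard relation that a DDR clock cycle lasts 2000 / (data rate in MT/s) nanoseconds; the function name is my own.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds for DDR memory."""
    # DDR transfers twice per clock, so one clock cycle lasts 2000 / data_rate ns.
    return cas_cycles * 2000.0 / data_rate_mts


for rate, cl in ((3200, 14), (3600, 16)):
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

That works out to 8.75 ns versus about 8.89 ns, so the looser CL16 timing gives the memory nearly the same absolute time per access while the data rate is higher. This only covers CAS latency; the other sub-timings and the memory controller still have to be stable, which is what Memtest is for.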
Try bumping up the SOC voltage by 0.05 or 0.1 V.