If you've already tested it with memtest and the RAM passed the full test, then chances are low that the issue is with the RAM.
RAM should run fine at 2133 MHz, even more so with only 2 slots populated.
If you are crashing at 2133 MHz then it's certainly not due to the RAM being too fast. It must be something else.
I'm running 2 x 16 Corsair Vengeance LPX 3200 C16 with no issues (5800X on Asus B550-F). Only glitch was that when DOCP was first enabled, the mobo set tRC to 74, even though the SPD has it programmed at 58. So I manually keyed tRC to 58 and system has been rock solid (no overclocking).
It could be that your memory is technically okay, but for whatever reason the mobo is not reading/setting all the parameters (there are a whole crapload of them) correctly. And incorrectly programmed parameters in the SPD are also not unheard of.
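If you want to check this systematically rather than by eyeballing the BIOS screen, here's a rough Python sketch of the comparison. It assumes you type in the timings yourself, once from the SPD readout of whatever memory tool you use and once from the BIOS timing page. The function name and the dict layout are made up for illustration; they're not from any real tool.

```python
def find_timing_mismatches(spd_profile, applied):
    """Return {name: (spd_value, applied_value)} for every timing that differs.

    Both arguments are plain dicts typed in by hand. Timings missing from
    `applied` are reported with an applied value of None.
    """
    mismatches = {}
    for name, spd_value in spd_profile.items():
        applied_value = applied.get(name)
        if applied_value != spd_value:
            mismatches[name] = (spd_value, applied_value)
    return mismatches


# tRC 58 vs 74 is the case from my board; tCL 16 is the kit's rated CAS
# latency (C16). Everything else you would have to read off your own system.
spd = {"tCL": 16, "tRC": 58}
board = {"tCL": 16, "tRC": 74}

for name, (want, got) in find_timing_mismatches(spd, board).items():
    print(f"{name}: SPD says {want}, board applied {got}")
```

Anything it prints is a candidate for keying in manually, the way I did with tRC.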
So, maybe worth trying a completely different set of memory. Also, it's possible there's a problem with your mobo.
You mentioned in an earlier post that a local store tested a previous CPU - can you get them to test your current CPU and memory?
@ryzen_type_r I have noticed on the easy setup DOCP doesn't enable correctly, but on the advanced menu all the settings were correct last I double checked, which was a long time ago. Seems about right though, I will go over these at some point, next time it crashes.
I'm still looking into maybe trying different memory and maybe even different PSU, although they've mostly been ruled as ok but not double verified.
In the end I'll probably take my whole PC to the store next time, as it'll be easier to reproduce.
Problem is it's very hard to reproduce, you just need to wait.
With my first CPU, once I noticed the issue (it took me 2 months to bother paying attention to it), I could leave my PC idle with Chrome in the background and it would fail after 24 hrs. After a BIOS update it would error within 7 days, but once it started failing it would sometimes fail several more times before going stable again.
My second CPU was perfect for 2 months, then once it started failing it rapidly degraded, until Windows wouldn't even boot. This was definitely a CPU issue, it was the same in different systems, I can't say the cause though.
So far it's only happened once with the latest CPU, I've made a few tweaks and I'll wait to see if it happens again, then I'll make a few more tests.
@authorized to ill
I went into "all that" because not everyone is as computer savvy as yourself.
However, since we are dealing with all types here, I will make it more simple for you.
You should be able to translate this into the necessary steps.
1. Boost voltage to both the cores and the integrated memory controller by the smallest possible increment.
2. Lower power to prevent boosting to the highest frequencies.
3. Cap temperature to whatever you are comfortable with.
Oh and by the way, you are not running at stock settings. You are running at the default settings supplied by the motherboard manufacturers. If one wants to run in-spec, they would turn off CPB, PBO, XMP and DOCP.
There are numerous types of people out there: those that need help, those that can offer help, and those that want to complain and snicker at those attempting to make things better.
Yeah, but to be fair, that's only based on your experience.
@Cmdr-ZiN has *done* BIOS updates up the wazoo, and it still didn't stop the errors coming.
I agree that it's *likely* that there is a deep-seated issue with the current generation (and previous?) of Ryzen CPUs and how the BIOS & drivers interact with them (proven by Linux users replicating their Windows issues without fault). However, it's not going to help to sound like the authority on something when not everything is known. You potentially lead people down blind alleys.
Apologies if any of this has already been pointed out to you, mate, and I'm not saying it harshly, but thought it should be mentioned for new folks coming to this thread ... likely linked to it from others.
I am pretty convinced by all of the huge confusing threads like this one, and my own experience, that this is not a bad CPU problem. If you are on your 3rd CPU it only reinforces that fact. It's not the issue. The issue for almost all of us has been a combination of specific hardware and firmware versions.
A simple BIOS firmware update to my motherboard fixed it. No more errors in Event Viewer, no more random reboots. It's super solid. Actually loving this chip at this point. I do understand it's a real b*tch of an issue to troubleshoot. I was pulling my hair out. But I would honestly bet money that it's almost always going to be a firmware issue, somewhere. We often think we are all updated but are not. Go through all your hardware thoroughly, check for updates from the manufacturers, and make sure you do your motherboard.
A mobo/BIOS update fixed it for me. Indeed, many of the settings suggested in this and similar threads (adjusting XMP, CPB, PBO etc.) were slightly changed in the new firmware update for my Gigabyte mobo, which explains why a lot of people were able to fix it with a minor voltage differential or by disabling some of the above features.
At this point I would recommend that everyone (whether the issue is resolved or not) use the AMD Technical Support contact form below and raise an INCIDENT with them.
The word 'incident' is important. When talking about these concerns with a helpdesk, you must use that word; in short, it is how they work. In particular, don't use 'problem', and try to steer clear of other words (like 'issue'), as these will indicate that it isn't a major thing for you.
If we copy/paste the following, then it will be more likely to register as a major issue.
Ryzen CPU Cutting Out - Incident - WHEA?
https://community.amd.com/t5/processors/ryzen-5900x-system-constantly-crashing-restarting-whea-logger-id/td-p/423321
The above link is where I have been discussing this incident (and recurrences) with others who have experienced this. You may not have seen it as it is marked as resolved, but for most it is not.
INSERT_DESCRIPTION
So, that's the base link to this thread, up front and center, and showing that you're aware that they might not have seen it.
Try to mention the WHEA errors if you've experienced them (or seen them), and include as much/many log references as you can. Also include any/all troubleshooting that you and/or your supplier have done.
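To pull the log references together, here's a rough Python sketch that counts WHEA events per event ID from an event log you've exported to CSV. It assumes the export has a header row with columns called "Source" and "Event ID"; check your own file and pass different column names if they differ. The function name and the sample rows are made up for illustration only.

```python
import csv
import io
from collections import Counter


def summarize_whea_events(csv_text, source_col="Source", id_col="Event ID"):
    """Count events per event ID for rows whose source mentions WHEA.

    The column names are assumptions about the export format and can be
    overridden by the caller.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = row.get(source_col) or ""
        if "WHEA" in source.upper():
            counts[(row.get(id_col) or "").strip()] += 1
    return counts


# Made-up sample rows, only here to show the expected shape of the input.
sample = """Level,Date and Time,Source,Event ID
Error,2021-01-05 21:14:03,WHEA-Logger,18
Warning,2021-01-06 08:02:11,WHEA-Logger,19
Error,2021-01-07 19:45:30,WHEA-Logger,18
Information,2021-01-07 19:46:02,Kernel-General,12
"""

counts = summarize_whea_events(sample)
for event_id, n in sorted(counts.items()):
    print(f"WHEA event {event_id}: {n} occurrence(s)")
```

Paste the resulting counts into the ticket along with the dates of the first and last occurrence, so the helpdesk can see it is a repeat incident.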
If you have resolved the incident, mention this at the end, and say that they can close the ticket once you have a reference number. If @Cmdr-ZiN's experience is anything to go by, this will recur (repeat incidents are called 'recurrences' in helpdesk world), and then you will need to either open a recurrence or re-open the ticket.
I got my WHEA 18 BSODs to stop by changing my memory from 1T with gear down enabled to 2T with gear down disabled. I have 4 DIMMs (16x4 GB) and apparently getting full slots to run at 1T with gear down is tricky. I actually wrote to G.Skill about it; I wanted to know if the kit was supposed to be at 1T or 2T. Their response was that with 4 DIMMs and this memory size it's always 2T. Sometimes it can be stable at 1T; if so, then go for it.
The difference is night and day. When I had 1T with gear down enabled I would get WHEA 18s within 15 mins of gaming or going idle. Sometimes I'd get lucky and it would last for an hour. I noticed that when I increased the SoC, VDDG and VDDP voltages I was able to game or go idle for quite some time before it rebooted.
Now everything is on auto, XMP enabled but 2T instead of 1T. Now there are no errors.
@crayraven I'm going to try this with XMP enabled.
G.Skill basically told me that since the ram isn't rated for my mb that I should run it slower. When I told them it worked fine with a 3700x they said to RMA the ram.
I'm very much doubtful it's the ram, as I've run multiple ram tests that all turned out fine.
I do have 4x8gb sticks, and I have been running them 1t with gear down enabled as recommended by that ram timing calculation software that's way overdue to be retired.
Going to test it out tomorrow! Thanks for this.
I had the same issue. Memtest would come back clear so I really didn't think it was a memory issue. I only started to think it could be one after putting together the facts, namely that WHEA 18s became infrequent when memory-related voltages were increased. If I'm right, and I hope I am, the reason the increased voltage was needed is that the memory wasn't stable at 1T with gear down enabled. It would make sense.
Stock, default, whatever. Whatever you want to call out of the box settings. Also, you making a suggestion in an attempt to help isn't the thing I took issue with. It's the notion that one would have to make such adjustments to get their system up and running "out of the box". Adjusting voltages of any sort isn't something that should be expected of your average consumer. Since you seem to recognize that there are "numerous types of people" out there, I shouldn't have to point that out. Anyway, to be clear my frustration wasn't aimed towards you or your attempt to help. I apologize if it came across that way.